[
https://issues.apache.org/jira/browse/LUCENE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399878#comment-15399878
]
ASF subversion and git services commented on LUCENE-7390:
---------------------------------------------------------
Commit e3de51be2edd086c29be8fdcfca7f8a5990a640c in lucene-solr's branch
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3de51b ]
LUCENE-7390: another part of the revert
> Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's
> indexing buffer
> ---------------------------------------------------------------------------------------------
>
> Key: LUCENE-7390
> URL: https://issues.apache.org/jira/browse/LUCENE-7390
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7390.patch, LUCENE-7390.patch
>
>
> With Lucene's default codec, when writing dimensional points, we only give
> {{BKDWriter}} 16 MB heap to use for sorting, regardless of how large IW's
> indexing buffer is. A custom codec can change this but that's a little steep.
> I've been testing indexing performance on a points-heavy dataset, 1.2 billion
> taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
> , indexing with a 1 GB IW buffer, and the small 16 MB heap limit causes clear
> performance problems because flushing the large segments forces {{BKDWriter}}
> to switch to offline sorting, which causes the DWPTs to take too long to flush.
> They then fall behind, and Lucene does a hard stall on incoming indexing
> threads until they catch up.
> [~rcmuir] had a simple idea to let IW pass the allowed temp heap usage to
> {{PointsWriter.writeField}}.
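
To make the trade-off above concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class PointSortBudget, its methods, the bytes-per-point figure and the "fraction of the indexing buffer" parameter are all hypothetical, chosen only to illustrate how a fixed 16 MB sort budget pushes a large flush to offline sorting while a budget proportional to the indexing buffer might keep it on heap.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
final class PointSortBudget {

    // The fixed limit described in the issue: 16 MB, regardless of buffer size.
    static final double FIXED_SORT_HEAP_MB = 16.0;

    // Assumed cost model: every buffered point needs bytesPerPoint bytes to sort.
    static long bytesNeeded(long pointCount, int bytesPerPoint) {
        return Math.multiplyExact(pointCount, (long) bytesPerPoint);
    }

    // True if the whole sort fits in the allowed heap; false means the
    // writer would have to fall back to offline (on-disk) sorting.
    static boolean sortsOnHeap(long pointCount, int bytesPerPoint, double maxSortHeapMB) {
        long budgetBytes = (long) (maxSortHeapMB * 1024 * 1024);
        return bytesNeeded(pointCount, bytesPerPoint) <= budgetBytes;
    }

    // Assumed policy: the sort budget is some fraction of the indexing buffer.
    // The issue does not state which proportion is used.
    static double proportionalSortHeapMB(double ramBufferMB, double fraction) {
        return ramBufferMB * fraction;
    }

    public static void main(String[] args) {
        long points = 50_000_000L;   // hypothetical segment size
        int bytesPerPoint = 20;      // hypothetical per-point cost
        double ramBufferMB = 1024.0; // the 1 GB IW buffer from the issue

        System.out.println("fixed 16 MB budget, on heap: "
            + sortsOnHeap(points, bytesPerPoint, FIXED_SORT_HEAP_MB));
        System.out.println("proportional budget, on heap: "
            + sortsOnHeap(points, bytesPerPoint, proportionalSortHeapMB(ramBufferMB, 1.0)));
    }
}
```

Under these assumed numbers the flush needs about 1 GB of sort space, so the fixed budget forces the offline path while a budget equal to the full indexing buffer does not. The real change would pass the allowed heap through the points writer rather than compute it in a helper like this.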
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)