[ 
https://issues.apache.org/jira/browse/LUCENE-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7396:
---------------------------------
    Attachment: LUCENE-7396.patch

This is a great summary of the change. :)

bq. more cleanly than LUCENE-7390 (should we revert that?)

I agree we should only have this or LUCENE-7390, not both.

bq. Maybe MutablePointsReader should expose public byte getByteAt(int i, int 
bytePos);? This would save copying a whole value just because you want to see a 
specific byte.

I initially wanted to keep the number of methods to a minimum, but since the 
use of this API is very contained it is probably ok. It also seems to help with 
flush performance: with IndexTaxis, flush now takes ~20 secs instead of 23 
previously.
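
To illustrate why a per-byte accessor can avoid copies, here is a minimal 
sketch. InMemoryPoints is a hypothetical in-memory stand-in, not the actual 
MutablePointsReader in the patch; only the getByteAt(int i, int bytePos) 
signature comes from the suggestion above, and the packed layout and the 
histogram method are assumptions made for the example.

```java
// Hypothetical stand-in for the reader discussed above; NOT Lucene's class.
final class InMemoryPoints {
    private final byte[] packed;     // all values back to back
    private final int bytesPerValue; // fixed width of each value

    InMemoryPoints(byte[] packed, int bytesPerValue) {
        if (bytesPerValue <= 0 || packed.length % bytesPerValue != 0) {
            throw new IllegalArgumentException("packed length must be a multiple of bytesPerValue");
        }
        this.packed = packed;
        this.bytesPerValue = bytesPerValue;
    }

    int size() {
        return packed.length / bytesPerValue;
    }

    // Copies the whole i-th value into dest (the cost the accessor below avoids).
    void getValue(int i, byte[] dest) {
        System.arraycopy(packed, i * bytesPerValue, dest, 0, bytesPerValue);
    }

    // Reads one byte of the i-th value without copying the value.
    byte getByteAt(int i, int bytePos) {
        return packed[i * bytesPerValue + bytePos];
    }

    // Assumed use case: count how often each byte occurs at position bytePos,
    // the kind of pass a byte-wise sort might make over all values.
    int[] histogram(int bytePos) {
        int[] counts = new int[256];
        for (int i = 0; i < size(); i++) {
            counts[getByteAt(i, bytePos) & 0xFF]++;
        }
        return counts;
    }
}
```

With getValue alone, the histogram loop would have to copy bytesPerValue 
bytes per point just to inspect one of them.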

bq. I hope our points test cases are stressful enough in testing this tie break 
case!

Indeed they were not: changing this branch to return a constant does not break 
the tests. I extracted the sorting/partitioning logic to a helper class with 
dedicated tests to cover this better (exercising it through indexing would 
require too many docs to be indexed).
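
As a rough illustration of the tie-break case, here is a small sketch. 
PointOrder and sortedOrder are hypothetical names, not the helper class from 
the patch, and the comparator-based sort is an assumption made for brevity. It 
orders points by unsigned value bytes and breaks ties by doc ID.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the class added by the patch.
final class PointOrder {
    // Returns the indices 0..n-1 ordered by unsigned value bytes,
    // with equal values ordered by increasing doc ID.
    static Integer[] sortedOrder(byte[][] values, int[] docIDs) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> {
            int cmp = Arrays.compareUnsigned(values[a], values[b]);
            // Tie break: without this branch, equal values keep insertion order.
            return cmp != 0 ? cmp : Integer.compare(docIDs[a], docIDs[b]);
        });
        return order;
    }
}
```

A direct test of such a helper needs only a few points with equal values 
whose doc IDs are not already in insertion order, for example values 
{1}, {0}, {1}, {1} with doc IDs 7, 3, 2, 5. Replacing the tie break with a 
constant then changes the resulting order, so the test fails.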

bq. It looks like we no longer call this from assert in the merge 1D case, 
except within one leaf block? Was that intentional?

Definitely not. I added it back.

> Speed up flush of 1-dimension points
> ------------------------------------
>
>                 Key: LUCENE-7396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7396.patch, LUCENE-7396.patch, LUCENE-7396.patch, 
> LUCENE-7396.patch
>
>
> 1D points already have an optimized merge implementation which works when 
> points come in order. So maybe we could make IndexWriter's PointValuesWriter 
> sort before feeding the PointsFormat and somehow propagate the information to 
> the PointsFormat?
> The benefit is that flushing could directly stream points to disk with little 
> memory usage.



