[ 
https://issues.apache.org/jira/browse/LUCENE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300126#comment-15300126
 ] 

Adrien Grand commented on LUCENE-7299:
--------------------------------------

Thanks Dawid for sharing your implementation and experience! It looks very 
similar to the patch, except for the redistribution logic and the fact that 
your implementation can parallelize in a ForkJoinPool. Out of curiosity, I 
tried replacing the redistribution logic, but performance was the same.

bq. when you're descending into same prefix blocks you can disregard those 
prefixes in comparisons

The patch already does this; I agree it is an important optimization.
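
For readers unfamiliar with the trick, here is a minimal sketch of an MSB 
radix sort that disregards shared prefixes. It is not the code from the patch 
or from ByteBlockListSorter.java; the class name PrefixSkippingRadixSort and 
the use of plain byte[] keys are assumptions made for illustration. Each 
recursive call knows that its keys agree on the first `depth` bytes, so it 
only ever reads the byte at index `depth`.

```java
final class PrefixSkippingRadixSort {

    /** Sorts keys in unsigned lexicographic byte order (shorter key first on a tie). */
    static void sort(byte[][] keys) {
        sort(keys, 0, keys.length, 0, new byte[keys.length][]);
    }

    // Sorts keys[from, to). All keys in this range share their first `depth`
    // bytes, so those bytes are never compared or read again.
    private static void sort(byte[][] keys, int from, int to, int depth, byte[][] scratch) {
        if (to - from < 2) {
            return;
        }
        // Bucket 0 holds keys that end at `depth`; bucket b+1 holds keys whose
        // byte at `depth` has unsigned value b.
        int[] counts = new int[257];
        for (int i = from; i < to; i++) {
            counts[bucket(keys[i], depth)]++;
        }
        // Turn the histogram into start offsets within [from, to).
        int[] starts = new int[257];
        int sum = from;
        for (int b = 0; b < 257; b++) {
            starts[b] = sum;
            sum += counts[b];
        }
        // Redistribute into scratch, then copy back.
        int[] next = starts.clone();
        for (int i = from; i < to; i++) {
            scratch[next[bucket(keys[i], depth)]++] = keys[i];
        }
        System.arraycopy(scratch, from, keys, from, to - from);
        // Keys in bucket 0 are identical, so only buckets 1..256 need recursion,
        // one byte deeper.
        for (int b = 1; b < 257; b++) {
            if (counts[b] > 1) {
                sort(keys, starts[b], starts[b] + counts[b], depth + 1, scratch);
            }
        }
    }

    private static int bucket(byte[] key, int depth) {
        return depth < key.length ? (key[depth] & 0xFF) + 1 : 0;
    }
}
```

Calling PrefixSkippingRadixSort.sort(keys) orders the array in place. A 
production implementation would typically fall back to a comparison sort for 
small buckets and cap the recursion depth; the sketch omits both to keep the 
prefix-skipping step visible.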

bq. There is also a hook inside byte block list to allow you to retrieve a 
single byte at a given offset so there's no need to copy keys over and over 
again (.byteAt).

I think copying is not a concern with BytesRefHash, since it just returns a 
BytesRef that points to an internal structure rather than copying bytes. 
Adding a byteAt method might help optimize it further, but I'd rather not add 
APIs to BytesRefHash for now.
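
To illustrate the two access styles being compared, here is a small sketch. 
PackedKeys, Slice, view and byteAt are hypothetical names invented for this 
example; they are not the BytesRefHash or ByteBlockListSorter APIs. The view 
method plays the role of a BytesRef-like reference into shared storage, and 
byteAt is the single-byte hook. Neither copies key bytes.

```java
final class PackedKeys {
    private final byte[] pool;    // all keys stored back to back
    private final int[] offsets;  // key i occupies pool[offsets[i] .. offsets[i + 1])

    PackedKeys(byte[][] keys) {
        offsets = new int[keys.length + 1];
        for (int i = 0; i < keys.length; i++) {
            offsets[i + 1] = offsets[i] + keys[i].length;
        }
        pool = new byte[offsets[keys.length]];
        for (int i = 0; i < keys.length; i++) {
            System.arraycopy(keys[i], 0, pool, offsets[i], keys[i].length);
        }
    }

    /** A reusable (array, offset, length) view, similar in spirit to a BytesRef. */
    static final class Slice {
        byte[] bytes;
        int offset;
        int length;
    }

    /** Points `reuse` at key `id` inside the shared pool; no bytes are copied. */
    Slice view(int id, Slice reuse) {
        reuse.bytes = pool;
        reuse.offset = offsets[id];
        reuse.length = offsets[id + 1] - offsets[id];
        return reuse;
    }

    /** Returns the unsigned byte at position k of key `id`, or -1 past its end. */
    int byteAt(int id, int k) {
        int pos = offsets[id] + k;
        return pos < offsets[id + 1] ? pool[pos] & 0xFF : -1;
    }
}
```

A radix sort only needs one byte per key per pass, so byteAt avoids filling 
in a view object each time, but both approaches read directly from the shared 
pool.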

> BytesRefHash.sort() should use radix sort?
> ------------------------------------------
>
>                 Key: LUCENE-7299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: ByteBlockListSorter.java, LUCENE-7299.patch
>
>
> Switching DocIdSetBuilder to radix sort helped make things significantly 
> faster. We should be able to do the same with BytesRefHash.sort()?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

