[jira] [Commented] (LUCENE-5098) Broadword bit selection

Paul Elschot (JIRA) Thu, 11 Jul 2013 12:59:50 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706160#comment-13706160
 ]


Paul Elschot commented on LUCENE-5098:
--------------------------------------

bq. Maybe these utility methods could be moved to oal.util.BitUtil where we 
already have some utility methods to count or find bits?

I kept them separate to have a clear reference to the Vigna broadword article.

The class javadoc already refers to all methods and constants from there, so 
that could be easily merged into BitUtil's class javadoc when merging the two 
classes.
Also both classes have only static attributes.

So merging is no problem at all, but I'm still in two minds about this.
Any opinions either way, i.e. merging class BroadWord here into the existing 
BitUtil or not?

bq. I ran quick benchmarks for select and select9 seemed to quickly outperform 
selectNaive, even for non-dense longs. Do you have the same experience with 
selectNaive vs. select9?  If yes, maybe selectNaive could be moved to the test 
case.

In the Quasi Succinct Indices article (EliasFanoDocIdSet, LUCENE-5084), in 
Section 9 under Reading Unary Codes, Vigna mentions that within the right 
longword:
bq. Our experiments show that broadword bit search is extremely effective, 
unless the number of reads is very small, as in that case computing iteratively 
the least significant bit becomes competitive. Indeed, when skipping a very 
small number of position (e.g., less then eight) we simply resort to iterating 
through the list.

In the test code there is a comment that select9 is about 7 times faster than 
selectNaive, 45ns against 295 ns for a very dense case on my old 32 bit home 
machine. Since the machine is 32 bits, that is no more than a rough indication 
that the selectNaive here (that iterates) might indeed preferable over the 
select9 here when selecting before the 8th set bit.
A safe conclusion is that moving selectNaive to the test cases now would be 
premature.

bq. Since the main goal of rank9 is code documentation, maybe it should be made 
package-private?
I have not actually benchmarked rank9 it against Long.bitCount, but I think we 
should do that just to be sure that rank9 is slower, and than it can be made 
package-private.

bq. As for longHex – I always liked the "assembly" version better.
I have used that mostly during debugging, and it ended up also in toString in 
EliasFanoEncoder iirc.
How about putting the assembly version in BitUtil?

bq. Could we extend LuceneTestCase for tests, please?
I don't really like it, but my next patch here will have that.
Should LuceneTestCase also be mentioned in the wiki at "How to contribute"?





                
> Broadword bit selection
> -----------------------
>
>                 Key: LUCENE-5098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5098
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5098.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5098) Broadword bit selection

Reply via email to