[ https://issues.apache.org/jira/browse/LUCENE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167969#comment-16167969 ]
Adrien Grand commented on LUCENE-7966:
--------------------------------------

I did some tests with the Calgary corpus that can be found at http://corpus.canterbury.ac.nz/descriptions/ (lower is better):

|| File || Time to compress without patch || Time to compress with the patch || Difference ||
| bib | 971702 | 904173 | -6.9% |
| book1 | 7479794 | 7073712 | -5.4% |
| book2 | 4990347 | 4574486 | -8.3% |
| geo | 1600972 | 1574435 | -1.7% |
| news | 3394833 | 3222113 | -5.1% |
| obj1 | 169516 | 166673 | -1.7% |
| obj2 | 1869442 | 1769302 | -5.4% |
| paper1 | 385900 | 357472 | -7.4% |
| pic | 1528354 | 1314336 | -14% |
| progc | 279295 | 261445 | -6.4% |
| progl | 410565 | 376898 | -8.2% |
| progp | 245654 | 222230 | -9.5% |
| trans | 517571 | 470134 | -9.2% |

As expected, the improvement is larger on files that have long repetitions, like source code and the bitmap picture. The speedup is consistently reproducible.

> build mr-jar and use some java 9 methods if available
> -----------------------------------------------------
>
>                 Key: LUCENE-7966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7966
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: general/build
>            Reporter: Robert Muir
>         Attachments: LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch, LUCENE-7966.patch
>
>
> See background: http://openjdk.java.net/jeps/238
> It would be nice to use some of the newer array methods and range checking
> methods in java 9, for example, without waiting for lucene 10 or something. If
> we build an MR-jar, we can start migrating our code to use java 9 methods
> right now: it will use optimized methods from java 9 when that's available,
> otherwise fall back to java 8 code.
> This patch adds:
> {code}
> Objects.checkIndex(int,int)
> Objects.checkFromToIndex(int,int,int)
> Objects.checkFromIndexSize(int,int,int)
> Arrays.mismatch(byte[],int,int,byte[],int,int)
> Arrays.compareUnsigned(byte[],int,int,byte[],int,int)
> Arrays.equals(byte[],int,int,byte[],int,int)
> // did not add char/int/long/short/etc but of course it's possible if needed
> {code}
> It sets these up in {{org.apache.lucene.future}} as 1-1 mappings to java
> methods. This way, we can simply directly replace call sites with java 9
> methods when java 9 is a minimum. Simple 1-1 mappings also mean that we only
> have to worry about testing that our java 8 fallback methods work.
> I found that many of the current byte array methods today are willy-nilly and
> very lenient, for example passing invalid offsets at times and relying on
> compare methods not throwing exceptions, etc. I fixed all the instances in
> core/codecs but have not looked at the problems with AnalyzingSuggester. Also,
> SimpleText still uses a silly method in ArrayUtil in a similarly crazy way; I
> have not removed that one yet.
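To make the 1-1 mapping idea in the issue description concrete, here is a minimal sketch of what Java 8 fallbacks for three of the listed methods might look like. This is not the code from the attached patch: the class name `FutureArraysSketch` is hypothetical, and the sketch throws `IndexOutOfBoundsException` for every invalid range, which is a simplification of the exception types the JDK methods use.

```java
// Hypothetical illustration only: not the code from LUCENE-7966.patch.
// Each method keeps the same parameter list as its Java 9 counterpart so that
// call sites could later be switched to the JDK method by a simple rename.
final class FutureArraysSketch {
  private FutureArraysSketch() {}

  // Same contract as Objects.checkFromToIndex(int,int,int):
  // valid when 0 <= fromIndex <= toIndex <= length.
  static int checkFromToIndex(int fromIndex, int toIndex, int length) {
    if (fromIndex < 0 || fromIndex > toIndex || toIndex > length) {
      throw new IndexOutOfBoundsException(
          "Range [" + fromIndex + ", " + toIndex + ") out of bounds for length " + length);
    }
    return fromIndex;
  }

  // Same return value as Arrays.mismatch(byte[],int,int,byte[],int,int):
  // relative index of the first differing byte, the shorter length if one
  // range is a proper prefix of the other, or -1 if the ranges are equal.
  // Simplification: invalid ranges always raise IndexOutOfBoundsException.
  static int mismatch(byte[] a, int aFromIndex, int aToIndex,
                      byte[] b, int bFromIndex, int bToIndex) {
    checkFromToIndex(aFromIndex, aToIndex, a.length);
    checkFromToIndex(bFromIndex, bToIndex, b.length);
    int aLen = aToIndex - aFromIndex;
    int bLen = bToIndex - bFromIndex;
    int len = Math.min(aLen, bLen);
    for (int i = 0; i < len; i++) {
      if (a[aFromIndex + i] != b[bFromIndex + i]) {
        return i;
      }
    }
    return aLen == bLen ? -1 : len;
  }

  // Same ordering as Arrays.compareUnsigned(byte[],int,int,byte[],int,int):
  // bytes are compared as values in 0..255, then shorter ranges sort first.
  static int compareUnsigned(byte[] a, int aFromIndex, int aToIndex,
                             byte[] b, int bFromIndex, int bToIndex) {
    checkFromToIndex(aFromIndex, aToIndex, a.length);
    checkFromToIndex(bFromIndex, bToIndex, b.length);
    int aLen = aToIndex - aFromIndex;
    int bLen = bToIndex - bFromIndex;
    int len = Math.min(aLen, bLen);
    for (int i = 0; i < len; i++) {
      int x = a[aFromIndex + i] & 0xFF;
      int y = b[bFromIndex + i] & 0xFF;
      if (x != y) {
        return x - y;
      }
    }
    return aLen - bLen;
  }
}
```

In a multi-release jar, a Java 9 variant of the same class placed under `META-INF/versions/9` could delegate straight to `java.util.Arrays` and `java.util.Objects`, while a class like the one above would serve Java 8 runtimes. Because the fallbacks validate their ranges strictly, callers that pass invalid offsets fail on both runtimes instead of only on Java 9.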