gf2121 opened a new pull request #541: URL: https://github.com/apache/lucene/pull/541
This PR tries using a 512-int `ForUtil` to encode the doc IDs stored in BKD leaf blocks. I benchmarked the change by indexing random `LongPoint` values and querying them with `PointInSetQuery`.

**Benchmark Result**

doc count | field cardinality | query points | baseline QPS | candidate QPS | QPS diff
--- | --- | --- | --- | --- | ---
100000000 | 32 | 1 | 51.44 | 148.26 | 188.22%
100000000 | 32 | 2 | 26.8 | 101.88 | 280.15%
100000000 | 32 | 4 | 14.04 | 53.52 | 281.20%
100000000 | 32 | 8 | 7.04 | 28.54 | 305.40%
100000000 | 32 | 16 | 3.54 | 14.61 | 312.71%
100000000 | 128 | 1 | 110.56 | 350.26 | 216.81%
100000000 | 128 | 8 | 16.6 | 89.81 | 441.02%
100000000 | 128 | 16 | 8.45 | 48.07 | 468.88%
100000000 | 128 | 32 | 4.2 | 25.35 | 503.57%
100000000 | 128 | 64 | 2.13 | 13.02 | 511.27%
100000000 | 1024 | 1 | 536.19 | 843.88 | 57.38%
100000000 | 1024 | 8 | 109.71 | 251.89 | 129.60%
100000000 | 1024 | 32 | 33.24 | 104.11 | 213.21%
100000000 | 1024 | 128 | 8.87 | 30.47 | 243.52%
100000000 | 1024 | 512 | 2.24 | 8.3 | 270.54%
100000000 | 8192 | 1 | 3333.33 | 5000 | 50.00%
100000000 | 8192 | 32 | 139.47 | 214.59 | 53.86%
100000000 | 8192 | 128 | 54.59 | 109.23 | 100.09%
100000000 | 8192 | 512 | 15.61 | 36.15 | 131.58%
100000000 | 8192 | 2048 | 4.11 | 11.14 | 171.05%
100000000 | 1048576 | 1 | 2597.4 | 3030.3 | 16.67%
100000000 | 1048576 | 32 | 314.96 | 371.75 | 18.03%
100000000 | 1048576 | 128 | 99.7 | 116.28 | 16.63%
100000000 | 1048576 | 512 | 30.5 | 37.15 | 21.80%
100000000 | 1048576 | 2048 | 10.38 | 12.3 | 18.50%
100000000 | 8388608 | 1 | 2564.1 | 3174.6 | 23.81%
100000000 | 8388608 | 32 | 196.27 | 238.95 | 21.75%
100000000 | 8388608 | 128 | 55.36 | 68.03 | 22.89%
100000000 | 8388608 | 512 | 15.58 | 19.24 | 23.49%
100000000 | 8388608 | 2048 | 4.56 | 5.71 | 25.22%

Index size is reduced for medium-cardinality fields (128 to 8192). It stays roughly flat for the 32-cardinality field and for the high-cardinality fields (1048576 and 8388608).
```
113M index_100000000_doc_32_cardinality_baseline
114M index_100000000_doc_32_cardinality_candidate
140M index_100000000_doc_128_cardinality_baseline
133M index_100000000_doc_128_cardinality_candidate
241M index_100000000_doc_8192_cardinality_baseline
233M index_100000000_doc_8192_cardinality_candidate
193M index_100000000_doc_1024_cardinality_baseline
174M index_100000000_doc_1024_cardinality_candidate
314M index_100000000_doc_1048576_cardinality_baseline
315M index_100000000_doc_1048576_cardinality_candidate
392M index_100000000_doc_8388608_cardinality_baseline
391M index_100000000_doc_8388608_cardinality_candidate
```
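`ForUtil`-style encoding packs a fixed-size block of integers using the same number of bits for every value. The code below is a simplified, scalar sketch of that idea for one block of 512 doc IDs. It rests on a few assumptions. The class and its methods (`DocIdBlockPacking`, `bitsRequired`, `pack`, `unpack`) are hypothetical names for illustration, and the doc IDs are made up. The sketch is not Lucene's `ForUtil`, which uses generated, unrolled code with its own bit layout. It is also not necessarily the exact encoding used in this PR.

```java
import java.util.Arrays;

/** Hypothetical illustration (not Lucene's ForUtil): fixed-width packing of one block of doc IDs. */
class DocIdBlockPacking {
  static final int BLOCK_SIZE = 512; // assumed block size, matching the "512 ints" in the PR text

  /** Bits needed for the largest value in the block (at least 1); OR-ing keeps the highest set bit. */
  static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) {
      or |= v;
    }
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  /** Packs BLOCK_SIZE non-negative ints, bitsPerValue bits each, into a long[]. */
  static long[] pack(int[] values, int bitsPerValue) {
    long[] packed = new long[(BLOCK_SIZE * bitsPerValue + 63) / 64];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      long bitPos = (long) i * bitsPerValue;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = values[i] & 0xFFFFFFFFL;
      packed[word] |= v << shift;
      if (shift + bitsPerValue > 64) {
        // The value straddles two longs: spill its high bits into the next word.
        packed[word + 1] |= v >>> (64 - shift);
      }
    }
    return packed;
  }

  /** Reverses pack(): reads BLOCK_SIZE values of bitsPerValue bits each. */
  static int[] unpack(long[] packed, int bitsPerValue) {
    int[] values = new int[BLOCK_SIZE];
    long mask = (1L << bitsPerValue) - 1;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      long bitPos = (long) i * bitsPerValue;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = packed[word] >>> shift;
      if (shift + bitsPerValue > 64) {
        v |= packed[word + 1] << (64 - shift);
      }
      values[i] = (int) (v & mask);
    }
    return values;
  }

  public static void main(String[] args) {
    int[] docIds = new int[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      docIds[i] = i * 195_313; // made-up doc IDs spread over ~100M docs
    }
    int bpv = bitsRequired(docIds);
    long[] packed = pack(docIds, bpv);
    System.out.println("bitsPerValue=" + bpv + ", packed bytes=" + packed.length * Long.BYTES
        + " vs raw int bytes=" + BLOCK_SIZE * Integer.BYTES);
    System.out.println("round trip ok: " + Arrays.equals(docIds, unpack(packed, bpv)));
  }
}
```

These made-up doc IDs span roughly 100M docs, so every value needs 27 bits. The block therefore packs into 1728 bytes instead of the 2048 bytes needed for 512 raw ints. In this scheme the saving depends only on the largest doc ID in the block.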