gf2121 opened a new pull request #541:
URL: https://github.com/apache/lucene/pull/541


   This PR uses a ForUtil that packs 512 ints per block to encode doc IDs in the BKD tree. I benchmarked the optimization by indexing randomly generated LongPoint values and querying them with PointInSetQuery.
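   The core idea of a ForUtil-style codec is fixed-width bit packing: a block of values is stored using only as many bits per value as the largest value in the block needs. The sketch below is a simplified, hypothetical illustration of that idea for one block of 512 doc IDs. The names `PackedBlockSketch`, `bitsRequired`, `pack` and `unpack` are illustrative only, and the plain sequential bit layout is an assumption for readability, not the layout Lucene's ForUtil actually uses.

```java
/**
 * Hypothetical sketch of fixed-width bit packing for one block of 512 doc IDs.
 * NOT Lucene's ForUtil: it uses a simple sequential layout to show the idea.
 */
public final class PackedBlockSketch {
  static final int BLOCK_SIZE = 512; // block size named in the PR

  /** Bits needed to represent every value in the block (at least 1). */
  static int bitsRequired(int[] values) {
    int or = 0;
    for (int v : values) or |= v;
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
  }

  /** Packs BLOCK_SIZE values using bpv bits each into a long[]. */
  static long[] pack(int[] values, int bpv) {
    long[] out = new long[(BLOCK_SIZE * bpv + 63) / 64];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      long bitPos = (long) i * bpv;
      int word = (int) (bitPos >>> 6);   // which long holds the first bit
      int shift = (int) (bitPos & 63);   // bit offset inside that long
      long v = values[i] & 0xFFFFFFFFL;
      out[word] |= v << shift;
      if (shift + bpv > 64) {            // value straddles two longs
        out[word + 1] |= v >>> (64 - shift);
      }
    }
    return out;
  }

  /** Reverses pack(): reads BLOCK_SIZE values of bpv bits each. */
  static int[] unpack(long[] packed, int bpv) {
    int[] out = new int[BLOCK_SIZE];
    long mask = (1L << bpv) - 1;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      long bitPos = (long) i * bpv;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      long v = packed[word] >>> shift;
      if (shift + bpv > 64) {            // fetch the high bits from the next long
        v |= packed[word + 1] << (64 - shift);
      }
      out[i] = (int) (v & mask);
    }
    return out;
  }
}
```

   For example, if every doc ID in a block is below 2^20, `bitsRequired` returns at most 20 and the block packs into 160 longs (1280 bytes) instead of the 2048 bytes needed for 512 raw 32-bit ints.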
   
   **Benchmark Result**
   doc count | field cardinality | query points | baseline QPS | candidate QPS | diff percentage
   -- | -- | -- | -- | -- | --
   100000000 | 32 | 1 | 51.44 | 148.26 | 188.22%
   100000000 | 32 | 2 | 26.8 | 101.88 | 280.15%
   100000000 | 32 | 4 | 14.04 | 53.52 | 281.20%
   100000000 | 32 | 8 | 7.04 | 28.54 | 305.40%
   100000000 | 32 | 16 | 3.54 | 14.61 | 312.71%
   100000000 | 128 | 1 | 110.56 | 350.26 | 216.81%
   100000000 | 128 | 8 | 16.6 | 89.81 | 441.02%
   100000000 | 128 | 16 | 8.45 | 48.07 | 468.88%
   100000000 | 128 | 32 | 4.2 | 25.35 | 503.57%
   100000000 | 128 | 64 | 2.13 | 13.02 | 511.27%
   100000000 | 1024 | 1 | 536.19 | 843.88 | 57.38%
   100000000 | 1024 | 8 | 109.71 | 251.89 | 129.60%
   100000000 | 1024 | 32 | 33.24 | 104.11 | 213.21%
   100000000 | 1024 | 128 | 8.87 | 30.47 | 243.52%
   100000000 | 1024 | 512 | 2.24 | 8.3 | 270.54%
   100000000 | 8192 | 1 | 3333.33 | 5000 | 50.00%
   100000000 | 8192 | 32 | 139.47 | 214.59 | 53.86%
   100000000 | 8192 | 128 | 54.59 | 109.23 | 100.09%
   100000000 | 8192 | 512 | 15.61 | 36.15 | 131.58%
   100000000 | 8192 | 2048 | 4.11 | 11.14 | 171.05%
   100000000 | 1048576 | 1 | 2597.4 | 3030.3 | 16.67%
   100000000 | 1048576 | 32 | 314.96 | 371.75 | 18.03%
   100000000 | 1048576 | 128 | 99.7 | 116.28 | 16.63%
   100000000 | 1048576 | 512 | 30.5 | 37.15 | 21.80%
   100000000 | 1048576 | 2048 | 10.38 | 12.3 | 18.50%
   100000000 | 8388608 | 1 | 2564.1 | 3174.6 | 23.81%
   100000000 | 8388608 | 32 | 196.27 | 238.95 | 21.75%
   100000000 | 8388608 | 128 | 55.36 | 68.03 | 22.89%
   100000000 | 8388608 | 512 | 15.58 | 19.24 | 23.49%
   100000000 | 8388608 | 2048 | 4.56 | 5.71 | 25.22%
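   The "diff percentage" column is the relative QPS gain of the candidate over the baseline, (candidate - baseline) / baseline * 100, so 100% means twice the baseline QPS. A small sketch (the class name `QpsDiff` is hypothetical) that recomputes two rows of the table with this formula:

```java
import java.util.Locale;

/** Recomputes the "diff percentage" column: relative QPS gain of candidate over baseline. */
public final class QpsDiff {
  /** Returns (candidate - baseline) / baseline * 100. */
  static double diffPercent(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // Two rows copied from the table: {baseline QPS, candidate QPS}.
    double[][] rows = { {51.44, 148.26}, {2597.4, 3030.3} };
    for (double[] row : rows) {
      // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
      System.out.printf(Locale.ROOT, "%.2f%%%n", diffPercent(row[0], row[1]));
    }
    // prints 188.22% then 16.67%, matching the table
  }
}
```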
   
   
   Index size shrinks for cardinalities 128 through 8192 (most at 1024, from 193M to 174M) and is essentially unchanged at cardinality 32 and at the high cardinalities (1048576 and 8388608).
   
   ```
   113M    index_100000000_doc_32_cardinality_baseline
   114M    index_100000000_doc_32_cardinality_candidate
   
   140M    index_100000000_doc_128_cardinality_baseline
   133M    index_100000000_doc_128_cardinality_candidate
   
   193M    index_100000000_doc_1024_cardinality_baseline
   174M    index_100000000_doc_1024_cardinality_candidate
   
   241M    index_100000000_doc_8192_cardinality_baseline
   233M    index_100000000_doc_8192_cardinality_candidate
   
   314M    index_100000000_doc_1048576_cardinality_baseline
   315M    index_100000000_doc_1048576_cardinality_candidate
   
   392M    index_100000000_doc_8388608_cardinality_baseline
   391M    index_100000000_doc_8388608_cardinality_candidate
   ```
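   To put the size listing in relative terms, the sketch below (hypothetical class `IndexSizeDiff`) computes the percentage change from baseline to candidate for each cardinality. It uses the megabyte figures exactly as listed; those are already rounded, so changes of about 1M are within rounding noise.

```java
import java.util.Locale;

/** Relative size change of the candidate index vs. the baseline, per field cardinality. */
public final class IndexSizeDiff {
  static double changePercent(double baselineMb, double candidateMb) {
    return (candidateMb - baselineMb) / baselineMb * 100.0;
  }

  public static void main(String[] args) {
    // {cardinality, baseline MB, candidate MB}, copied from the size listing.
    long[][] rows = {
      {32, 113, 114}, {128, 140, 133}, {1024, 193, 174},
      {8192, 241, 233}, {1048576, 314, 315}, {8388608, 392, 391},
    };
    for (long[] r : rows) {
      System.out.printf(Locale.ROOT, "%d: %+.1f%%%n", r[0], changePercent(r[1], r[2]));
    }
  }
}
```

   With these inputs the largest saving is at cardinality 1024 (about -9.8%), while cardinality 32 grows slightly (about +0.9%).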


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


