[ 
https://issues.apache.org/jira/browse/SOLR-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584223#comment-15584223
 ] 

Yonik Seeley edited comment on SOLR-9599 at 10/18/16 3:18 AM:
--------------------------------------------------------------

Another docvalues faceting test, this time including the current lucene/solr 
code + lucene70 codec (as of 10/17).
This test used 10M documents and single valued string fields with 20% of the 
values missing (i.e. 80% of docs have a value for any given field).  4 
concurrent request threads were used with a 4 core CPU.
Note that the 9/19 index has 24 segments and the 10/17 index has 23 segments.

This is a table of new_time/old_time, with old_time being an old docvalues 
index with old code (as of 9/09) before the docvalues iterator cutover:
||field cardinality||9/09 code with 9/09 index||10/17 code with 9/09 index||10/17 code with 10/17 index||
| 10 | 1.00 | 1.39 | 1.41 |
| 100 | 1.00 | 1.38 | 1.46 |
| 1000 | 1.00 | 1.39 | 1.42 |
| 10000 | 1.00 | 1.35 | 1.45 |

So it looks like we're currently 35% to 46% slower (over 40% with the 10/17 
index) for faceting on single valued docvalues fields that have some values 
missing.
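
For reference, a minimal sketch of the arithmetic behind that summary. It is not part of the benchmark itself: it only converts the new_time/old_time ratios from the table above into percent slowdown, and the names (ratios, slowdown_pct, mean) are made up for illustration.

```python
# Ratios (new_time / old_time) copied from the table above.
# Key: field cardinality.
# Value: (10/17 code with 9/09 index, 10/17 code with 10/17 index).
ratios = {
    10: (1.39, 1.41),
    100: (1.38, 1.46),
    1000: (1.39, 1.42),
    10000: (1.35, 1.45),
}


def slowdown_pct(ratio):
    """Convert a new_time/old_time ratio into percent extra time."""
    return round((ratio - 1.0) * 100, 1)


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


old_index = [pair[0] for pair in ratios.values()]
new_index = [pair[1] for pair in ratios.values()]

if __name__ == "__main__":
    for cardinality, (old_idx, new_idx) in ratios.items():
        print(cardinality, slowdown_pct(old_idx), slowdown_pct(new_idx))
    print("mean, 9/09 index:", slowdown_pct(mean(old_index)))
    print("mean, 10/17 index:", slowdown_pct(mean(new_index)))
```

Averaged over the four cardinalities, that works out to roughly 38% slower with the 9/09 index and about 43.5% slower with the 10/17 index.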



> DocValues performance regression with new iterator API
> ------------------------------------------------------
>
>                 Key: SOLR-9599
>                 URL: https://issues.apache.org/jira/browse/SOLR-9599
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: master (7.0)
>            Reporter: Yonik Seeley
>             Fix For: master (7.0)
>
>
> I did a quick performance comparison of faceting indexed fields (i.e. 
> docvalues are not stored) using method=dv before and after the new docvalues 
> iterator went in (LUCENE-7407).
> 5M document index, 21 segments, single valued string fields w/ no missing 
> values.
> || field cardinality || new_time / old_time ||
> |10|2.01|
> |1000|2.02|
> |10000|1.85|
> |100000|1.56|
> |1000000|1.31|
> So unfortunately, often twice as slow.
> See followup messages for tests using real docvalues as well.


