[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565220#comment-13565220
 ] 

David Christianson commented on SOLR-1782:
------------------------------------------

Thanks for looking at this! I'll look at the tests in the unified patch.

Two questions, then my understanding of what's going on in the patch.
  
 1. Pure ignorance of the codebase prevents me from understanding perf issues 
with these objects - just that UnInvertedField got used for statistics 
collection with multivalued facets - how is it possible to get the term values 
without the DocTermsIndex?
 2. Should DocTermsIndex be used at all?

My understanding:

I can see there is definitely a code dup issue (or at least a lot of branching 
logic) not only in the patch but in the current code.

 1. the original StatsComponent used a DocTermsIndex to look up facet values to 
be accumulated using DTI.getOrd().
 2. In the patch UnInvertedField is used in FieldFacetStats to deal with 
multivalued facet fields, which mainly included modifying UnInvertedField to 
provide the getTermNumbers method that listed term ordinals. As an aside, I 
don't understand whether this is reasonable to do or not, why there wouldn't 
already be a way to get at all the values of a field.
 3. Within FieldFacetStats the if (null == uif) is added. The code on both 
branches is more or less duplicate, only the multivalued branch (null != uif) 
iterates over multiple ordinals using the UnInverted Field whereas the single 
valued branch only reads one ordingal. But both branches use DocTermsIndex. 
 4. UIF.getStats is used in getStatsInfo in the current code base. It looks to 
me it is specifically used when the value being computed is multivalued and in 
the patch gets modified to handle the case of a multivalued facet field 
accumulating stats on a multivalued value field.
 5. There is another facet accumulation method in FieldFacetStats used in the 
current code base when accumulating stats over multivalued value fields. This 
winds up getting the if (null == uif) treatment as well. So code duplication is 
multiplied
 6. Even though there is this code dup in the current code base wrt 
UIF.getStats it still seems like FieldFacetStats is the core - maybe there's a 
way to fix it all?


                
> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1782
>                 URL: https://issues.apache.org/jira/browse/SOLR-1782
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.4
>         Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>            Reporter: Gerald DeConto
>            Assignee: Hoss Man
>         Attachments: index.rar, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2.patch, SOLR-1782.patch, SOLR-1782.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to