[jira] [Commented] (LUCENE-8041) All Fields.terms(fld) impls should be O(1) not O(log(N))

Robert Muir (JIRA) Mon, 06 Nov 2017 14:00:21 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240977#comment-16240977
 ]


Robert Muir commented on LUCENE-8041:
-------------------------------------

It doesn't need to be *all* Fields.terms impls. It is enough to optimize 
the default codec. 

TreeMap is a good simple default; all the various alternative terms dicts can 
continue to use it.
But the default codec should optimize for the access behavior that matters: 
accessing a field randomly.
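
To illustrate the trade-off, here is a minimal sketch of a lookup that serves 
random access from a HashMap while still handing out field names in sorted 
order. FieldTermsIndex and its methods are hypothetical names made up for this 
example; this is not Lucene's actual Fields/FieldsProducer code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-field terms lookup; not Lucene's API. */
final class FieldTermsIndex<T> implements Iterable<String> {
    // Hash lookup: expected O(1) per terms(field) call.
    private final Map<String, T> byName = new HashMap<>();
    // Field names sorted once at construction, so iteration stays ordered.
    private final List<String> sortedNames;

    FieldTermsIndex(Map<String, T> source) {
        byName.putAll(source);
        List<String> names = new ArrayList<>(source.keySet());
        Collections.sort(names);
        sortedNames = Collections.unmodifiableList(names);
    }

    /** Random access by field name; returns null for an unknown field. */
    T terms(String field) {
        return byName.get(field);
    }

    /** Iterates field names in sorted order, as a TreeMap would. */
    @Override
    public Iterator<String> iterator() {
        return sortedNames.iterator();
    }

    int size() {
        return sortedNames.size();
    }
}
```

The sort cost is paid once when the structure is built, which fits a read-only 
producer that is opened once and queried many times.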

I don't think we should remove field iteration/Fields unless we remove the 
ability to change term vectors "per-doc". It is currently needed (e.g. by 
CheckIndex) to know what fields were truly indexed for a specific document with 
vectors, since that may disagree with FieldInfos. If we fixed that, then it 
would truly be unnecessary and FieldInfos would be all we need.


> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) 
> of TreeMap definitely shows up in a profiler; sometimes 20% of search time, 
> if I recall.  There are many Fields implementations that are impacted... in 
> part because Fields is the base class of FieldsProducer.  
> As an aside, I hope Fields goes away some day; FieldsProducer should be 
> TermsProducer and not have an iterator of fields. If DocValuesProducer 
> doesn't have this then why should the terms index part of our API have it?  
> If we did this then the issue here would be a simple transition to a HashMap.
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator 
> to not necessarily be sorted?
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap 
> in many cases if we can assume when we initialize these internal maps that we 
> consume them in sorted order to begin with.
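
The LinkedHashMap idea in the quoted description can be sketched as follows. 
This is only an illustration under the stated assumption that entries arrive in 
sorted order; SortedInsertDemo and toLinkedHashMap are hypothetical names, not 
anything in Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class SortedInsertDemo {
    /**
     * Copies entries into a LinkedHashMap in the TreeMap's sorted key order.
     * LinkedHashMap iterates in insertion order, so the result still iterates
     * sorted, while get() becomes an expected O(1) hash lookup.
     */
    static <V> Map<String, V> toLinkedHashMap(TreeMap<String, V> sorted) {
        Map<String, V> result = new LinkedHashMap<>();
        for (Map.Entry<String, V> e : sorted.entrySet()) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

The ordering guarantee holds only as long as nothing is inserted out of order 
afterwards, so this suits maps that are filled once and then only read.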



