[ https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240977#comment-16240977 ]
Robert Muir commented on LUCENE-8041:
-------------------------------------

It doesn't need to be *all* Fields.terms impls. It is enough to optimize the default codec. TreeMap is a good simple default, and all the various alternative terms dicts can continue to use it. But the default codec should optimize for the access behavior that matters: accessing a field randomly.

I don't think we should remove field iteration/Fields unless we remove the ability to change term vectors "per-doc". It is currently needed (e.g. by CheckIndex) to know what fields were truly indexed for a specific document with vectors, since that may disagree with FieldInfos. If we fixed that, then it would truly be unnecessary and FieldInfos would be all we need.

> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>
> I've seen apps that have a good number of fields -- hundreds. The O(log(N))
> of TreeMap definitely shows up in a profiler; sometimes 20% of search time,
> if I recall. There are many Field implementations that are impacted... in
> part because Fields is the base class of FieldsProducer.
>
> As an aside, I hope Fields to go away some day; FieldsProducer should be
> TermsProducer and not have an iterator of fields. If DocValuesProducer
> doesn't have this then why should the terms index part of our API have it?
> If we did this then the issue here would be a simple transition to a HashMap.
>
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator
> to not necessarily be sorted?
>
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap
> in many cases if we can assume when we initialize these internal maps that we
> consume them in sorted order to begin with.
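
For illustration only, here is a minimal sketch of the LinkedHashMap idea from the quoted issue description. It is not Lucene code: the class FieldLookupSketch and the method toInsertionOrdered are hypothetical names, and plain String values stand in for the per-field Terms objects. The sketch assumes the field names are already available in sorted order (here, from a TreeMap) when the lookup map is built, so that iteration stays sorted while lookups become hash-based (expected constant time rather than O(log(N))).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of Lucene.
class FieldLookupSketch {

    /**
     * Copies a sorted map into a LinkedHashMap. Because entries are inserted
     * in the TreeMap's sorted order, iteration over the result is still
     * sorted by field name, while get() is a hash lookup.
     */
    static <V> Map<String, V> toInsertionOrdered(TreeMap<String, V> sorted) {
        // Size the table up front so it does not rehash during the copy.
        Map<String, V> byName = new LinkedHashMap<>(Math.max(16, sorted.size() * 2));
        for (Map.Entry<String, V> e : sorted.entrySet()) {
            byName.put(e.getKey(), e.getValue());
        }
        return byName;
    }

    public static void main(String[] args) {
        // Strings stand in for per-field terms; the field names are made up.
        TreeMap<String, String> sorted = new TreeMap<>();
        sorted.put("title", "terms-for-title");
        sorted.put("author", "terms-for-author");
        sorted.put("body", "terms-for-body");

        Map<String, String> fields = toInsertionOrdered(sorted);

        // Iteration order matches the sorted order of the source map.
        System.out.println(fields.keySet());    // [author, body, title]

        // Random access by field name is a hash lookup.
        System.out.println(fields.get("body")); // terms-for-body
    }
}
```

The ordering guarantee holds only as long as the map is not modified out of order after construction; a field added later would be appended at the end of the iteration order rather than placed in its sorted position.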