[ 
https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508756#comment-16508756
 ] 

David Smiley commented on LUCENE-8041:
--------------------------------------

bq. I don't think we should remove field iteration/Fields unless we remove the 
ability to change term vectors "per-doc". It is currently needed (e.g. by 
CheckIndex) to know what fields were truly indexed for a specific document with 
vectors, since that may disagree with FieldInfos. If we fixed that, then it 
would truly be unnecessary and FieldInfos would be all we need.

That sounds like putting the cart before the horse (letting how CheckIndex
works today prevent us from remaking Lucene into what we want it to be
tomorrow). Can't we just relax what CheckIndex checks here? For example, it
could still check, but only report a warning if some docs have TVs and others
don't, which is generally not normal. I think that's what you're getting at
but I'm not sure; I've only looked at CheckIndex in passing.

bq. The only thing blocking this is the fact that term-vector options are 
configurable per-doc, which doesnt make sense anyway.

+1, I agree; if this feature is a casualty of the refactor then I'm fine with it 
going away. I haven't looked closely enough to see how tightly these things are 
linked (i.e. whether we really can't have it both ways).

> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8041.patch
>
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) 
> of TreeMap definitely shows up in a profiler; sometimes 20% of search time, 
> if I recall.  There are many Fields implementations that are impacted... in 
> part because Fields is the base class of FieldsProducer.  
> As an aside, I hope Fields goes away some day; FieldsProducer should be 
> TermsProducer and not have an iterator of fields. If DocValuesProducer 
> doesn't have this then why should the terms index part of our API have it?  
> If we did this then the issue here would be a simple transition to a HashMap.
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator 
> to not necessarily be sorted?
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap 
> in many cases if we can assume when we initialize these internal maps that we 
> consume them in sorted order to begin with.
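
To make the LinkedHashMap idea in the description concrete, here is a minimal
sketch. It is not Lucene code and not the attached patch: the class name
SortedOnceFieldMap and its methods are hypothetical, and plain String natural
ordering is assumed for the field names. The map is sorted once at
construction and copied into a LinkedHashMap, so lookups by field name are
hash-based (expected O(1)) while iteration still returns the names in sorted
order.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a per-segment "field name -> terms" map.
 * V is whatever per-field object the caller stores; nothing here is Lucene API.
 */
public class SortedOnceFieldMap<V> implements Iterable<String> {
    private final Map<String, V> byName;

    public SortedOnceFieldMap(Map<String, V> source) {
        // Pay the sorting cost once: TreeMap orders the keys, and LinkedHashMap
        // then preserves that order as its insertion order.
        this.byName = new LinkedHashMap<String, V>(new TreeMap<String, V>(source));
    }

    /** Hash lookup instead of a tree walk; returns null for an unknown field. */
    public V terms(String field) {
        return byName.get(field);
    }

    /** Iterates field names in the sorted order fixed at construction. */
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableSet(byName.keySet()).iterator();
    }

    public int size() {
        return byName.size();
    }
}
```

The sketch only works because the map is never modified after construction;
any later insert would land at the end of the iteration order rather than in
its sorted position.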



