[ 
https://issues.apache.org/jira/browse/LUCENE-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-7500:
---------------------------------
    Attachment: LUCENE_7500_Remove_LeafReader_fields.patch
                LUCENE_7500_avoid_leafReader_fields.patch

I've attached 2 patches to make this easier to review:

The "avoid leafReader fields" one is the most straightforward -- it changes 
existing callers of LeafReader.fields() that can easily avoid it by going 
directly to terms(field), with a little less code almost every time.  I should 
probably commit this to 6.x to minimize differences between 7.x and 6.x.
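
As a rough illustration of that call-site change, here is a minimal sketch. The Terms, Fields and LeafReader types below are simplified stand-ins assumed for this example, not the real Lucene classes, and MapBackedReader and CallSites are hypothetical helpers; only the shape of fields().terms(field) versus terms(field) comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the Lucene types; the real classes have more methods.
interface Terms {
    long size();
}

interface Fields {
    Terms terms(String field); // null when the field has no terms
}

abstract class LeafReader {
    abstract Fields fields();

    // Direct accessor. In this sketch it simply delegates to fields().
    Terms terms(String field) {
        return fields().terms(field);
    }
}

// Hypothetical in-memory reader used only to exercise the two call styles.
class MapBackedReader extends LeafReader {
    private final Map<String, Terms> byField = new HashMap<>();

    MapBackedReader put(String field, long termCount) {
        byField.put(field, () -> termCount);
        return this;
    }

    @Override
    Fields fields() {
        return byField::get;
    }
}

class CallSites {
    // Before: go through the Fields intermediary.
    static long termCountViaFields(LeafReader reader, String field) {
        Fields fields = reader.fields();
        Terms terms = fields.terms(field);
        return terms == null ? 0 : terms.size();
    }

    // After: ask the reader for the field's terms directly.
    static long termCountDirect(LeafReader reader, String field) {
        Terms terms = reader.terms(field);
        return terms == null ? 0 : terms.size();
    }
}
```

Both methods return the same result; the second just drops the intermediate variable, which is the "little less code" mentioned above.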

The "remove LeafReader fields" patch is the real substance of this issue. Some 
LeafReader subclasses barely changed; some got a little simpler, depending on 
what they were doing, as they didn't need a Fields intermediary.  Some other 
details:
* MultiFields uses a new private inner class to expose a Fields view on a 
LeafReader.  I needed to add some similar logic to 
SlowCodecReaderWrapper.readerToFieldsProducer; I'm not sure if it's worth 
consolidating the ~7 lines of code.
* I made more than rote changes to ParallelReader since its logic was a bit 
confusing to me, and I feel it's now a little more straightforward.
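
To make the first bullet concrete, here is a minimal sketch of what a Fields view over a LeafReader could look like. It is not the code from the patch: Terms, LeafReader and Fields are simplified stand-ins assumed for this example, fieldNames() is a hypothetical substitute for what FieldInfos would supply, and LeafReaderFieldsView and MapReader are made-up names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the real Lucene classes have more methods.
interface Terms {
    long size();
}

interface LeafReader {
    Terms terms(String field);   // null when the field has no terms
    List<String> fieldNames();   // assumed stand-in for FieldInfos
}

abstract class Fields implements Iterable<String> {
    abstract Terms terms(String field);
    abstract int size();
}

// Fields view computed once from a LeafReader.
class LeafReaderFieldsView extends Fields {
    private final LeafReader reader;
    private final List<String> indexedFields = new ArrayList<>();

    LeafReaderFieldsView(LeafReader reader) {
        this.reader = reader;
        for (String name : reader.fieldNames()) {
            if (reader.terms(name) != null) { // keep only fields that have terms
                indexedFields.add(name);
            }
        }
    }

    @Override
    public Iterator<String> iterator() {
        return indexedFields.iterator();
    }

    @Override
    Terms terms(String field) {
        return reader.terms(field);
    }

    @Override
    int size() {
        return indexedFields.size();
    }
}

// Hypothetical in-memory reader used only to exercise the view.
class MapReader implements LeafReader {
    private final Map<String, Terms> byField = new LinkedHashMap<>();

    MapReader put(String field, Terms termsOrNull) {
        byField.put(field, termsOrNull);
        return this;
    }

    @Override
    public Terms terms(String field) {
        return byField.get(field);
    }

    @Override
    public List<String> fieldNames() {
        return new ArrayList<>(byField.keySet());
    }
}
```

The view only needs a way to list field names and a per-field terms lookup, which is why the equivalent logic is short enough that consolidating it may not be worth the effort.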

I should enhance the docs on Fields a bit to clarify its use, such as:
bq. Provides a {@link Terms} index for fields that have it, and lists which 
fields do.  This is primarily an internal/experimental API (see {@link 
FieldsProducer}), although it is also used to expose the set of term vectors 
per document.

IMO, "FieldsProducer" should really be named "TermsProducer", and instead of 
subclassing Fields, it can simply have those methods.  But perhaps that can be 
a follow-up (which I don't have time for right now).  Also, "MultiFields" might 
not be such a great name; maybe MultiTerms.  And of course the final move would 
be a "TermVector" class, after which there is perhaps no need for a "Fields" 
class; in some cases just a Map<String,Terms> will do, as for most existing 
callers of MultiFields.getFields.

> Nuke Fields.java in lieu of LeafReader.getTerms(fieldName)
> ----------------------------------------------------------
>
>                 Key: LUCENE-7500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7500
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: master (7.0)
>
>         Attachments: LUCENE_7500_avoid_leafReader_fields.patch, 
> LUCENE_7500_Remove_LeafReader_fields.patch
>
>
> {{Fields}} seems like a pointless intermediary between the {{LeafReader}} and 
> {{Terms}}. Why not have {{LeafReader.getTerms(fieldName)}} instead? One loses 
> the ability to get the count and iterate over indexed fields, but it's not 
> clear what real use-cases are for that and such rare needs could figure that 
> out with FieldInfos.
> [~mikemccand] pointed out that we'd probably need to re-introduce a 
> {{TermVectors}} class since TV's are row-oriented not column-oriented.  IMO 
> they should be column-oriented but that'd be a separate issue.
> _(p.s. I'm lacking time to do this w/i the next couple months so if someone 
> else wants to tackle it then great)_



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

