[jira] [Commented] (LUCENE-5611) Simplify the default indexing chain

Robert Muir (JIRA) Wed, 16 Apr 2014 17:51:25 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972152#comment-13972152
 ]


Robert Muir commented on LUCENE-5611:
-------------------------------------

Overall I do like the simplification of the abstractions. Some comments, a lot
of which probably don't need to be dealt with on this issue, but stuff to think
about.

I think the specializations in the default chain just work around lack of field 
reuse? Maybe we should rethink this for Lucene 5, some way that makes it easier 
and more intuitive so that this reuse isn't necessary for good performance.

As far as the LuceneTestCase nocommit, we have some similar situations
elsewhere, like RandomPF/RandomCodec, where we "remember" the choice for a
field for that test class and are consistent. I think that's enough for good
coverage? If we want to mix things up, a test can do that manually.
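
A minimal sketch of that "remember per field" pattern, to show what I mean.
This is not the actual RandomCodec or LuceneTestCase code: the class name
RememberedFieldChoice and its methods are hypothetical, and it assumes one
instance is created per test class with that class's random seed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper (not a Lucene class): picks one option per field name and sticks to it. */
final class RememberedFieldChoice<T> {
  private final Random random;
  private final List<T> options;
  private final Map<String, T> chosen = new HashMap<>();

  RememberedFieldChoice(long seed, List<T> options) {
    this.random = new Random(seed);
    this.options = new ArrayList<>(options);
  }

  /** The first call for a field picks randomly; later calls return the remembered pick. */
  synchronized T forField(String field) {
    return chosen.computeIfAbsent(field, f -> options.get(random.nextInt(options.size())));
  }
}
```

A test that wants to mix things up for one field could simply bypass the
helper and pass its own choice.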

I keep going back and forth on the StoredFieldsWriter codec API change. I can
live with it (assuming javadocs are fixed, heh), and I think it's OK as a step
(to prevent bogus passes on the fields), but it reminds me of the old postings
API... perhaps a pull model is warranted, where the writer actually just uses
the visitor API or something simple like that. It might actually make it
cleaner: for example, uncompressed stored fields wouldn't need to buffer up in
a RAMOutputStream, it could just do the bogus pass IW was doing before.
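
To make the pull idea concrete, here is a rough sketch. Everything in it is
hypothetical: FieldVisitor, StoredDoc and writeDocument are made-up names, not
Lucene's StoredFieldVisitor or StoredFieldsWriter. It also assumes the extra
pass is only needed to learn the number of stored fields before writing them.

```java
/** Hypothetical pull-style writer sketch; none of these types are Lucene APIs. */
final class PullStoredFieldsSketch {

  /** Hypothetical callback that receives one stored field at a time. */
  interface FieldVisitor {
    void stringField(String name, String value);
  }

  /** Hypothetical document that can replay its stored fields as often as asked. */
  interface StoredDoc {
    void visit(FieldVisitor visitor);
  }

  /**
   * The writer pulls from the document: one pass to count the fields, a second
   * pass to write them straight to the output, with no intermediate buffer.
   */
  static String writeDocument(StoredDoc doc) {
    int[] count = {0};
    doc.visit((name, value) -> count[0]++);

    StringBuilder out = new StringBuilder();
    out.append(count[0]); // field count goes first, then the fields themselves
    doc.visit((name, value) -> out.append('|').append(name).append('=').append(value));
    return out.toString();
  }
}
```

A writer that does not need the count up front (for example one that buffers
for compression anyway) would just skip the first pass.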

As far as the vectors change, I think it's an OK tradeoff. If there are
concerns, maybe o.a.l.document could help. But I don't think it makes sense to
use conflicting term vector options for the same field name... in the same doc.

Are the new checks in field mandatory? What happens if a custom IndexableField 
does this (tries to index vectors when not indexed)?
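
For illustration only, this is the kind of chain-side check I have in mind,
one that would also cover custom field implementations. It is a sketch, not
the patch: FieldOptions and checkDocument are hypothetical names, and throwing
IllegalArgumentException is an assumption about the desired behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical validation sketch; FieldOptions is not Lucene's IndexableFieldType. */
final class VectorOptionsCheckSketch {

  /** Hypothetical stand-in for the options a field reports to the indexing chain. */
  static final class FieldOptions {
    final String name;
    final boolean indexed;
    final boolean storeTermVectors;

    FieldOptions(String name, boolean indexed, boolean storeTermVectors) {
      this.name = name;
      this.indexed = indexed;
      this.storeTermVectors = storeTermVectors;
    }
  }

  /** Rejects vectors on non-indexed fields and conflicting vector options within one doc. */
  static void checkDocument(Iterable<FieldOptions> fields) {
    Map<String, Boolean> vectorsByName = new HashMap<>();
    for (FieldOptions f : fields) {
      if (f.storeTermVectors && !f.indexed) {
        throw new IllegalArgumentException(
            "field \"" + f.name + "\": cannot store term vectors for a field that is not indexed");
      }
      // Remember the first setting seen for this name; later instances must agree with it.
      Boolean previous = vectorsByName.putIfAbsent(f.name, f.storeTermVectors);
      if (previous != null && previous.booleanValue() != f.storeTermVectors) {
        throw new IllegalArgumentException(
            "field \"" + f.name + "\": conflicting term vector options in the same document");
      }
    }
  }
}
```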


> Simplify the default indexing chain
> -----------------------------------
>
>                 Key: LUCENE-5611
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5611
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...


