[
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983071#comment-13983071
]
Robert Muir commented on LUCENE-5611:
-------------------------------------
In StoredFieldsWriter:
{noformat}
- * <li>For every document, {@link #startDocument(int)} is called,
+ * <li>For every document, {@link #startDocument()} is called,
* informing the Codec how many fields will be written.
{noformat}
This javadoc "compiles" but now does not make sense because we don't pass
numFields as a parameter anymore.
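Because `startDocument()` no longer carries `numFields`, a codec that wants a per-document field count has to work it out itself. Below is a minimal sketch of one way to do that. It uses a simplified, hypothetical writer (`SketchStoredFieldsWriter`, `writeField(String, String)`, `finishDocument()`), not the real `StoredFieldsWriter` signatures. The writer counts fields as they arrive and writes the count only once the document is finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a stored-fields writer; NOT Lucene's API.
abstract class SketchStoredFieldsWriter {
    /** Called once per document, before any of its fields; no field count is passed. */
    abstract void startDocument();

    /** Called once for every stored field of the current document. */
    abstract void writeField(String name, String value);

    /** Called after the last field of the current document. */
    abstract void finishDocument();
}

// Buffers one document's fields so the count can be emitted when the document ends.
class CountingStoredFieldsWriter extends SketchStoredFieldsWriter {
    private final List<String> pending = new ArrayList<>();
    final List<String> output = new ArrayList<>(); // stands in for what would go to disk

    @Override
    void startDocument() {
        pending.clear();
    }

    @Override
    void writeField(String name, String value) {
        pending.add(name + "=" + value);
    }

    @Override
    void finishDocument() {
        // The field count is only known here, after all writeField calls.
        output.add("numFields=" + pending.size());
        output.addAll(pending);
        pending.clear();
    }
}
```

With this shape, the javadoc only needs to say that `startDocument()` marks the start of a document. The "informing the Codec how many fields will be written" clause no longer describes anything the method does.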
The attribute handling in the indexing chain got more confusing and
complicated. Can we factor this into FieldInvertState?
It's bogus that we call hasAttribute + getAttribute: besides making the code more
complicated, it's two hashmap lookups for 2 atts. We should add a method to
attribute source that acts like map.get (returns an attribute, or null if it
doesn't exist). Or simply change the semantics of getAttribute to do that. This
can be a followup issue.
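Below is a minimal sketch of the two suggestions together. The per-field state object owns the attribute references and resolves them with a single map.get-style lookup that returns null when an attribute is absent. Every name here is hypothetical, including `SketchAttributeSource`, `getAttributeOrNull`, `OffsetAttr`, `PayloadAttr`, and `SketchFieldInvertState`. They are stand-ins, not Lucene's `AttributeSource` or `FieldInvertState`, and the `lookups` counter exists only to make the cost visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for token attributes (not Lucene classes).
final class OffsetAttr { int start, end; }
final class PayloadAttr { byte[] payload; }

// Hypothetical, minimal attribute source backed by one HashMap.
final class SketchAttributeSource {
    private final Map<Class<?>, Object> attrs = new HashMap<>();
    int lookups; // counts map lookups, for illustration only

    <T> void add(Class<T> type, T instance) { attrs.put(type, instance); }

    // Two-call pattern: callers do hasAttribute(...) and then getAttribute(...).
    boolean hasAttribute(Class<?> type) { lookups++; return attrs.containsKey(type); }

    <T> T getAttribute(Class<T> type) {
        lookups++;
        Object a = attrs.get(type);
        if (a == null) throw new IllegalArgumentException("no attribute " + type.getName());
        return type.cast(a);
    }

    // Proposed map.get-style method: one lookup, null if absent.
    <T> T getAttributeOrNull(Class<T> type) {
        lookups++;
        return type.cast(attrs.get(type));
    }
}

// Per-field state that owns the attribute references, so the indexing
// loop reads plain fields instead of querying the source itself.
final class SketchFieldInvertState {
    OffsetAttr offsetAttr;   // null when the stream produces no offsets
    PayloadAttr payloadAttr; // null when the stream produces no payloads
    private SketchAttributeSource source;

    void setAttributeSource(SketchAttributeSource s) {
        if (s != source) { // re-resolve only when the token stream changes
            source = s;
            offsetAttr = s.getAttributeOrNull(OffsetAttr.class);
            payloadAttr = s.getAttributeOrNull(PayloadAttr.class);
        }
    }
}
```

Each attribute costs one lookup, and repeat calls with the same source cost none. With the has/get pair, a present attribute costs two lookups.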
I will keep reviewing; I only got through the first 3 or 4 files in the patch.
> Simplify the default indexing chain
> -----------------------------------
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs. Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...
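The term-vector behavior change quoted above can be illustrated with two policies for fields that share a name. The sketch contrasts them. "Upgrade" merges the options into their union, which is the current behavior as described. "Strict" rejects any mismatch, which is the new requirement. `TvOptions` and `TvPolicy` are hypothetical. The scope of `seen` is also an assumption here: it is treated as the fields of one document seen so far.

```java
import java.util.Map;

// Hypothetical term-vector flags for one field instance (not Lucene's FieldType).
final class TvOptions {
    final boolean vectors, positions, offsets;

    TvOptions(boolean vectors, boolean positions, boolean offsets) {
        this.vectors = vectors; this.positions = positions; this.offsets = offsets;
    }

    // Union of two option sets: whatever either instance asked for.
    TvOptions or(TvOptions o) {
        return new TvOptions(vectors || o.vectors, positions || o.positions, offsets || o.offsets);
    }

    @Override public boolean equals(Object x) {
        if (!(x instanceof TvOptions)) return false;
        TvOptions o = (TvOptions) x;
        return vectors == o.vectors && positions == o.positions && offsets == o.offsets;
    }

    @Override public int hashCode() {
        return (vectors ? 1 : 0) | (positions ? 2 : 0) | (offsets ? 4 : 0);
    }
}

// "seen" maps a field name to the options recorded so far (scope assumed).
final class TvPolicy {
    /** Auto-upgrade: same-named fields end up with the union of their options. */
    static TvOptions upgrade(Map<String, TvOptions> seen, String name, TvOptions opts) {
        return seen.merge(name, opts, TvOptions::or);
    }

    /** Strict: every instance of a field name must use identical options. */
    static TvOptions strict(Map<String, TvOptions> seen, String name, TvOptions opts) {
        TvOptions prev = seen.putIfAbsent(name, opts);
        if (prev != null && !prev.equals(opts)) {
            throw new IllegalArgumentException("field \"" + name + "\": inconsistent term vector options");
        }
        return prev == null ? opts : prev;
    }
}
```

The upgrade policy silently widens the options. The strict policy reports the mismatch to the caller instead.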