[
https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393639#comment-13393639
]
Uwe Schindler edited comment on LUCENE-3866 at 6/17/12 11:40 PM:
-----------------------------------------------------------------
Here the first step of this refactoring (affects only few core classes):
- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because
of useless recursion). We can use ReaderContext.leaves() to get all atomic sub
readers including their docStarts
- Improved MultiFields and MultiDocValues by usage of ReaderContext.leaves().
Code much easier to read!
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has
nothing to do with ReaderUtil anymore -> Should move to index package + hidden
There is still one (slow) nocommit, because step 2 will refactor the
ReaderContexts to use Collections, too.
Yonik:
bq. I think we should allow users to access in the most low-level efficient
manner, just as lucene can. Remember, this is expert level stuff! Seems like
at most we would need a javadoc comment saying "don't modify this".
Actually it's more efficient than before. And the collection views are still
backed by arrays and are never used in inner loops (it's just when
IndexSearcher initializes or executes searches on all segments). Without any
real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I
don't think we should risk corrumpt readers.
was (Author: thetaphi):
Here the first step of this refactoring (affects only few core classes):
- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because
of useless recursion). We can use ReaderContext.leaves() to get all atomic sub
readers including their docStarts
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has
nothing to do with ReaderUtil anymore -> Should move to index package + hidden
There is still one (slow) nocommit, because step 2 will refactor the
ReaderContexts to use Collections, too.
Yonik:
bq. I think we should allow users to access in the most low-level efficient
manner, just as lucene can. Remember, this is expert level stuff! Seems like
at most we would need a javadoc comment saying "don't modify this".
Actually it's more efficient than before. And the collection views are still
backed by arrays and are never used in inner loops (it's just when
IndexSearcher initializes or executes searches on all segments). Without any
real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I
don't think we should risk corrumpt readers.
> Make CompositeReader.getSequentialSubReaders() and the corresponding
> IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-3866
> URL: https://issues.apache.org/jira/browse/LUCENE-3866
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 4.0
>
> Attachments: LUCENE-3866-step1.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning
> IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me
> realized that returning an array from a public API is an anti-pattern. If the
> array is intended to be modifiable (like byte[] in terms,...), it is fine to
> use arrays in public APIs, but not, if the array must be protected from
> modification. As IndexReaders are 100% unmodifiable in trunk code (no
> deletions,...), the only possibility to corrumpt the reader is by modifying
> the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is
> afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The
> binary search of reader-ids inside BaseCompositeReader can still be done with
> the internal protected array, but public APIs should expose only a
> unmodifiable List. The same applies to leaves() and children() in
> IndexReaderContext. This change to list would also allow to make
> CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>,
> so some loops would look nice.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]