[jira] [Comment Edited] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Uwe Schindler (JIRA) Sun, 17 Jun 2012 16:41:44 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393639#comment-13393639
 ]


Uwe Schindler edited comment on LUCENE-3866 at 6/17/12 11:40 PM:
-----------------------------------------------------------------

Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because 
of useless recursion). We can use ReaderContext.leaves() to get all atomic sub 
readers including their docStarts
- Improved MultiFields and MultiDocValues by usage of ReaderContext.leaves(). 
Code much easier to read!
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has 
nothing to do with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the 
ReaderContexts to use Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient 
manner, just as lucene can. Remember, this is expert level stuff!  Seems like 
at most we would need a javadoc comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still 
backed by arrays and are never used in inner loops (it's just when 
IndexSearcher initializes or executes searches on all segments). Without any 
real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I 
don't think we should risk corrumpt readers.
                
      was (Author: thetaphi):
    Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because 
of useless recursion). We can use ReaderContext.leaves() to get all atomic sub 
readers including their docStarts
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has 
nothing to do with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the 
ReaderContexts to use Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient 
manner, just as lucene can. Remember, this is expert level stuff!  Seems like 
at most we would need a javadoc comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still 
backed by arrays and are never used in inner loops (it's just when 
IndexSearcher initializes or executes searches on all segments). Without any 
real benchmark showing a slowness (please do this *after* step 2 -> nocommit) I 
don't think we should risk corrumpt readers.
                  
> Make CompositeReader.getSequentialSubReaders() and the corresponding 
> IndexReaderContext methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning 
> IndexReader[]. Based on hardly-to-debug errors in user's code, Robert and me 
> realized that returning an array from a public API is an anti-pattern. If the 
> array is intended to be modifiable (like byte[] in terms,...), it is fine to 
> use arrays in public APIs, but not, if the array must be protected from 
> modification. As IndexReaders are 100% unmodifiable in trunk code (no 
> deletions,...), the only possibility to corrumpt the reader is by modifying 
> the array returned by getSequentialSubReaders(). We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is 
> afraid of performance problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The 
> binary search of reader-ids inside BaseCompositeReader can still be done with 
> the internal protected array, but public APIs should expose only a 
> unmodifiable List. The same applies to leaves() and children() in 
> IndexReaderContext. This change to list would also allow to make 
> CompositeReader and CompositeReaderContext Iterable<R extends IndexReader>, 
> so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List

Reply via email to