+1. A lot of this was discussed on SOLR-12259; we should probably link
any Lucene JIRAs for this back to that one to make an easy trail to
follow.

One thing I'd thought of is whether we should merge segments during
this operation. If we're going to rewrite the entire index anyway,
does it make sense to combine segments into max-sized segments à la
TieredMergePolicy?

I'm not thinking of anything fancy at all here; there's no "cost" to
calculate, for instance. Just:
1> go through the list of segments, adding to a OneMerge until it's as
big as it can be.
2> repeat until you have a list of OneMerges that contain all the
original segments.

How big "as big as it can be" is remains TBD; TMP (TieredMergePolicy)
uses 5G by default. Could be a param I suppose...
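
A rough sketch of the two steps above, in plain Java. Everything here is
made up for illustration: Segment and group() are hypothetical stand-ins,
not the real MergePolicy/OneMerge API, and maxBytes is the "could be a
param" size cap. One assumption I'm adding: a segment that is already
over the cap just ends up in a group by itself.

```java
import java.util.ArrayList;
import java.util.List;

class GreedySegmentGrouping {

    /** Hypothetical stand-in for a segment; only its size matters here. */
    static final class Segment {
        final String name;
        final long sizeBytes;

        Segment(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Walks the segments in order and fills one group at a time until the
     * next segment would push it past maxBytes, then starts a new group.
     * Every input segment ends up in exactly one group.
     */
    static List<List<Segment>> group(List<Segment> segments, long maxBytes) {
        List<List<Segment>> merges = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        long currentBytes = 0;
        for (Segment s : segments) {
            // Close the current group if adding this segment would exceed the cap.
            if (!current.isEmpty() && currentBytes + s.sizeBytes > maxBytes) {
                merges.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(s);
            currentBytes += s.sizeBytes;
        }
        if (!current.isEmpty()) {
            merges.add(current);
        }
        return merges;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        long cap = 5 * oneGb; // the 5G figure mentioned above
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("_0", 3 * oneGb));
        segments.add(new Segment("_1", 2 * oneGb));
        segments.add(new Segment("_2", 4 * oneGb));
        for (List<Segment> merge : group(segments, cap)) {
            StringBuilder names = new StringBuilder();
            for (Segment s : merge) {
                names.append(s.name).append(' ');
            }
            System.out.println("merge: " + names.toString().trim());
        }
    }
}
```

No sorting and no scoring; the groups simply follow the order of the
segment list, which is about as un-fancy as it gets.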

Erick


On Wed, Jan 23, 2019 at 9:24 AM Andrzej Białecki <a...@getopt.org> wrote:
>
> +1. I think that even with these caveats (read-only, some data may require 
> re-interpretation) it would still be a great help for accessing legacy data, 
> for which the original source may no longer exist.
>
> > On 23 Jan 2019, at 15:11, Simon Willnauer <simon.willna...@gmail.com> wrote:
> >
> > Hey folks,
> >
> > tl;dr: I want to be able to open an IndexReader on an old index if the
> > SegmentInfo version is supported and all segment codecs are available.
> > Today that's not possible even if I port old formats to current
> > versions.
> >
> > Our BWC policy for quite a while has been N-1 major versions. That's
> > good and I think we should keep it that way. Only recently, because of
> > changes in how we encode/decode norms, we also hard-enforce the
> > index-version-created in several places, as well as the version a
> > segment was written with. These are great enforcements and I
> > understand why. My request here is whether we can find consensus on
> > allowing some way (a special DirectoryReader, for instance) to open
> > such an index for reading only, without the guarantee that our
> > high-level APIs decode norms correctly, for instance. This would be
> > enough to, for instance, consume stored fields etc. for reindexing, or,
> > if users are aware of this, to do their norms decoding in the codec.
> > I am happy to work on a proposal for how this would work. It would
> > still enforce no writing or anything like that. I am also all for
> > putting such a reader into misc and marking it experimental.
> >
> > simon
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
