I can trunk it once more if you'd like - it's already pretty out of date :) If you haven't started anyway ...
Michael McCandless wrote:
> OK I will cut a branch & commit Mark's last patch onto it, unless
> anyone has objections soonish...
>
> I'll also branch (twig?) the back compat branch so we can commit the
> patch there as well.
>
> Mike
>
> On Mon, Oct 12, 2009 at 10:50 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> SVN is about as good at merging branches as any of us are with a patch
>> and trunk unfortunately. But that can still be somewhat more convenient
>> than all these huge patches, with different people at different stages.
>>
>> Depends on how many people end up working on this though. Any more than
>> 2, and I think the branch has got to be worth it.
>>
>> From my perspective, it doesn't make any of the merging process any
>> easier - but it can be easier than juggling all these patches - you have
>> a central code base that can always be targeted for current merging.
>>
>> Michael Busch wrote:
>>
>>> I think it's supposed to work pretty well - though I have no personal
>>> experience with merging branches with svn.
>>>
>>> I think we should try it - then we'll know! :)
>>>
>>> Michael
>>>
>>> On 10/12/09 12:32 PM, Michael McCandless (JIRA) wrote:
>>>
>>>> [ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764799#action_12764799 ]
>>>>
>>>> Michael McCandless commented on LUCENE-1458:
>>>> --------------------------------------------
>>>>
>>>> bq. Shall we create a flexible-indexing branch and commit this?
>>>>
>>>> I think this is a good idea.
>>>>
>>>> But I haven't played heavily w/ svn & branching. EG if we branch
>>>> now, and trunk moves fast (which it still is w/ deprecation
>>>> removals), are we going to have conflicts? Or... is svn good about
>>>> merging branches?
>>>>
>>>>
>>>>
>>>>> Further steps towards flexible indexing
>>>>> ---------------------------------------
>>>>>
>>>>>                 Key: LUCENE-1458
>>>>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1458
>>>>>             Project: Lucene - Java
>>>>>          Issue Type: New Feature
>>>>>          Components: Index
>>>>>    Affects Versions: 2.9
>>>>>            Reporter: Michael McCandless
>>>>>            Assignee: Michael McCandless
>>>>>            Priority: Minor
>>>>>         Attachments: LUCENE-1458-back-compat.patch,
>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>>>>> LUCENE-1458.tar.bz2
>>>>>
>>>>>
>>>>> I attached a very rough checkpoint of my current patch, to get early
>>>>> feedback. All tests pass, though back compat tests don't pass due to
>>>>> changes to package-private APIs plus certain bugs in tests that
>>>>> happened to work (eg call TermPositions.nextPosition() too many times,
>>>>> which the new API asserts against).
>>>>> [Aside: I think, when we commit changes to package-private APIs such
>>>>> that back-compat tests don't pass, we could go back, make a branch on
>>>>> the back-compat tag, commit changes to the tests to use the new
>>>>> package private APIs on that branch, then fix nightly build to use the
>>>>> tip of that branch?]
>>>>> There's still plenty to do before this is committable! This is a
>>>>> rather large change:
>>>>>   * Switches to a new more efficient terms dict format. This still
>>>>>     uses tii/tis files, but the tii only stores term & long offset
>>>>>     (not a TermInfo).
>>>>>     At seek points, tis encodes term & freq/prox
>>>>>     offsets absolutely instead of with deltas. Also, tis/tii
>>>>>     are structured by field, so we don't have to record field number
>>>>>     in every term.
>>>>>     .
>>>>>     On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB
>>>>>     -> 0.64 MB) and tis file is 9% smaller (75.5 MB -> 68.5 MB).
>>>>>     .
>>>>>     RAM usage when loading terms dict index is significantly less
>>>>>     since we only load an array of offsets and an array of String (no
>>>>>     more TermInfo array). It should be faster to init too.
>>>>>     .
>>>>>     This part is basically done.
>>>>>   * Introduces modular reader codec that strongly decouples terms dict
>>>>>     from docs/positions readers. EG there is no more TermInfo used
>>>>>     when reading the new format.
>>>>>     .
>>>>>     There's nice symmetry now between reading & writing in the codec
>>>>>     chain -- the current docs/prox format is captured in:
>>>>> {code}
>>>>> FormatPostingsTermsDictWriter/Reader
>>>>> FormatPostingsDocsWriter/Reader (.frq file) and
>>>>> FormatPostingsPositionsWriter/Reader (.prx file).
>>>>> {code}
>>>>>     This part is basically done.
>>>>>   * Introduces a new "flex" API for iterating through the fields,
>>>>>     terms, docs and positions:
>>>>> {code}
>>>>> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
>>>>> {code}
>>>>>     This replaces TermEnum/Docs/Positions. SegmentReader emulates the
>>>>>     old API on top of the new API to keep back-compat.
>>>>>
>>>>> Next steps:
>>>>>   * Plug in new codecs (pulsing, pfor) to exercise the modularity /
>>>>>     fix any hidden assumptions.
>>>>>   * Expose new API out of IndexReader, deprecate old API but emulate
>>>>>     old API on top of new one, switch all core/contrib users to the
>>>>>     new API.
>>>>>   * Maybe switch to AttributeSources as the base class for TermsEnum,
>>>>>     DocsEnum, PostingsEnum -- this would give readers API flexibility
>>>>>     (not just index-file-format flexibility).
>>>>>     EG if someone wanted to store payload at the term-doc level instead of
>>>>>     term-doc-position level, you could just add a new attribute.
>>>>>   * Test performance & iterate.
>>>>>
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>
>>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>
--
- Mark

http://www.lucidimagination.com
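The quoted LUCENE-1458 description says the new tii only stores a term and a long offset per seek point, so loading the terms dict index means loading "an array of offsets and an array of String" instead of a TermInfo array. A minimal Java sketch of that idea follows, assuming the index terms are sorted in the same order as the tis file and that each offset points at the tis entry where term & freq/prox offsets are written absolutely. The class and method names (`TermsIndexSketch`, `seekOffset`) are hypothetical and are not Lucene's actual tii reader.

```java
import java.util.Arrays;

/**
 * Hypothetical in-memory terms index for one field: two parallel arrays
 * instead of an array of TermInfo objects. indexTerms[i] is a seek-point
 * term (sorted), tisOffsets[i] is the file pointer of that term's entry
 * in the terms dict (tis) file. Not Lucene's real implementation.
 */
public final class TermsIndexSketch {
    private final String[] indexTerms;
    private final long[] tisOffsets;

    public TermsIndexSketch(String[] indexTerms, long[] tisOffsets) {
        if (indexTerms.length != tisOffsets.length) {
            throw new IllegalArgumentException("arrays must be parallel");
        }
        this.indexTerms = indexTerms;
        this.tisOffsets = tisOffsets;
    }

    /**
     * Returns the tis offset of the last seek-point term <= target, i.e.
     * where a sequential scan of the terms dict would start looking for
     * target. Returns -1 if target sorts before the first seek point.
     */
    public long seekOffset(String target) {
        int pos = Arrays.binarySearch(indexTerms, target);
        if (pos >= 0) {
            return tisOffsets[pos];        // exact hit on a seek-point term
        }
        int insertionPoint = -pos - 1;     // index of first seek point > target
        int floor = insertionPoint - 1;    // index of last seek point < target
        return floor >= 0 ? tisOffsets[floor] : -1L;
    }

    public static void main(String[] args) {
        TermsIndexSketch index = new TermsIndexSketch(
            new String[] {"apple", "kiwi", "pear"},
            new long[] {0L, 4096L, 8192L});
        System.out.println(index.seekOffset("kiwi"));   // prints 4096
        System.out.println(index.seekOffset("lemon"));  // prints 4096
        System.out.println(index.seekOffset("aaa"));    // prints -1
    }
}
```

In this sketch a miss lands on the nearest preceding seek point; the reader would then scan forward through tis from that offset, decoding delta-coded entries until it reaches or passes the target. That forward scan is not shown here.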