Yes please! Mike
On Tue, Oct 13, 2009 at 10:40 AM, Mark Miller <markrmil...@gmail.com> wrote:
> I can trunk it once more if you'd like - it's already pretty out of date :)
>
> If you haven't started anyway ...
>
>
> Michael McCandless wrote:
>> OK I will cut a branch & commit Mark's last patch onto it, unless
>> anyone has objections soonish...
>>
>> I'll also branch (twig?) the back compat branch so we can commit the
>> patch there as well.
>>
>> Mike
>>
>> On Mon, Oct 12, 2009 at 10:50 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>> SVN is about as good at merging branches as any of us are with a patch
>>> and trunk unfortunately. But that can still be somewhat more convenient
>>> than all these huge patches, with different people at different stages.
>>>
>>> Depends on how many people end up working on this though. Any more than
>>> 2, and I think the branch has got to be worth it.
>>>
>>> From my perspective, it doesn't make any of the merging process any
>>> easier - but it can be easier than juggling all these patches - you have
>>> a central code base that can always be targeted for current merging.
>>>
>>> Michael Busch wrote:
>>>
>>>> I think it's supposed to work pretty good - though I have no personal
>>>> experience with merging branches with svn.
>>>>
>>>> I think we should try it - then we'll know! :)
>>>>
>>>> Michael
>>>>
>>>> On 10/12/09 12:32 PM, Michael McCandless (JIRA) wrote:
>>>>
>>>>> [
>>>>> https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764799#action_12764799
>>>>> ]
>>>>>
>>>>> Michael McCandless commented on LUCENE-1458:
>>>>> --------------------------------------------
>>>>>
>>>>> bq. Shall we create a flexible-indexing branch and commit this?
>>>>>
>>>>> I think this is a good idea.
>>>>>
>>>>> But I haven't played heavily w/ svn & branching.
>>>>> EG if we branch now, and trunk moves fast (which it still is w/ deprecation
>>>>> removals), are we going to have conflicts? Or... is svn good about
>>>>> merging branches?
>>>>>
>>>>>
>>>>>
>>>>>> Further steps towards flexible indexing
>>>>>> ---------------------------------------
>>>>>>
>>>>>> Key: LUCENE-1458
>>>>>> URL: https://issues.apache.org/jira/browse/LUCENE-1458
>>>>>> Project: Lucene - Java
>>>>>> Issue Type: New Feature
>>>>>> Components: Index
>>>>>> Affects Versions: 2.9
>>>>>> Reporter: Michael McCandless
>>>>>> Assignee: Michael McCandless
>>>>>> Priority: Minor
>>>>>> Attachments: LUCENE-1458-back-compat.patch,
>>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>>>>>> LUCENE-1458.tar.bz2
>>>>>>
>>>>>>
>>>>>> I attached a very rough checkpoint of my current patch, to get early
>>>>>> feedback. All tests pass, though back compat tests don't pass due to
>>>>>> changes to package-private APIs plus certain bugs in tests that
>>>>>> happened to work (eg call TermPositions.nextPosition() too many times,
>>>>>> which the new API asserts against).
>>>>>> [Aside: I think, when we commit changes to package-private APIs such
>>>>>> that back-compat tests don't pass, we could go back, make a branch on
>>>>>> the back-compat tag, commit changes to the tests to use the new
>>>>>> package private APIs on that branch, then fix nightly build to use the
>>>>>> tip of that branch?]
>>>>>> There's still plenty to do before this is committable!
>>>>>> This is a rather large change:
>>>>>> * Switches to a new more efficient terms dict format. This still
>>>>>>   uses tii/tis files, but the tii only stores term & long offset
>>>>>>   (not a TermInfo). At seek points, tis encodes term & freq/prox
>>>>>>   offsets absolutely instead of with deltas. Also, tis/tii
>>>>>>   are structured by field, so we don't have to record field number
>>>>>>   in every term.
>>>>>>   .
>>>>>>   On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB
>>>>>>   -> 0.64 MB) and tis file is 9% smaller (75.5 MB -> 68.5 MB).
>>>>>>   .
>>>>>>   RAM usage when loading terms dict index is significantly less
>>>>>>   since we only load an array of offsets and an array of String (no
>>>>>>   more TermInfo array). It should be faster to init too.
>>>>>>   .
>>>>>>   This part is basically done.
>>>>>> * Introduces modular reader codec that strongly decouples terms dict
>>>>>>   from docs/positions readers. EG there is no more TermInfo used
>>>>>>   when reading the new format.
>>>>>>   .
>>>>>>   There's nice symmetry now between reading & writing in the codec
>>>>>>   chain -- the current docs/prox format is captured in:
>>>>>> {code}
>>>>>> FormatPostingsTermsDictWriter/Reader
>>>>>> FormatPostingsDocsWriter/Reader (.frq file) and
>>>>>> FormatPostingsPositionsWriter/Reader (.prx file).
>>>>>> {code}
>>>>>>   This part is basically done.
>>>>>> * Introduces a new "flex" API for iterating through the fields,
>>>>>>   terms, docs and positions:
>>>>>> {code}
>>>>>> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
>>>>>> {code}
>>>>>>   This replaces TermEnum/Docs/Positions. SegmentReader emulates the
>>>>>>   old API on top of the new API to keep back-compat.
>>>>>>
>>>>>> Next steps:
>>>>>> * Plug in new codecs (pulsing, pfor) to exercise the modularity /
>>>>>>   fix any hidden assumptions.
>>>>>> * Expose new API out of IndexReader, deprecate old API but emulate
>>>>>>   old API on top of new one, switch all core/contrib users to the
>>>>>>   new API.
>>>>>> * Maybe switch to AttributeSources as the base class for TermsEnum,
>>>>>>   DocsEnum, PostingsEnum -- this would give readers API flexibility
>>>>>>   (not just index-file-format flexibility). EG if someone wanted
>>>>>>   to store payload at the term-doc level instead of
>>>>>>   term-doc-position level, you could just add a new attribute.
>>>>>> * Test performance & iterate.
>>>>>>
>>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
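
For readers following the quoted LUCENE-1458 description: the note that tis encodes offsets "absolutely instead of with deltas" at seek points can be illustrated with a small sketch. This is not the patch's code. The class name `SeekPointOffsets`, the `interval` parameter, and the plain `long[]` layout are all assumptions made for illustration; the real files use a byte-level format that the thread does not describe.

```java
/**
 * Hypothetical illustration only; not Lucene's actual tis/tii code.
 * Offsets are stored as deltas from the previous entry, except at every
 * interval-th entry (a "seek point"), where the absolute value is stored.
 */
final class SeekPointOffsets {

    /** Encodes ascending offsets: absolute at seek points, delta elsewhere. */
    static long[] encode(long[] offsets, int interval) {
        long[] out = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            boolean seekPoint = (i % interval) == 0; // index 0 is always a seek point
            out[i] = seekPoint ? offsets[i] : offsets[i] - offsets[i - 1];
        }
        return out;
    }

    /** Recovers one offset by starting at the nearest preceding seek point. */
    static long decodeAt(long[] encoded, int interval, int index) {
        int start = (index / interval) * interval; // seek point at or before index
        long value = encoded[start];               // absolute value, no earlier state needed
        for (int i = start + 1; i <= index; i++) {
            value += encoded[i];                   // apply the deltas that follow
        }
        return value;
    }
}
```

The point of the absolute entries is that a reader can jump straight to a seek point and start decoding there, without replaying every delta from the beginning of the file.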