On Fri, Sep 26, 2014 at 7:07 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Sorry I can't make heads or tails of what you are saying here ... can
> you maybe make a small test case that fails with "ant test"? Boil it
> down as much as possible...

Sure. I'm really sorry for being so confusing. I changed a constant
https://github.com/m-khl/lucene-merge-visibility/commit/a4a01c2c91d9c30850602b8dddf23de5363c4851#diff-86ebfbf440fe69ee36a52705cb92b824R44
to make the test fail. It is on the reader-vs-merge branch:
https://github.com/m-khl/lucene-merge-visibility/tree/reader-vs-merge

In lucene/core there is a failing test:

$> ant test -Dtestcase=TestNumDValUpdVsReaderVisibility

The output is verbose because it uses sysout as the infostream.

[junit4] FAILURE 2.40s | TestNumDValUpdVsReaderVisibility.testSimple <<<
[junit4] > Throwable #1: java.lang.AssertionError: failed on id:doc-18 expected:<17> but was:<18>
[junit4] > at __randomizedtesting.SeedInfo.seed([73A18231908F4ADC:4B12A6CFB77C9E0D]:0)
[junit4] > at org.apache.lucene.index.TestNumDValUpdVsReaderVisibility.testSimple(TestNumDValUpdVsReaderVisibility.java:134)

> The gist seems to be if you use an NRT reader something fails, but if
> you instead open a new reader, that something passes?

I don't use NRT, and perhaps it's a solution; I just don't know how to do that.
Note: closing the writer and then opening a reader works (but I suppose it's slow);
just committing and reopening the reader fails.

> But what exactly is failing?

- Say I have merge factor 10 and SerialMergeScheduler.
- I have already done 9 commits and have 9 segments in the index.
- I add a few docs and commit.
- The 10th commit triggers a merge synchronously, and the merge completes.
- Now if I reopen the reader, it sees 10 unmerged segments (the merged single-segment index isn't visible to the reopen). /* test FAILS */
- But if I fully close the writer and reader and open a new reader, I get the merged single-segment index. /* test PASSES */
- Usually such behavior causes no problems; it's reasonable and fine.
- But I do a mad thing: I use that reader (with 10 segments) to get a docnum and write it as a docvalue.
- After I commit only the docvalues update (no doc updates) and reopen the reader, I get the single-segment index, which was already written by the merge at the previous commit.
- And here is the problem: a docnum obtained from the 10-segment index doesn't match the docnum in the single-segment index (there was a deletion).

> And what is a "solid" segment here?

I meant an index consisting of a single segment, in contrast to an index consisting of many.

Thank you!

> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Sep 25, 2014 at 6:00 PM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
> > Hello Mike!
> >
> > Thanks for your attention.
> > I pushed the mad case at
> > https://github.com/m-khl/lucene-merge-visibility/commit/fa2d60be5b13eb57e0527c843119cf62cfa83a7d#diff-86ebfbf440fe69ee36a52705cb92b824R120
> >
> > It does the following:
> >
> > - writes a pair of docs
> > - commits
> > - reopens the reader and searches for one of them
> > - updates this doc with its docnum (I know it's weird, but it should work if
> >   the reopened reader sees that update)
> > - commits this DV update
> > - searches for that doc and checks the written doc value.
> >
> > It passes if hardReopenBeforeDVUpdate=true and fails otherwise.
> >
> > I know that changing docnums is natural, but I expect them not to change while
> > I update docvals.
> > Here is how it flips:
> > at the commit after the doc update we have many segments
> >
> > now checkpoint "_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1
> > _1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:
> > commit: wrote segments file "segments_j"
> >
> > but there is also a solid segment, which is merged but hasn't been
> > committed/published
> > after commitMerge: _a(6.0.0):c19
> >
> > and after the DV update commit that solid segment becomes visible
> >
> > now checkpoint "_a(6.0.0):c19:fieldInfosGen=1:dvGen=1" [1 segments ;
> > isCommit = true]
> > IFD 0 [Thu Sep 25 23:56:22 SAST 2014;
> > TEST-TestNumDValUpdVsReaderVisibility.testSimple-seed#[6131CF35B3A45FC3]]:
> > deleteCommits: now decRef commit "segments_j"
> > ...
> > wrote segments file "segments_k"
> >
> > I'm using SerialMergeScheduler, and I expect to see the single solid segment
> > after I commit document updates and that commit triggers the merge.
> > How can I reopen a reader that sees it?
> > Thanks
> >
> >
> > On Wed, Sep 24, 2014 at 10:07 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> I don't understand what's actually happening / going wrong here.
> >>
> >> Maybe you can make a test case / give more details?
> >>
> >> What assertions are broken? Why is it bad if SMS does a merge before
> >> you reopen? Why are you using SMS :)
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Mon, Sep 22, 2014 at 6:00 PM, Mikhail Khludnev
> >> <mkhlud...@griddynamics.com> wrote:
> >> > Hello!
> >> > I'm in trouble with the Lucene IndexWriter. I'm benchmarking an algorithm
> >> > which might look like an NRT case, but I'm not sure that I need NRT
> >> > in particular. The overall problem is writing a "join index" (a column
> >> > holding docnums) by updating binary docvalues after commit.
> >> > i.e.:
> >> > - update docs
> >> > - commit
> >> > - read docs (openIfChanged() before)
> >> > - updateDocVals
> >> > - commit
> >> >
> >> > It's clunky but it works, until guess what happens... a merge. Oh my.
> >> >
> >> > At some point I have segments
> >> > segments_ec:2090 _7c(5.0):C117/8:delGen=8:....
> >> > _7j(5.0):C1:fieldInfosGen=1:dvGen=1 _7k(5.0):C1)
> >> >
> >> > I apply one update and trigger a commit; as a result I have:
> >> > segments_ee:2102 _7c(5.0):C117/9:delGen=9:..
> >> > _7k(5.0):C1:fieldInfosGen=1:dvGen=1 _7l(5.0):C1)
> >> >
> >> > However, somewhere inside this commit call, pretty
> >> > SerialMergeScheduler bakes the single solid segment
> >> > _7m(5.0):C117
> >> > which hasn't been exposed via any segments file so far.
> >> >
> >> > And now I get into trouble:
> >> > if I call DR.openIfChanged(segments_ec) (even after IW.waitMerges()),
> >> > I get segments_ee, which is fairly reasonable, to keep it incremental and
> >> > fast.
> >> > But if I use that IndexWriter, it applies new updates on top of the
> >> > merged segment (_7m(5.0):C117), not on segments_ee, and that breaks my
> >> > assertions. I rather need to open a reader on that merged _7m(5.0):C117,
> >> > which IW keeps somewhere internally, and it would be better to do it in a
> >> > fancy, incremental way. If you can point me to how NRT can solve this,
> >> > I'd be happy to switch to it.
> >> >
> >> > Incredibly thank you for your time!!!
> >> >
> >> > --
> >> > Sincerely yours
> >> > Mikhail Khludnev
> >> > Principal Engineer,
> >> > Grid Dynamics
> >> >
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> >
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
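
The docnum mismatch described in this thread (a docnum read from the multi-segment view stops matching once a merge drops a deleted doc) can be illustrated with a toy model. This is only a minimal sketch in plain Java, not Lucene code: the class DocnumShiftSketch and its methods docnumOf and merge are hypothetical names invented for illustration. It assumes segments can be modeled as lists of doc ids, and that a deleted doc is a null slot which keeps occupying a docnum until a merge compacts it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (hypothetical names, not Lucene API):
// a segment is a list of doc ids, a deleted doc is a null slot.
public class DocnumShiftSketch {

    // Global docnum of `id` when segments are concatenated in order.
    // Deleted (null) slots still count towards the docnum.
    public static int docnumOf(List<List<String>> segments, String id) {
        int base = 0;
        for (List<String> segment : segments) {
            int idx = segment.indexOf(id);
            if (idx >= 0) {
                return base + idx;
            }
            base += segment.size();
        }
        return -1; // not found
    }

    // Merge all segments into one, dropping deleted slots.
    public static List<List<String>> merge(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc != null) {
                    merged.add(doc);
                }
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    public static void main(String[] args) {
        // Two segments; the second doc of the first segment is deleted.
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("doc-0", null, "doc-2"),
                Arrays.asList("doc-3"));

        int before = docnumOf(segments, "doc-3");        // read from the unmerged view
        int after = docnumOf(merge(segments), "doc-3");  // same doc in the merged view

        System.out.println("docnum before merge: " + before);
        System.out.println("docnum after merge:  " + after);
    }
}
```

In this model the stored value is only meaningful against the segment view it was read from, which is the same shape as the off-by-one in the assertion failure quoted above (expected 17, was 18).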