On Fri, Sep 26, 2014 at 7:07 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Sorry I can't make heads or tails of what you are saying here ... can
> you maybe make a small test case that fails with "ant test"?  Boil it
> down as much as possible...
>

Sure. I'm really sorry for being so confusing.
I changed a constant
https://github.com/m-khl/lucene-merge-visibility/commit/a4a01c2c91d9c30850602b8dddf23de5363c4851#diff-86ebfbf440fe69ee36a52705cb92b824R44
to make it fail.
In the branch reader-vs-merge at
https://github.com/m-khl/lucene-merge-visibility/tree/reader-vs-merge
there is a failing test in lucene/core:
$> ant test -Dtestcase=TestNumDValUpdVsReaderVisibility

It's verbose, because it uses System.out as the infoStream.
   [junit4] FAILURE 2.40s | TestNumDValUpdVsReaderVisibility.testSimple <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: failed on id:doc-18 expected:<17> but was:<18>
   [junit4]    >     at __randomizedtesting.SeedInfo.seed([73A18231908F4ADC:4B12A6CFB77C9E0D]:0)
   [junit4]    >     at org.apache.lucene.index.TestNumDValUpdVsReaderVisibility.testSimple(TestNumDValUpdVsReaderVisibility.java:134)

>
> The gist seems to be if you use an NRT reader something fails, but if
> you instead open a new reader, that something passes?

I don't use NRT, and perhaps that's the solution; I just don't know how to
use it here.
Note: closing the writer and opening a new reader works (but I suppose it's
slow); just committing and reopening the reader fails.

> But what
> exactly is failing?
>
- Let's say I have a merge factor of 10 and SerialMergeScheduler.
- I've done 9 commits already and have 9 segments in the index.
- I add a few docs and commit.
- The 10th commit triggers a merge synchronously, and it completes.
- Now if I reopen the reader, it sees 10 unmerged segments (the merged
single-segment index isn't visible on reopen).  /* test FAILS */
- But if I fully close the writer and reader and open a new reader, I get
the single merged segment.  /* test PASSES */
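If I read the infoStream output correctly, commit() first snapshots the segment list it writes to segments_N, and SerialMergeScheduler then runs the merge in the committing thread, so the merged segment exists only inside the writer until the next commit. Here is a toy model of just that ordering in plain Java. It is not Lucene code, and MergeVisibilityModel, addSegment() and openReader() are made-up names:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Lucene's IndexWriter: commit() publishes a snapshot of the
 * writer's segment list first, and only then runs a synchronous "merge" that
 * rewrites the in-memory list. All names here are hypothetical.
 */
final class MergeVisibilityModel {
    static final int MERGE_FACTOR = 10; // mirrors "merge factor of 10" above

    final List<String> inMemory = new ArrayList<>(); // segments the writer works on
    List<String> published = new ArrayList<>();      // segments in the last segments_N
    private int nextSegment = 0;

    /** Adding a few docs and flushing them creates one new small segment. */
    void addSegment() {
        inMemory.add("_" + nextSegment++);
    }

    void commit() {
        // 1. Write the commit point from a snapshot of the current segments.
        published = new ArrayList<>(inMemory);
        // 2. Afterwards, a SerialMergeScheduler-like merge runs in this thread;
        //    its result lives only in memory until the NEXT commit.
        if (inMemory.size() >= MERGE_FACTOR) {
            inMemory.clear();
            inMemory.add("_" + nextSegment++);
        }
    }

    /** A reader (re)opened from the directory only sees the last commit. */
    List<String> openReader() {
        return new ArrayList<>(published);
    }

    public static void main(String[] args) {
        MergeVisibilityModel w = new MergeVisibilityModel();
        for (int i = 0; i < 10; i++) {
            w.addSegment();
            w.commit();
        }
        System.out.println("reader sees " + w.openReader().size() + " segments");    // 10
        System.out.println("writer holds " + w.inMemory.size() + " segment");        // 1
        w.commit(); // e.g. a docvalues-only commit publishes the merged segment
        System.out.println("reader now sees " + w.openReader().size() + " segment"); // 1
    }
}
```

The model only captures the ordering: anything that reads the directory between those two commits sees the 10 pre-merge segments, while the writer is already working against the merged one.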

- Usually this behavior causes no problems; it's reasonable and fine.
- But I do a mad thing:
- I use that reader (with 10 segments) to get a docnum and write it as a
docvalue.
- Then I commit only the docvalues update (no document updates) and reopen
the reader, and I get the single-segment index, which the merge had already
written during the previous commit.
- Here is the problem: a docnum obtained from the 10-segment index doesn't
match the docnum in the single-segment index (there was a deletion).
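Why the numbers disagree, as a toy model in plain Java (not Lucene code; the class, the methods and the 10 x 2-doc layout are hypothetical, loosely matching the earlier log where _0 was C2/1 and the merged segment was c19): in a multi-segment reader a doc's number is docBase plus its position in its segment, and docBase counts deleted slots too, while a merge copies only live docs, so every doc after a deletion moves down by one.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Lucene code) of how a doc's number changes when segments
 * are merged and deleted docs are dropped. Names and layout are hypothetical.
 */
final class DocNumShift {

    /** Multi-segment view: docBase + position in segment; docBase counts deleted slots too. */
    static int docNumBeforeMerge(List<List<String>> segments, String id) {
        int docBase = 0;
        for (List<String> segment : segments) {
            int local = segment.indexOf(id);
            if (local >= 0) {
                return docBase + local;
            }
            docBase += segment.size();
        }
        return -1; // not found
    }

    /** Merged view: segments concatenated in order, deleted docs not copied. */
    static int docNumAfterMerge(List<List<String>> segments, List<String> deletedIds, String id) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (!deletedIds.contains(doc)) {
                    merged.add(doc);
                }
            }
        }
        return merged.indexOf(id); // -1 if the doc was deleted
    }

    public static void main(String[] args) {
        // 10 segments with 2 docs each; doc-1 (in the first segment) is deleted.
        List<List<String>> segments = new ArrayList<>();
        for (int s = 0; s < 10; s++) {
            segments.add(List.of("doc-" + (2 * s), "doc-" + (2 * s + 1)));
        }
        List<String> deleted = List.of("doc-1");
        System.out.println(docNumBeforeMerge(segments, "doc-18"));         // 18
        System.out.println(docNumAfterMerge(segments, deleted, "doc-18")); // 17
    }
}
```

With one deleted doc in front of it, doc-18 moves from 18 to 17, the same shape as "expected:<17> but was:<18>" in the failure above.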


> And what is a "solid" segment here?
>
I meant an index consisting of a single segment, as opposed to an index
made of many segments.
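To check from the infoStream whether a commit holds one solid segment or many, one can count the segment descriptors in a "now checkpoint" line. A hypothetical helper in plain Java (not a Lucene API); the pattern is inferred only from the log lines in this thread: name(version), then c or C, the doc count, and an optional /deletion count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts segment descriptors such as
 * "_0(6.0.0):C2/1:delGen=1" from an infoStream checkpoint line.
 * The format is inferred from log lines in this thread, not from Lucene docs.
 */
final class SegmentLine {
    // name(version):<c|C><docCount>[/<delCount>]
    private static final Pattern SEG =
        Pattern.compile("(_[0-9a-z]+)\\(([^)]*)\\):[cC](\\d+)(?:/(\\d+))?");

    static final class Seg {
        final String name;
        final int docCount;
        final int delCount;

        Seg(String name, int docCount, int delCount) {
            this.name = name;
            this.docCount = docCount;
            this.delCount = delCount;
        }

        int liveDocs() {
            return docCount - delCount;
        }
    }

    static List<Seg> parse(String line) {
        List<Seg> out = new ArrayList<>();
        Matcher m = SEG.matcher(line);
        while (m.find()) {
            int del = m.group(4) == null ? 0 : Integer.parseInt(m.group(4));
            out.add(new Seg(m.group(1), Integer.parseInt(m.group(3)), del));
        }
        return out;
    }

    public static void main(String[] args) {
        String before = "now checkpoint \"_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1 "
            + "_1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:";
        String after = "now checkpoint \"_a(6.0.0):c19:fieldInfosGen=1:dvGen=1\" [1 segments ;";
        System.out.println(parse(before).size() + " -> " + parse(after).size()); // 3 -> 1
    }
}
```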

Thank you!

>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Sep 25, 2014 at 6:00 PM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
> > Hello Mike!
> >
> > Thanks for your attention.
> > I pushed the mad case at
> >
> https://github.com/m-khl/lucene-merge-visibility/commit/fa2d60be5b13eb57e0527c843119cf62cfa83a7d#diff-86ebfbf440fe69ee36a52705cb92b824R120
> >
> > It does the following:
> >
> > - writes a pair of docs
> > - commits
> > - reopens the reader and searches for one of them
> > - updates this doc with its docnum (I know it's weird, but it should work
> > if the reopened reader sees that update)
> > - commits this DV update
> > - searches for that doc and checks the written doc value.
> > It passes if hardReopenBeforeDVUpdate=true and fails otherwise.
> >
> > I know that docnums changing is natural, but I expect them not to change
> > while I update docvalues.
> > Here is how it flips:
> > at the commit after the doc update we have many segments
> >
> >  now checkpoint "_0(6.0.0):C2/1:delGen=1:fieldInfosGen=1:dvGen=1
> > _1(6.0.0):C2:fieldInfosGen=1:dvGen=1 _2(6.0.0):C2:
> > commit: wrote segments file "segments_j"
> >
> > but there is also a solid segment, which is merged but hasn't been
> > committed/published:
> > after commitMerge: _a(6.0.0):c19
> >
> > and after the DV update commit, that solid segment is visible:
> >
> > now checkpoint "_a(6.0.0):c19:fieldInfosGen=1:dvGen=1" [1 segments ;
> > isCommit = true]
> > IFD 0 [Thu Sep 25 23:56:22 SAST 2014;
> > TEST-TestNumDValUpdVsReaderVisibility.testSimple-seed#[6131CF35B3A45FC3]]:
> > deleteCommits: now decRef commit "segments_j"
> > ...
> > wrote segments file "segments_k"
> >
> > I'm using SerialMergeScheduler, and I expect to see a single solid segment
> > after I commit document updates and the commit triggers the merge.
> > How can I reopen a reader that sees it?
> > Thanks
> >
> >
> > On Wed, Sep 24, 2014 at 10:07 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> I don't understand what's actually happening / going wrong here.
> >>
> >> Maybe you can make a test case / give more details?
> >>
> >> What assertions are broken?  Why is it bad if SMS does a merge before
> >> you reopen?  Why are you using SMS :)
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Mon, Sep 22, 2014 at 6:00 PM, Mikhail Khludnev
> >> <mkhlud...@griddynamics.com> wrote:
> >> > Hello!
> >> > I'm in trouble with Lucene's IndexWriter. I'm benchmarking an algorithm
> >> > which might look like an NRT case, but I'm not sure I need NRT
> >> > specifically. The overall problem is writing a "join index" (a column
> >> > holding docnums) by updating binary docvalues after commit, i.e.:
> >> >  - update docs
> >> >  - commit
> >> >  - read docs (openIfChanged() beforehand)
> >> >  - updateDocVals
> >> >  - commit
> >> >
> >> > It's clunky but it works, until, guess what happens... a merge. Oh my.
> >> >
> >> > At one point I have segments
> >> > segments_ec:2090 _7c(5.0):C117/8:delGen=8:....
> >> > _7j(5.0):C1:fieldInfosGen=1:dvGen=1 _7k(5.0):C1)
> >> >
> >> > I apply one update and trigger a commit; as a result I have:
> >> > segments_ee:2102 _7c(5.0):C117/9:delGen=9:..
> >> > _7k(5.0):C1:fieldInfosGen=1:dvGen=1 _7l(5.0):C1)
> >> >
> >> > However, somewhere inside this commit call, the
> >> > SerialMergeScheduler bakes the single solid segment
> >> > _7m(5.0):C117
> >> > but it hasn't been exposed via any segments file so far.
> >> >
> >> > And now I get into trouble:
> >> > if I call DR.openIfChanged(segments_ec) (even after IW.waitForMerges()),
> >> > I get segments_ee, which is fairly reasonable, to keep it incremental and
> >> > fast.
> >> > But if I keep using that IndexWriter, it applies new updates on top of the
> >> > merged segment (_7m(5.0):C117), not on segments_ee, and that breaks my
> >> > assertions. I'd rather open a reader on that merged _7m(5.0):C117, which
> >> > IW keeps somewhere internally, ideally in a fancy & incremental way. If you
> >> > can point me to how NRT can solve this, I'd be happy to switch to it.
> >> >
> >> > Thank you so much for your time!!!
> >> >
> >> > --
> >> > Sincerely yours
> >> > Mikhail Khludnev
> >> > Principal Engineer,
> >> > Grid Dynamics
> >> >
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> >
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
