I uploaded a test program to the jira issue that demonstrates the problem
I'm seeing.
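To frame what "expected result of a no-op" means below, here is a tiny in-memory model of the semantics I expect from REPLACE_ENTIRE_RECORD under UPDATE_ROW. This is plain Java, not Blur API code, and the class/method names are made up for illustration: the record should be replaced only within the addressed row, so re-sending identical data should change nothing and should never touch any other row.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Tiny in-memory model (NOT Blur code) of the expected
// REPLACE_ENTIRE_RECORD semantics under UPDATE_ROW: the record is
// replaced only within the addressed row, so re-sending identical
// data must be a no-op and must never affect any other row.
public class ReplaceRecordModel {
    // rowId -> (recordId -> value of the single cf.key column)
    static Map<String, Map<String, String>> index = new LinkedHashMap<>();

    static void replaceEntireRecord(String rowId, String recordId, String cfKey) {
        // Replace the record within the addressed row only.
        index.computeIfAbsent(rowId, r -> new HashMap<>()).put(recordId, cfKey);
    }

    public static void main(String[] args) {
        // Two rows, one record each, cf.key mirroring the rowId.
        replaceEntireRecord("A", "0", "A");
        replaceEntireRecord("B", "0", "B");

        // Re-mutate row A with exactly the same data.
        replaceEntireRecord("A", "0", "A");

        // Expected: a no-op.  Row B must still hold exactly one record
        // with cf.key=B, and row A must be unchanged.
        if (index.get("B").size() != 1 || !"B".equals(index.get("B").get("0")))
            throw new AssertionError("row B was corrupted");
        if (index.get("A").size() != 1 || !"A".equals(index.get("A").get("0")))
            throw new AssertionError("row A changed");
        System.out.println("no-op as expected");
    }
}
```

The bug report below describes Blur instead appending row A's record into row B.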

Please let me know if you are able to reproduce the problem and whether you
think there's a workaround for it that doesn't involve a patch.

Thanks,
-- Tom


On Tue, Aug 25, 2015 at 12:51 PM, Aaron McCurry <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 10:30 PM, Tom Hood <[email protected]> wrote:
>
> > Hi,
> >
> > There appears to be a bug where two rows are merging into one as a result
> > of doing separate calls to the Iface.mutate method using
> > RowMutationType.UPDATE_ROW and RecordMutationType.REPLACE_ENTIRE_RECORD.
> > (I can also see the problem using REPLACE_ROW and REPLACE_ENTIRE_RECORD
> > instead).
> >
> > For example, if the index has 2 rows with 1 record each that has a copy of
> > the rowId in cf.key:
> >   row A:   cf.key=A
> >   row B:   cf.key=B
> >
> > After an attempt to Iface.mutate row A with exactly the same data,
> > sometimes the result is:
> >   row A:  cf.key=A
> >   row B:  cf.key=B  cf.key=A
> >
> > instead of the expected result of a no-op.  The corruption is visible with
> > "blur get", with "blur query cf.key:B", and via an Iface.fetchRow from Java.
> >
> > For the above, the recordId is always "0" and the rowId is a UUID generated
> > with Java's UUID.randomUUID (although for my test I'm reusing the same
> > UUIDs).
> >
> > I'm not setting a schema at all in my test program, so the defaults apply
> > for analyzers, fieldless=true, etc.
> >
> > I do notice the following show up in the shard server log:  INFO ...
> > [thrift-processors1] search.PrimeDocCache: PrimeDoc for reader
> > [_k(4.3):C19/4] not stored, because count [13] and freq [16] do not match.
> >
> > Restarting blur doesn't seem to help.
> >
> > The Blur version is 0.2.4.  The Hadoop stack is CDH 5.1.0.
> >
> > The cluster configuration is 1 shard server, 1 controller, and 1 namenode,
> > all on the same machine (Red Hat 6.3 Santiago).
> >
> > I have a fairly small test case that, when run repeatedly, sometimes fails
> > and sometimes doesn't.  I run it after using the blur shell to remove the
> > old table and create a new one with 1 shard.
> >
> > Although it isn't 100% reproducible, it seems to fail pretty often for me.
> > Since I typed the code in on a different network, I don't have the code for
> > you yet.
> >
> > Have you seen this kind of issue before?
> >
>
> I have not.
>
>
> >
> > Any suggestions for how to track it down?
> >
>
> Not sure yet; maybe we could reproduce it in the IndexManagerTest.  That's
> where most of the mutation tests are located.
>
>
> >
> > Are there any commands you want me to run on the resulting table that might
> > yield some clues?
> >
>
> I don't know enough yet to suggest anything.  I have opened a jira ticket
> where we can track the issue.
>
> https://issues.apache.org/jira/browse/BLUR-441
>
> I will try to investigate ASAP.
>
> Aaron
>
>
> >
> > Thanks,
> > -- Tom
> >
>
