I uploaded a test program to the jira issue that demonstrates the problem I'm seeing.
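In case it's useful before you pull the program down, the mutation sequence it performs boils down to roughly the following (a sketch, assuming the standard BlurClient helper and generated Thrift classes; the connection string and table name are placeholders):

import java.util.Arrays;

import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur.Iface;
import org.apache.blur.thrift.generated.Column;
import org.apache.blur.thrift.generated.Record;
import org.apache.blur.thrift.generated.RecordMutation;
import org.apache.blur.thrift.generated.RecordMutationType;
import org.apache.blur.thrift.generated.RowMutation;
import org.apache.blur.thrift.generated.RowMutationType;

public class MutateRepro {

  public static void main(String[] args) throws Exception {
    // Placeholders -- substitute your controller connection string and table.
    Iface client = BlurClient.getClient("controller1:40010");
    String table = "testtable";

    // Two rows, one record each; cf.key holds a copy of the rowId.
    writeRow(client, table, "A");
    writeRow(client, table, "B");

    // Re-apply exactly the same data for row A. Expected: a no-op.
    // Observed (sometimes): row B ends up with row A's record merged in.
    writeRow(client, table, "A");
  }

  private static void writeRow(Iface client, String table, String rowId)
      throws Exception {
    Record record = new Record();
    record.setRecordId("0");
    record.setFamily("cf");
    record.setColumns(Arrays.asList(new Column("key", rowId)));

    RecordMutation recordMutation = new RecordMutation();
    recordMutation.setRecordMutationType(RecordMutationType.REPLACE_ENTIRE_RECORD);
    recordMutation.setRecord(record);

    RowMutation rowMutation = new RowMutation();
    rowMutation.setTable(table);
    rowMutation.setRowId(rowId);
    rowMutation.setRowMutationType(RowMutationType.UPDATE_ROW);
    rowMutation.setRecordMutations(Arrays.asList(recordMutation));

    client.mutate(rowMutation);
  }
}

(The actual test uses UUIDs for the rowIds rather than the literal A/B above, as described in the quoted message below.)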
Please let me know if you are able to reproduce the problem and whether you
think there's a workaround for it that doesn't involve a patch.

Thanks,
-- Tom

On Tue, Aug 25, 2015 at 12:51 PM, Aaron McCurry <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 10:30 PM, Tom Hood <[email protected]> wrote:
>
> > Hi,
> >
> > There appears to be a bug where two rows merge into one as a result of
> > doing separate calls to the Iface.mutate method using
> > RowMutationType.UPDATE_ROW and RecordMutationType.REPLACE_ENTIRE_RECORD.
> > (I can also see the problem using REPLACE_ROW and REPLACE_ENTIRE_RECORD
> > instead.)
> >
> > For example, say the index has 2 rows with 1 record each, where the
> > record holds a copy of the rowId in cf.key:
> >
> > row A: cf.key=A
> > row B: cf.key=B
> >
> > After an attempt to Iface.mutate row A with exactly the same data,
> > sometimes the result is:
> >
> > row A: cf.key=A
> > row B: cf.key=B cf.key=A
> >
> > instead of the expected no-op. The corruption is visible with "blur
> > get", with "blur query cf.key:B", and with an Iface.fetchRow from java.
> >
> > For the above, the recordId is always "0" and the rowId is a UUID
> > generated by java's UUID.randomUUID (although for my test I'm also
> > reusing the same UUIDs).
> >
> > I'm not setting a schema at all in my test program, so all the defaults
> > apply: analyzers, fieldless=true, etc.
> >
> > I do notice the following show up in the shard server log:
> >
> > INFO ... [thrift-processors1] search.PrimeDocCache: PrimeDoc for reader
> > [_k(4.3):C19/4] not stored, because count [13] and freq [16] do not
> > match.
> >
> > Restarting blur doesn't seem to help.
> >
> > Blur version is 0.2.4. The Hadoop stack is CDH 5.1.0.
> >
> > The cluster configuration is 1 shard server, 1 controller, and 1
> > namenode, all running on the same machine (redhat 6.3 Santiago).
> >
> > I have a fairly small test case that sometimes fails and sometimes
> > doesn't when run repeatedly. I run it after using the blur shell to
> > remove the old table and create a new one with 1 shard.
> >
> > Although it isn't 100% reproducible, it seems to fail pretty often for
> > me. As I've typed the code in on a different network, I don't have the
> > code for you yet.
> >
> > Have you seen this kind of issue before?
>
> I have not.
>
> > Any suggestions for how to track it down?
>
> Not sure yet; maybe we could reproduce it in the IndexManagerTest. That's
> where most of the mutation tests are located.
>
> > Are there any commands you want me to run on the resulting table that
> > might yield some clues?
>
> I don't know enough yet to suggest anything. I have opened a jira ticket
> where we can track the issue:
>
> https://issues.apache.org/jira/browse/BLUR-441
>
> I will try to investigate ASAP.
>
> Aaron
>
> > Thanks,
> > -- Tom
