It's also strange that the first two rows have the same value for c79. That is extremely unlikely. I can dig in more tomorrow.
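Both observations can be sanity-checked directly against the values in the quoted scan result. The following is only a rough sketch in plain Python: it uses no Kudu API, the helper names `find_partition` and `out_of_order_indices` are made up for illustration, and it assumes the partition bounds behave as (lower, upper], following the brackets printed in the quoted log line. Only the two partitions that matter for these rows are included.

```python
# (key, c79) pairs copied from the quoted scan result, in the order returned.
rows = [
    (537424064, 234639860),
    (552025439, 234639860),
    (539314778, -878787336),
    (541817227, 2064952224),
    (546056206, 26527696),
    (601960253, 1088757503),
    (677154987, 823764490),
]
keys = [key for key, _ in rows]

# Two of the range partitions from the quoted log line. Treating the bounds
# as (lower, upper] is an assumption based on how the log prints them.
partitions = [(536870904, 603979767), (671088630, 738197493)]


def find_partition(key, ranges):
    """Hypothetical helper: return the (lower, upper) range holding key, or None."""
    for lower, upper in ranges:
        if lower < key <= upper:
            return (lower, upper)
    return None


def out_of_order_indices(ks):
    """Hypothetical helper: indices whose key is smaller than the key before it."""
    return [i for i in range(1, len(ks)) if ks[i] < ks[i - 1]]


first_three = [find_partition(k, partitions) for k in keys[:3]]
same_c79 = rows[0][1] == rows[1][1]

print(first_three)
print(out_of_order_indices(keys))
print(same_c79)
```

Note that the ordering check flags the position where the key drops (the row after 552025439), which is the row following the one marked as out of order in the quoted message.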
- Dan

On Mon, Oct 3, 2016 at 10:53 PM, Dan Burkert <[email protected]> wrote:

> The first three rows (including the out of order row) all fall in the same
> range partition, so the issue is likely that the intra-tablet scan returned
> out of order results (as opposed to the client scanning tablets out of
> order). I'm under the same impression about SetFaultTolerant(), which is
> why the test explicitly sets it. How often is this happening? Back when
> this test was committed a few months ago I ran it a few thousand times and
> never saw anything like this.
>
> On Mon, Oct 3, 2016 at 10:35 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hey Dan (+CC dev in case anyone else knows about this too)
>>
>> I'm debugging some flakiness in alter_table-randomized-test, and it seems
>> like it's failing because the verification scan is returning some out of
>> order rows, despite using "SetFaultTolerant()". Granted, fault tolerance
>> isn't publicly guaranteed to return rows in order, but I was under the
>> impression that, with range partitioned tablets, it would always do so.
>>
>> The scan result I'm seeing has the following sequence within it:
>>
>> (int32 key=537424064, int32 c945=NULL, int32 c79=234639860, int32 c990=NULL)
>> >>>> OUT OF ORDER ROW
>> (int32 key=552025439, int32 c945=NULL, int32 c79=234639860, int32 c990=NULL)
>> >>>> BACK TO NORMAL ORDER
>> (int32 key=539314778, int32 c945=1708089980, int32 c79=-878787336, int32 c990=829302644)
>> (int32 key=541817227, int32 c945=2064952224, int32 c79=2064952224, int32 c990=NULL)
>> (int32 key=546056206, int32 c945=26527696, int32 c79=26527696, int32 c990=26527696)
>> (int32 key=601960253, int32 c945=NULL, int32 c79=1088757503, int32 c990=NULL)
>> (int32 key=677154987, int32 c945=823764490, int32 c79=823764490, int32 c990=823764490)
>>
>> The prior alter was:
>> I1004 05:17:48.192611 28113 alter_table-randomized-test.cc:481] Dropping
>> range partition: [805306356, 872415219) resulting partitions: (134217726,
>> 201326589], (268435452, 335544315], (335544315, 402653178], (402653178,
>> 469762041], (536870904, 603979767], (671088630, 738197493], (738197493,
>> 805306356], (939524082, 1006632945], (1006632945, 1073741808], (1275068397,
>> 1342177260], (1342177260, 1409286123], (1409286123, 1476394986],
>> (1610612712, 1677721575], (1879048164, 1946157027], (2013265890,
>> 2080374753], (2080374753, 2147483616)
>> I1004 05:17:48.193013 28113 alter_table-randomized-test.cc:406]
>> Committing Alterations
>>
>> The whole log is available here:
>> https://gist.githubusercontent.com/toddlipcon/466976caf973f496885da9efc2f7246c/raw/f9baf418dad4ad07f33961b131c86e84803815a8/alter_table-randomized-test.txt
>>
>> Any ideas what might be causing this out-of-order result? Is the test
>> making some incorrect assumptions or might we have a bug?
>>
>> -Todd
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
