Hi Alexey,
Thank you very much for investigating this test and providing an easy repro
case. I just got around to spending some time on this due to David's
read-your-writes (RYW) patches adding to this test. I think we should try
to make this test reliable to be sure RYW doesn't have anomalies we aren't
aware of. I applied your (second) patch mentioned earlier in this thread
and was able to quickly reproduce the issues you described.
I suspect that the problem stems from the fact that in the C++ client,
session->Flush() is a no-op if an async flush is already in-progress. At
least, it looks to me like those are the current semantics. If I remove the
session->FlushAsync(null) from your patch then the test passes for me.
The problem with a non-blocking Flush() while an async flush is outstanding
that is that tablet_history_gc-itest relies on the following series of
operations in order to ensure it knows what the consistent set of values
should be for a particular timestamp:
FlushSessionOrDie(session);
}
SaveSnapshot(std::move(snapshot), clock_->Now());
So it waits until the data is flushed to the server and then captures the
current server timestamp as a key for its record of expected data values at
that timestamp. The above appears in all of the write code paths in the
test. If FlushSessionOrDie() is an async no-op, and the client has not
truly flushed outstanding data at the current time stamp, then the
"SaveSnapshot()" code path will think the latest writes are included in the
snapshot for the current time, but in actuality the old value will be
stored for that timestamp on the server.
This matches with the observed behavior based on running the (patched) test
with --vmodule=tablet_history_gc-itest=2 and seeing the following in the
log:
I1122 15:42:48.355686 15303 tablet_history_gc-itest.cc:166] Saving snapshot
at ts = P: 1000000000 usec, L: 1 (4096000000001)
I1122 15:42:48.355768 15303 tablet_history_gc-itest.cc:380] Starting round 1
...
I1122 15:42:48.355917 15303 tablet_history_gc-itest.cc:463] Updating row to
{ 511, 1535296252, 1535296251, NOT_DELETED }
I1122 15:42:48.355926 15303 tablet_history_gc-itest.cc:463] Updating row to
{ 511, 1708088659, 1708088658, NOT_DELETED }
...
I1122 15:42:48.975769 15303 tablet_history_gc-itest.cc:166] Saving snapshot
at ts = P: 1000000000 usec, L: 514 (4096000000514)
I1122 15:42:48.975796 15303 tablet_history_gc-itest.cc:496] Updated 512 rows
I1122 15:42:48.975807 15303 tablet_history_gc-itest.cc:380] Starting round 2
...
I1122 15:42:54.547406 15303 tablet_history_gc-itest.cc:213] Round 26:
Verifying snapshot scan for timestamp P: 1000000000 usec, L: 1559
(4096000001559)
../../src/kudu/integration-tests/tablet_history_gc-itest.cc:242: Failure
Value of: int_val
Actual: 1535296252
Expected: snap_iter->second.int_val
Which is: 1708088659
at row key 511
So I think we have a couple options:
1. Make KuduSession::Flush() "timeline-consistent" (block until all
outstanding flushes complete), or
2. Only use manual flushing in this test​
Personally, I think the current observed semantics of KuduSession::Flush()
are not ideal and I'd advocate that we do #1.
Mike