Soumava Ghosh created CASSANDRA-5830:
----------------------------------------
Summary: Paxos loops endlessly due to faulty condition check
Key: CASSANDRA-5830
URL: https://issues.apache.org/jira/browse/CASSANDRA-5830
Project: Cassandra
Issue Type: Bug
Affects Versions: 2.0 beta 2
Reporter: Soumava Ghosh
The following code segment (StorageProxy.java:328) causes the issue. start is
the time at which the Paxos round began, so it is always less than the current
value of System.nanoTime(). The difference start - System.nanoTime() is
therefore always negative and always less than the timeout, so the loop
condition never becomes false.
    private static UUID beginAndRepairPaxos(long start, ByteBuffer key, CFMetaData metadata,
                                            List<InetAddress> liveEndpoints, int requiredParticipants,
                                            ConsistencyLevel consistencyForPaxos)
    throws WriteTimeoutException
    {
        long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());
        PrepareCallback summary = null;
        while (start - System.nanoTime() < timeout)
        {
            long ballotMillis = summary == null
                              ? System.currentTimeMillis()
                              : Math.max(System.currentTimeMillis(), 1 + UUIDGen.unixTimestamp(summary.inProgressCommit.ballot));
            UUID ballot = UUIDGen.getTimeUUID(ballotMillis);
Here, the Paxos round gets stuck when PREPARE returns 'true' but with an
inProgressCommit. beginAndRepairPaxos() in StorageProxy.java then tries to
issue a PREPARE and COMMIT for the inProgressCommit. If it repeatedly receives
'false' as the PREPARE_RESPONSE, the timeout check never ends the loop, so it
keeps retrying until a PREPARE_RESPONSE is 'true'.
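
For illustration, a minimal sketch of the difference between the condition
quoted above and one that compares elapsed time. The class and method names
(PaxosTimeoutSketch, reportedCondition, elapsedCondition) are hypothetical and
not part of Cassandra, and swapping the operands is only one plausible way to
correct the check, not necessarily the fix that will be committed.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class for illustration only; not Cassandra code.
class PaxosTimeoutSketch {
    // Condition as quoted in the report: start - now is negative as soon as
    // any time has passed, so this stays true for any positive timeout.
    static boolean reportedCondition(long startNanos, long nowNanos, long timeoutNanos) {
        return startNanos - nowNanos < timeoutNanos;
    }

    // Assumed correction: compare elapsed time (now - start) to the timeout.
    static boolean elapsedCondition(long startNanos, long nowNanos, long timeoutNanos) {
        return nowNanos - startNanos < timeoutNanos;
    }

    public static void main(String[] args) {
        long timeout = TimeUnit.MILLISECONDS.toNanos(1000); // 1 s timeout
        long start = 0L;
        long now = TimeUnit.MILLISECONDS.toNanos(5000);     // 5 s after start

        System.out.println(reportedCondition(start, now, timeout)); // prints "true"
        System.out.println(elapsedCondition(start, now, timeout));  // prints "false"
    }
}
```

With a 1 s timeout and 5 s elapsed, the reported condition still lets the loop
continue, while the elapsed-time condition ends it.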