Erik Forsberg created CASSANDRA-8682:
----------------------------------------
Summary: BulkRecordWriter ends up streaming with non-unique
session IDs on large hadoop cluster
Key: CASSANDRA-8682
URL: https://issues.apache.org/jira/browse/CASSANDRA-8682
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Reporter: Erik Forsberg
Attachments: cassandra-1.2-bulkrecordwriter-sessionid.patch
We use BulkOutputFormat extensively to load data from hadoop to Cassandra. We
are currently running Cassandra 1.2.18, but are planning an upgrade of
Cassandra to 2.0.X, possibly 2.1.X.
With Cassandra 1.2 we have problems with the streaming session IDs getting
duplicated when multiple (20+) Java processes start streaming at the same
time. On the receiving Cassandra node, having the same session ID actually
correspond to different sending processes confuses things a lot, leading
to aborted connections.
This did not happen for every process, but often enough to be a problem in a
production environment. Because it was intermittent, it was a bit tricky to test.
I suspect this has to do with how UUIDs are generated on the sending (Hadoop)
side. With 20+ processes being started concurrently, the clockSeqAndNode part
of the uuid1 probably ended up being exactly the same on all 20 processes.
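To illustrate the suspected mechanism, here is a minimal sketch. It assumes the clock value is drawn the way the UUIDGen line quoted further down does it, from a java.util.Random seeded with the current millisecond. SeedCollisionDemo and clockFromSeed are hypothetical names for this example only, not Cassandra code.

```java
import java.util.Random;

final class SeedCollisionDemo {
    // Mirrors the seeding pattern quoted in this report:
    // a Random seeded with a wall-clock millisecond value.
    static long clockFromSeed(long seedMillis) {
        return new Random(seedMillis).nextLong();
    }

    public static void main(String[] args) {
        // Two "processes" that happen to start in the same millisecond.
        long now = System.currentTimeMillis();
        long a = clockFromSeed(now);
        long b = clockFromSeed(now);
        System.out.println("same seed gives same value: " + (a == b));
    }
}
```

Since java.util.Random is deterministic for a given seed, any two JVMs that seed it with the same millisecond get the same first value, which would be consistent with the duplicates we saw.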
I wrote a patch which I unfortunately never submitted at the time, but it's
attached to this issue. The patch constructs a UUID from the map or reduce task
ID, which is guaranteed to be unique per hadoop cluster.
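As a rough sketch of the idea (not the attached patch itself), a UUID can be derived deterministically from a task ID string. TaskSessionId and fromTaskId are hypothetical names, the task attempt IDs are made-up samples, and this variant yields a name-based (version 3) UUID rather than a time-based one; whether the streaming code accepts that is an assumption I have not verified.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class TaskSessionId {
    // Hypothetical helper: deterministic name-based (version 3) UUID
    // computed from the bytes of a task ID string.
    static UUID fromTaskId(String taskId) {
        return UUID.nameUUIDFromBytes(taskId.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Sample task attempt IDs, made up for illustration.
        UUID a = fromTaskId("attempt_201501261200_0001_m_000003_0");
        UUID b = fromTaskId("attempt_201501261200_0001_m_000004_0");
        System.out.println(a + " vs " + b + ", differ: " + !a.equals(b));
    }
}
```

The point of this design is that uniqueness comes from the task ID rather than from the clock or a random number, so concurrent startup no longer matters.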
I suspect we're going to face the same issue on Cassandra 2.0 and 2.1, even
after the rewrite of the streaming subsystem. Please correct me if I'm wrong,
i.e. if there's something in the new code that will make this a non-issue.
Now the question is how to address this problem. Possible options that I see
after some code reading:
1. Update patch to apply on 2.0 and 2.1, using same method (generating UUID
from hadoop task ID)
2. Modify UUIDGen code to use the Java process pid as clockSeq instead of a
random number. However, getting the pid in Java seems less than simple (and
remember that this is code that runs on the Hadoop side of things, not inside
the Cassandra daemon)
3. This patch might help:
{noformat}
diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
index f385744..ae253ab 100644
--- a/src/java/org/apache/cassandra/utils/UUIDGen.java
+++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
@@ -234,7 +234,7 @@ public class UUIDGen
     private static long makeClockSeqAndNode()
     {
-        long clock = new Random(System.currentTimeMillis()).nextLong();
+        long clock = new Random().nextLong();
         long lsb = 0;
         lsb |= 0x8000000000000000L; // variant (2 bits)
{noformat}
..but I don't know why System.currentTimeMillis() is being used as the seed there.
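For option 2, one commonly used way to get the pid is to parse the name of the RuntimeMXBean. This is only a sketch: the "pid@hostname" format is what HotSpot JVMs typically return, it is not guaranteed by the API, so the code falls back to -1 when the name does not match. PidLookup, parsePid and currentPid are hypothetical names.

```java
import java.lang.management.ManagementFactory;

final class PidLookup {
    // Parses the part before '@' in a "pid@hostname" style name.
    // Returns -1 if there is no '@' or the prefix is not a number.
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            return -1L;
        }
        try {
            return Long.parseLong(runtimeName.substring(0, at));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // Assumption: the JVM reports its name as "pid@hostname".
    static long currentPid() {
        return parsePid(ManagementFactory.getRuntimeMXBean().getName());
    }

    public static void main(String[] args) {
        System.out.println("pid (or -1 if unavailable): " + currentPid());
    }
}
```

Note that a pid is only unique per host, so on a large cluster it would presumably still need to be combined with something host-specific.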
Opinions?