Hi all,

tl;dr
The Titan 0.5.4 cassandrathrift client together with C* 2.0.8/2.2.6 generates a
massive number of tiny network packets for multiget_slice queries. Is there a
way to avoid this “packet storm”?


Details...

We are using Titan 0.5.4 with its cassandrathrift storage backend to connect to
a single-node cluster running C* 2.2.6 (we also tried 2.0.8, the version in
Titan's dependencies). When moving to a multi-datacenter setup with the client
in one DC and the C* server in the other, we ran into the problem that response
times from Cassandra/the graph became unacceptable (>30 s, versus 0.2 s within
one datacenter). Looking at the network traffic, we saw that client and server
exchange a massive number of very small packets.
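
For completeness, this is roughly how we open the graph from the client (host
anonymized to match the captures below; storage.port 9160 is the Thrift RPC
port):

import org.apache.commons.configuration.BaseConfiguration;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

public class GraphConnect {
    public static void main(String[] args) {
        BaseConfiguration conf = new BaseConfiguration();
        conf.setProperty("storage.backend", "cassandrathrift");
        conf.setProperty("storage.hostname", "x.x.x.98"); // C* node in the remote DC
        conf.setProperty("storage.port", 9160);           // Thrift RPC port
        TitanGraph graph = TitanFactory.open(conf);
        // ... run the traversal that triggers the behavior described below ...
        graph.shutdown();
    }
}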
The user action we were tracing yields three packets of type “REPLY
multiget_slice”. For each such reply we see about 1’000 packet pairs like the
following going back and forth between client and server:

968   09:45:55.354613   x.x.x.30 x.x.x.98 TCP   181   54406 → 9160 [PSH, ACK] 
Seq=53709 Ack=39558 Win=1002 Len=115 TSval=4169130400 TSecr=4169119527
0000   00 50 56 a7 d6 0d 00 0c 29 d1 a4 5e 08 00 45 00  .PV.....)..^..E.
0010   00 a7 e3 6d 40 00 40 06 fe 3c ac 13 00 1e ac 13  ...m@.@..<......
0020   00 62 d4 86 23 c8 2c 30 4e 45 1b 4b 0b 55 80 18  .b..#.,0NE.K.U..
0030   03 ea 59 40 00 00 01 01 08 0a f8 7f e1 a0 f8 7f  ..Y@............
0040   b7 27 00 00 00 6f 80 01 00 01 00 00 00 0e 6d 75  .'...o........mu
0050   6c 74 69 67 65 74 5f 73 6c 69 63 65 00 00 3a 38  ltiget_slice..:8
0060   0f 00 01 0b 00 00 00 01 00 00 00 08 00 00 00 00  ................
0070   00 00 ab 00 0c 00 02 0b 00 03 00 00 00 09 65 64  ..............ed
0080   67 65 73 74 6f 72 65 00 0c 00 03 0c 00 02 0b 00  gestore.........
0090   01 00 00 00 02 72 c0 0b 00 02 00 00 00 02 72 c1  .....r........r.
00a0   02 00 03 00 08 00 04 7f ff ff ff 00 00 08 00 04  ................
00b0   00 00 00 01 00                                   .....

969   09:45:55.354825   x.x.x.98 x.x.x.30 TCP   123   9160 → 54406 [PSH, ACK] 
Seq=39558 Ack=53824 Win=1540 Len=57 TSval=4169119546 TSecr=4169130400
0000   00 0c 29 d1 a4 5e 00 50 56 a7 d6 0d 08 00 45 00  ..)..^.PV.....E.
0010   00 6d 19 dd 40 00 40 06 c8 07 ac 13 00 62 ac 13  .m..@.@......b..
0020   00 1e 23 c8 d4 86 1b 4b 0b 55 2c 30 4e b8 80 18  ..#....K.U,0N...
0030   06 04 3b d6 00 00 01 01 08 0a f8 7f b7 3a f8 7f  ..;..........:..
0040   e1 a0 00 00 00 35 80 01 00 02 00 00 00 0e 6d 75  .....5........mu
0050   6c 74 69 67 65 74 5f 73 6c 69 63 65 00 00 3a 38  ltiget_slice..:8
0060   0d 00 00 0b 0f 00 00 00 01 00 00 00 08 00 00 00  ................
0070   00 00 00 ab 00 0c 00 00 00 00 00                 ...........
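
If I am decoding the Thrift payload correctly, every one of those 181-byte
requests is the equivalent of the following single-key call (types are from
the Cassandra Thrift API; the key, slice bounds and consistency level are read
off the dump above; the keyspace name is a guess based on Titan's default):

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class DecodedRequest {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("x.x.x.98", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        client.set_keyspace("titan"); // guess: Titan's default keyspace name

        // A single 8-byte row key per request (bytes taken from packet 968)
        ByteBuffer key = ByteBuffer.wrap(
                new byte[]{0, 0, 0, 0, 0, 0, (byte) 0xab, 0});
        ColumnParent parent = new ColumnParent("edgestore");
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(
                        ByteBuffer.wrap(new byte[]{0x72, (byte) 0xc0}), // start
                        ByteBuffer.wrap(new byte[]{0x72, (byte) 0xc1}), // finish
                        false,               // reversed = false
                        Integer.MAX_VALUE)); // count = 0x7fffffff

        // multiget_slice with a key list of length ONE
        client.multiget_slice(Arrays.asList(key), parent, predicate,
                ConsistencyLevel.ONE);
        transport.close();
    }
}

So despite the name, each multiget_slice here appears to fetch exactly one
key, and with roughly 1’000 such calls issued back to back the total time is
dominated by network round trips.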

With very few exceptions all packets have the exact same lengths of 181 and 123
bytes respectively. The overall response time of the graph query grows
approximately linearly with the network latency, which suggests these
single-key requests are issued sequentially, one round trip at a time.
As even “normal” internet latencies render the setup useless, I assume we are
doing something wrong. Is there a way to avoid that storm of small packets
through configuration? Or is Titan's cassandrathrift storage backend to blame
for this?
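
One thing we could try on our side is Titan's multiQuery() API, which as far
as I understand is meant to fold the slice queries of many vertices into a
single batched backend call. A minimal sketch of what I have in mind
(properties file path, vertex ids and edge label are made up):

import java.util.Map;
import com.thinkaurelius.titan.core.TitanEdge;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanTransaction;
import com.thinkaurelius.titan.core.TitanVertex;

public class BatchedQuery {
    public static void main(String[] args) {
        TitanGraph graph = TitanFactory.open("conf/titan-cassandra.properties");
        TitanTransaction tx = graph.newTransaction();
        TitanVertex v1 = tx.getVertex(4200L); // made-up vertex ids
        TitanVertex v2 = tx.getVertex(4201L);

        // One batched backend query for the edges of both vertices,
        // instead of one multiget_slice per vertex
        Map<TitanVertex, Iterable<TitanEdge>> edges =
                tx.multiQuery(v1, v2)
                  .labels("connectsTo") // made-up edge label
                  .titanEdges();

        tx.commit();
        graph.shutdown();
    }
}

But would that actually collapse the Thrift calls, or would the backend still
issue them key by key?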


Thanks in advance!
Ralf
