In case anyone is following this, here is an update: I was able to narrow it down to the Cassandra-to-Cassandra link. Storage proxy latency depends on the amount of data per key: the more data is transferred, the higher the latency. No surprise there. The client connects to a daemon "A" and sends a key-value pair; "A" accepts the Thrift message, deserializes it into an object, sees that the key belongs to daemon "B", serializes it to bytes once again (internal format now), and invokes MessagingService, which in turn writes to a socket. As soon as "B" delivers a write acknowledgment over a different connection, the client call is released. Cassandra's MessagingService uses Java NIO to connect to the other Cassandra daemons; all connections are unidirectional. So in theory it should be very fast. But it's not.
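To take Cassandra out of the picture, I put together the little standalone probe below (nothing Cassandra-specific in it; the port, value size, and iteration count are arbitrary). It mimics the write-then-ack round trip between "A" and "B", except that it collapses the two unidirectional connections into one socket for simplicity. Run it across the two hosts (replace "localhost") and compare the per-operation time with the ~235ms StorageProxy number:

import java.io.*;
import java.net.*;

public class RoundTripProbe {
    static final int PORT = 9999;             // arbitrary test port
    static final int VALUE_SIZE = 200 * 1024; // ~200KB, the slow case
    static final int ITERATIONS = 100;

    public static void main(String[] args) throws Exception {
        // In-process "B" for convenience; move this to the remote host
        // (and change "localhost" below) for a real measurement.
        Thread server = new Thread() { public void run() { serve(); } };
        server.setDaemon(true);
        server.start();
        Thread.sleep(500); // crude wait for the listener to come up

        byte[] value = new byte[VALUE_SIZE];
        Socket s = new Socket("localhost", PORT);
        s.setTcpNoDelay(true); // worth toggling, see note below
        OutputStream out = s.getOutputStream();
        InputStream in = s.getInputStream();
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            out.write(value);                 // "A" sends one value
            out.flush();
            if (in.read() < 0)                // wait for "B"'s 1-byte ack
                throw new EOFException("server closed");
        }
        long perOpMs = (System.nanoTime() - start) / ITERATIONS / 1000000L;
        System.out.println("avg round trip: " + perOpMs + " ms");
        s.close();
    }

    // "B": consume exactly one value, then acknowledge with a single byte.
    static void serve() {
        try {
            ServerSocket listener = new ServerSocket(PORT);
            Socket s = listener.accept();
            InputStream in = s.getInputStream();
            OutputStream out = s.getOutputStream();
            byte[] buf = new byte[64 * 1024];
            int remaining = VALUE_SIZE;
            while (true) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) return;
                remaining -= n;
                if (remaining == 0) {  // one full value received: ack it
                    out.write(0);
                    out.flush();
                    remaining = VALUE_SIZE;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

If this loop also tops out around 4% of the link, the problem is below Cassandra (socket options, driver, switch); if it saturates the link, the time is being lost inside MessagingService. Note the setTcpNoDelay call: Nagle's algorithm interacting with delayed ACKs is a classic source of fixed stalls on exactly this kind of write-then-wait-for-reply pattern, so it is worth toggling on both ends.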
What does look suspicious is a cap on network usage: only ~4% of the 1Gbps link is used regardless of the "value" size. With smaller values I get better throughput; with larger ones (200KB) it is worse. As a temporary workaround, the client could be made responsible for identifying which Cassandra instance a key should be sent to, skipping the proxy hop entirely. With 200KB values that is ~10 times faster. A sketch of what I mean is below.
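All the names in this sketch are made up, and it assumes RandomPartitioner (token = MD5 of the key as a non-negative BigInteger, if I read that code right) plus a client that is handed each node's token up front:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class TokenAwareRouter {
    // token -> host, mirroring the ring layout (populated from config)
    private final TreeMap<BigInteger, String> ring =
            new TreeMap<BigInteger, String>();

    public void addNode(BigInteger token, String host) {
        ring.put(token, host);
    }

    // MD5-based token, mimicking RandomPartitioner
    static BigInteger token(String key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(md5.digest(key.getBytes("UTF-8"))).abs();
    }

    // Owner = first node whose token is >= token(key), wrapping around.
    public String endpointFor(String key) throws Exception {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token(key));
        if (e == null) e = ring.firstEntry(); // wrap around the ring
        return e.getValue();
    }
}

The client would then open its Thrift connection to endpointFor(key) instead of to a fixed node. Obviously this leaks ring layout into the client, so it is a stopgap, not a fix.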
On Thu, Oct 1, 2009 at 6:51 PM, Igor Katkov <[email protected]> wrote:
> Hi,
>
> I have the following puzzle:
> Storage proxy write latency ~235ms
> CF write latency <1 ms
>
> I have 3 nodes in the cluster, Cassandra v.0.4. Tokens are evenly distributed.
> The client connects to a node and inserts a key with ConsistencyLevel.ONE.
> If it happens to be a local write, the operation is fast - the same speed as
> in a one-node setup. JMX shows write latency <1 ms.
> If it happens to be a remote insert, StorageProxy sends it to the proper node.
> This operation is slow. JMX shows write latency ~235ms.
> At the same time, JMX on the remote node shows the same <1ms write latency,
> so it's not the remote node being sluggish; it's something else.
> There are no pending tasks on the remote node - JMX counters are always zero,
> and the network is 1Gb and idle. So I can't blame it.
>
> I profiled the Cassandra server in JProfiler and could not find a thing. All
> this extra time is spent inside QuorumResponseHandler waiting for the
> condition to signal, which should happen as soon as a response is received.
>
> There is one pooled TCP connection open to the remote host. Hardly a
> bottleneck, and the ThreadPoolExecutors look OK.
>
> Any ideas why the write latency is so high?