Re: IN versus multiple asynchronous queries

2014-10-07 Thread Tyler Hobbs
Also note that with an IN clause, if there is a failure fetching one of the
partitions, the entire request will fail and will need to be retried.  If
you use concurrent async queries, you'll only need to retry one small
request.

On Mon, Oct 6, 2014 at 1:14 PM, DuyHai Doan doanduy...@gmail.com wrote:

 "Definitely better to not make the coordinator hold on to that memory
 while it waits for other requests to come back" -- You get it. When
 loading big documents, you risk starving the heap quickly, triggering long
 GC cycles on the coordinator, etc.

 On Mon, Oct 6, 2014 at 6:22 PM, Robert Wille rwi...@fold3.com wrote:

  As far as latency is concerned, it seems like it wouldn't matter very
 much if the coordinator has to wait for all the responses to come back, or
 the client waits for all the responses to come back. I’ve got the same
 latency either way.

  I would assume that 50 coordinations is more expensive than one
 coordination that does 50 times the work, but that’s probably insignificant
 when compared to the actual fetching of the data from the SSTables.

  I do see the point about putting stress on coordinator memory. In
 general, the documents will be very small, but there will occasionally be
 some rather large ones, potentially several megabytes in size. Definitely
 better to not make the coordinator hold on to that memory while it waits
 for other requests to come back.

  Robert

  On Oct 4, 2014, at 8:34 AM, DuyHai Doan doanduy...@gmail.com wrote:

  Definitely 50 concurrent queries, possibly in async mode.

  If you're using the IN clause with 50 values, the coordinator will
 block, waiting for 50 partitions to be fetched from different nodes (worst
 case = 50 nodes) before responding to the client. In addition to the very high
 latency, you'll put stress on the coordinator's memory.



 On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille rwi...@fold3.com wrote:

 I have a table of small documents (less than 1K) that are often accessed
 together as a group. The group size is always less than 50. Which produces
 less load on the server, one query using an IN clause to get all 50 back
 together, or 50 concurrent queries? Which one is faster?

 Thanks

 Robert







-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: IN versus multiple asynchronous queries

2014-10-06 Thread Robert Wille
As far as latency is concerned, it seems like it wouldn't matter very much if 
the coordinator has to wait for all the responses to come back, or the client 
waits for all the responses to come back. I’ve got the same latency either way.

I would assume that 50 coordinations is more expensive than one coordination 
that does 50 times the work, but that’s probably insignificant when compared to 
the actual fetching of the data from the SSTables.

I do see the point about putting stress on coordinator memory. In general, the 
documents will be very small, but there will occasionally be some rather large 
ones, potentially several megabytes in size. Definitely better to not make the 
coordinator hold on to that memory while it waits for other requests to come 
back.
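
As a rough back-of-the-envelope sketch of the memory point (not a measurement), the snippet below compares what a single request would have to buffer in each approach. The group composition is an assumption: 48 documents of 1 KB plus two 3 MB outliers, standing in for "potentially several megabytes".

```python
KB = 1024
MB = 1024 * KB

# Assumed group: 48 typical 1 KB documents plus two hypothetical 3 MB outliers.
doc_sizes = [1 * KB] * 48 + [3 * MB] * 2

# One IN query: the coordinator buffers every partition until the last arrives.
in_query_buffer = sum(doc_sizes)

# One query per document: each request buffers only its own partition.
largest_single_request = max(doc_sizes)

print(f"IN query buffers about {in_query_buffer / MB:.2f} MB in one request")
print(f"largest single-document request buffers {largest_single_request / MB:.2f} MB")
```

Under these assumed sizes, the IN request holds roughly twice as much as the largest individual request, and it holds all of it until the slowest partition comes back.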

Robert

On Oct 4, 2014, at 8:34 AM, DuyHai Doan doanduy...@gmail.com wrote:

Definitely 50 concurrent queries, possibly in async mode.

If you're using the IN clause with 50 values, the coordinator will block,
waiting for 50 partitions to be fetched from different nodes (worst case = 50
nodes) before responding to the client. In addition to the very high latency,
you'll put stress on the coordinator's memory.



On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille rwi...@fold3.com wrote:
I have a table of small documents (less than 1K) that are often accessed 
together as a group. The group size is always less than 50. Which produces less 
load on the server, one query using an IN clause to get all 50 back together, 
or 50 concurrent queries? Which one is faster?

Thanks

Robert





Re: IN versus multiple asynchronous queries

2014-10-06 Thread DuyHai Doan
"Definitely better to not make the coordinator hold on to that memory while
it waits for other requests to come back" -- You get it. When loading big
documents, you risk starving the heap quickly, triggering long GC cycles on
the coordinator, etc.

On Mon, Oct 6, 2014 at 6:22 PM, Robert Wille rwi...@fold3.com wrote:

  As far as latency is concerned, it seems like it wouldn't matter very
 much if the coordinator has to wait for all the responses to come back, or
 the client waits for all the responses to come back. I’ve got the same
 latency either way.

  I would assume that 50 coordinations is more expensive than one
 coordination that does 50 times the work, but that’s probably insignificant
 when compared to the actual fetching of the data from the SSTables.

  I do see the point about putting stress on coordinator memory. In
 general, the documents will be very small, but there will occasionally be
 some rather large ones, potentially several megabytes in size. Definitely
 better to not make the coordinator hold on to that memory while it waits
 for other requests to come back.

  Robert

  On Oct 4, 2014, at 8:34 AM, DuyHai Doan doanduy...@gmail.com wrote:

  Definitely 50 concurrent queries, possibly in async mode.

  If you're using the IN clause with 50 values, the coordinator will block,
 waiting for 50 partitions to be fetched from different nodes (worst case =
 50 nodes) before responding to the client. In addition to the very high
 latency, you'll put stress on the coordinator's memory.



 On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille rwi...@fold3.com wrote:

 I have a table of small documents (less than 1K) that are often accessed
 together as a group. The group size is always less than 50. Which produces
 less load on the server, one query using an IN clause to get all 50 back
 together, or 50 concurrent queries? Which one is faster?

 Thanks

 Robert






IN versus multiple asynchronous queries

2014-10-04 Thread Robert Wille
I have a table of small documents (less than 1K) that are often accessed 
together as a group. The group size is always less than 50. Which produces less 
load on the server, one query using an IN clause to get all 50 back together, 
or 50 concurrent queries? Which one is faster?

Thanks

Robert



Re: IN versus multiple asynchronous queries

2014-10-04 Thread DuyHai Doan
Definitely 50 concurrent queries, possibly in async mode.

If you're using the IN clause with 50 values, the coordinator will block,
waiting for 50 partitions to be fetched from different nodes (worst case =
50 nodes) before responding to the client. In addition to the very high
latency, you'll put stress on the coordinator's memory.
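
As a hedged illustration of the "50 concurrent queries in async mode" approach, the sketch below simulates the fan-out with `asyncio` and handles each document as soon as it arrives. It is not driver code: `fetch_document`, `SLOW_DOC`, and the sleep-based latencies are hypothetical. With the DataStax Python driver, the real calls would go through `Session.execute_async` instead.

```python
import asyncio

SLOW_DOC = 13  # hypothetical: this partition lives on a slow node

async def fetch_document(doc_id):
    # Simulated network and read latency, not a real Cassandra call.
    await asyncio.sleep(0.05 if doc_id == SLOW_DOC else 0.005)
    return doc_id, f"document-{doc_id}"

async def fetch_group(doc_ids):
    # Start all single-partition reads at once.
    tasks = [asyncio.create_task(fetch_document(d)) for d in doc_ids]
    documents = {}
    arrival_order = []
    # Handle each response as soon as it completes, in completion order.
    for finished in asyncio.as_completed(tasks):
        doc_id, body = await finished
        documents[doc_id] = body
        arrival_order.append(doc_id)
    return documents, arrival_order

documents, arrival_order = asyncio.run(fetch_group(range(50)))
print(f"fetched {len(documents)} documents; last to arrive: {arrival_order[-1]}")
```

In this simulation the 49 fast documents are available to the client while the slow one is still in flight. With a single IN query, nothing is returned until the slowest partition has been fetched.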



On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille rwi...@fold3.com wrote:

 I have a table of small documents (less than 1K) that are often accessed
 together as a group. The group size is always less than 50. Which produces
 less load on the server, one query using an IN clause to get all 50 back
 together, or 50 concurrent queries? Which one is faster?

 Thanks

 Robert