Re: Batch read requests to one physical host?

2016-10-19 Thread Nate McCall
I see a few slightly different things here (equally valuable) in
conjunction with CASSANDRA-10414:
- Wanting a small number of specific, non-sequential rows out of the
same partition (this is common, IME) and grouping those
- Extending batch semantics to reads, with the same understanding as
with mutations: if you put different partitions in the same batch, it
will be slow

(I think Eric's IN(..) sorta fits with either of those.)

Interesting!
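Nate's first bullet — fetching a handful of specific, non-sequential rows out of the same partition — can be sketched in a few lines. The table and column names below (`events`, `user_id`, `event_id`) are invented for illustration; the point is only that all requested clustering keys for one partition collapse into a single IN query instead of one query per row:

```python
from collections import defaultdict

def queries_for_rows(rows):
    """Group (partition_key, clustering_key) pairs so each partition
    is fetched with one IN query rather than one query per row.
    Table and column names are illustrative only."""
    by_partition = defaultdict(list)
    for pk, ck in rows:
        by_partition[pk].append(ck)
    return [
        ("SELECT * FROM events WHERE user_id = %s AND event_id IN %s",
         (pk, sorted(cks)))
        for pk, cks in sorted(by_partition.items())
    ]

wanted = [("alice", 7), ("alice", 3), ("bob", 1), ("alice", 9)]
stmts = queries_for_rows(wanted)
# "alice"'s three rows become a single statement; "bob" gets another.
```

With a real driver, each statement would additionally be routed token-aware to a replica for its partition.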

On Thu, Oct 20, 2016 at 4:26 AM, Tyler Hobbs  wrote:
> There's a similar ticket focusing on range reads and secondary index
> queries, but the work for these could be done together:
> https://issues.apache.org/jira/browse/CASSANDRA-10414
>
> On Tue, Oct 18, 2016 at 5:59 PM, Dikang Gu  wrote:
>
>> Hi there,
>>
>> We have a couple of use cases that do fan-out reads, meaning a single
>> read request from the client contains multiple keys that live on
>> different physical hosts. (I know this isn't the recommended way to
>> access C*.)
>>
>> Right now the coordinator issues separate read commands even when they
>> go to the same physical host, which I think causes a lot of overhead.
>>
>> I'm wondering whether it would be valuable to add a new read command so
>> that the coordinator can batch the reads destined for one data node,
>> send them in a single message, and have that node return the results
>> for all of the keys it owns.
>>
>> Has anything similar been proposed before?
>>
>>
>> --
>> Dikang
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 


Re: Batch read requests to one physical host?

2016-10-19 Thread Tyler Hobbs
There's a similar ticket focusing on range reads and secondary index
queries, but the work for these could be done together:
https://issues.apache.org/jira/browse/CASSANDRA-10414

On Tue, Oct 18, 2016 at 5:59 PM, Dikang Gu  wrote:

> Hi there,
>
> We have a couple of use cases that do fan-out reads, meaning a single
> read request from the client contains multiple keys that live on
> different physical hosts. (I know this isn't the recommended way to
> access C*.)
>
> Right now the coordinator issues separate read commands even when they
> go to the same physical host, which I think causes a lot of overhead.
>
> I'm wondering whether it would be valuable to add a new read command so
> that the coordinator can batch the reads destined for one data node,
> send them in a single message, and have that node return the results
> for all of the keys it owns.
>
> Has anything similar been proposed before?
>
>
> --
> Dikang
>



-- 
Tyler Hobbs
DataStax 


Re: Batch read requests to one physical host?

2016-10-18 Thread Eric Stevens
We've had some luck with bulk reads of known keys by grouping them by
replica and issuing SELECT... WHERE key IN(...). It's not compatible with
all data models, but it works well where we can get away with it.

As a more general-purpose construct it makes sense to me. In our driver
layer we have abstracted batches to support read batches (under which the
above method is applied), even though Cassandra doesn't support them first
class.
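Eric's method can be sketched without a driver: place keys on a toy consistent-hash ring, group them by the node that owns them, and issue one IN(...) statement per node. The ring, node names, and token function below are all made up for illustration; a real implementation would use the driver's token metadata and replication settings instead.

```python
import bisect
import hashlib
from collections import defaultdict

def token(key):
    """Toy token function: a stable 64-bit hash of the key."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def build_ring(nodes, vnodes=8):
    """Give each node several tokens on a sorted ring (vnode style)."""
    return sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def owner(ring, key):
    """Primary replica: first node clockwise from the key's token."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, token(key)) % len(ring)
    return ring[i][1]

def group_by_replica(ring, keys):
    """Bucket keys by owning node; each bucket becomes one IN(...) query."""
    groups = defaultdict(list)
    for k in keys:
        groups[owner(ring, k)].append(k)
    return dict(groups)

ring = build_ring(["node-a", "node-b", "node-c"])
groups = group_by_replica(ring, ["k%d" % i for i in range(10)])
# Each node now receives a single SELECT ... WHERE key IN (...) for its bucket.
```

Each bucket is then sent as one statement routed to its node, so the per-request overhead scales with the number of replicas touched rather than the number of keys.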

On Tue, Oct 18, 2016, 5:00 PM Dikang Gu  wrote:

> Hi there,
>
> We have a couple of use cases that do fan-out reads, meaning a single
> read request from the client contains multiple keys that live on
> different physical hosts. (I know this isn't the recommended way to
> access C*.)
>
> Right now the coordinator issues separate read commands even when they
> go to the same physical host, which I think causes a lot of overhead.
>
> I'm wondering whether it would be valuable to add a new read command so
> that the coordinator can batch the reads destined for one data node,
> send them in a single message, and have that node return the results
> for all of the keys it owns.
>
> Has anything similar been proposed before?
>
>
> --
> Dikang
>


Batch read requests to one physical host?

2016-10-18 Thread Dikang Gu
Hi there,

We have a couple of use cases that do fan-out reads, meaning a single
read request from the client contains multiple keys that live on
different physical hosts. (I know this isn't the recommended way to
access C*.)

Right now the coordinator issues separate read commands even when they
go to the same physical host, which I think causes a lot of overhead.

I'm wondering whether it would be valuable to add a new read command so
that the coordinator can batch the reads destined for one data node,
send them in a single message, and have that node return the results
for all of the keys it owns.

Has anything similar been proposed before?


-- 
Dikang
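The coordinator-side batching proposed in this thread could look roughly like the sketch below. The classes are invented for illustration, not Cassandra internals: a mock data node answers all of its keys from one message, and the coordinator sends one such message per node and merges the replies.

```python
from collections import defaultdict

class MockDataNode:
    """Stands in for a storage node: answers a whole batch of keys at once."""
    def __init__(self, data):
        self.data = data
        self.messages_received = 0

    def batch_read(self, keys):
        self.messages_received += 1  # one network message per batch
        return {k: self.data[k] for k in keys if k in self.data}

def coordinator_read(keys, placement, nodes):
    """Group keys by owning node, send one batched message per node,
    and merge the partial results (the proposed batched read command)."""
    per_node = defaultdict(list)
    for k in keys:
        per_node[placement[k]].append(k)
    results = {}
    for name, node_keys in per_node.items():
        results.update(nodes[name].batch_read(node_keys))
    return results

nodes = {"n1": MockDataNode({"a": 1, "b": 2}), "n2": MockDataNode({"c": 3})}
placement = {"a": "n1", "b": "n1", "c": "n2"}
out = coordinator_read(["a", "b", "c"], placement, nodes)
# n1 receives a single message covering both "a" and "b".
```

The win over today's behavior is that the coordinator sends one message per node instead of one per key, which is exactly the overhead reduction the original post asks about.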