Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
Yes, I am using the CQL DataStax drivers. It was good advice, thanks a lot Jonathan. []s 2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Laing, Michael
However, my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS. Also, why do the work the coordinator will do for you: send all the queries, wait for everything to come

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
A question, not sure if you guys know the answer: Suppose I async query 1000 rows using token aware and suppose I have 10 nodes. Suppose also that each node would receive 100 row queries. How does async work in this case? Would it send each row query to each node in a different connection?

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
That depends on the connection pooling implementation in your driver. Astyanax will keep N connections open to each node (configurable) and route each query in a separate message over an existing connection, waiting until one becomes available if all are in use. On Fri, Jun 20, 2014 at 12:32 PM,

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Marcelo Elias Del Valle
I am using python + CQL Driver. I wonder how they do it... These things seem of little importance, but they are fundamental to getting good performance in Cassandra... I wish there was a simpler way to query in batches. Opening a large number of connections and sending 1 message at a time seems bad to me,

Re: Best way to do a multi_get using CQL

2014-06-20 Thread DuyHai Doan
Well, it's kind of a trade-off. Either you send data directly to the primary replica nodes to take advantage of data locality using a token-aware strategy, and the price to pay is a high number of open connections on the client side. Or you just batch data to a random node playing the coordinator

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jeremy Jongsma
There is nothing preventing that in Cassandra; it's just a matter of how intelligent the driver API is. Submit a feature request to the Astyanax or DataStax driver projects. On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: The bad design part (just my

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jonathan Haddad
I forgot to add that each connection can handle multiple simultaneous queries. This was part of the original protocol as of C* 1.2: http://www.datastax.com/dev/blog/binary-protocol Asynchronous: each connection can handle more than one active request at the same time. In practice, this means

Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
I was taking a look at the Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list:
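A minimal sketch of the async approach with the Python driver, assuming `session` is an already-connected driver session that exposes `execute_async`. The table `entity_lookup`, its columns, and the helper name `multi_get` are hypothetical and only for illustration. `execute_async` returns a future immediately, so every query is in flight before the first `result()` call blocks.

```python
def multi_get(session, keys):
    """Fetch the rows for each key by issuing all queries asynchronously."""
    # Hypothetical table and column names, for illustration only.
    query = "SELECT entity_id FROM entity_lookup WHERE name = %s"
    # Send every query first; execute_async returns without waiting.
    futures = [(key, session.execute_async(query, (key,))) for key in keys]
    # Then collect the results; result() blocks until that query finishes
    # and raises if the query failed.
    return {key: list(future.result()) for key, future in futures}
```

For very large key lists it may be worth sending the queries in bounded windows rather than all at once, so the client does not hold thousands of pending futures.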

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
But wouldn't using async queries be even worse than using SELECT IN? The justification in the docs is that I could query many nodes, but I would still do it. Today, I use both async queries AND SELECT IN: SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)"
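One way to combine the two, sketched under assumptions: split the values into fixed-size chunks, issue one `IN` query per chunk with `execute_async`, and merge the rows. The table name `entity_lookup`, the helper names, and the default chunk size of 100 are hypothetical; `session` is assumed to be a connected Python driver session.

```python
def chunked(values, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def lookup_in_chunks(session, name, values, chunk_size=100):
    """Run one async SELECT ... IN per chunk and merge the returned rows."""
    futures = []
    for chunk in chunked(list(values), chunk_size):
        # One %s placeholder per value, so the driver binds each one.
        placeholders = ", ".join(["%s"] * len(chunk))
        query = ("SELECT entity_id FROM entity_lookup "  # hypothetical table
                 "WHERE name = %s AND value IN (" + placeholders + ")")
        futures.append(session.execute_async(query, [name] + chunk))
    rows = []
    for future in futures:
        rows.extend(future.result())  # blocks until this chunk's query is done
    return rows
```

Smaller chunks spread the work over more coordinators; larger chunks mean fewer round trips. The right size depends on the cluster and is something to measure rather than assume.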

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyways. It's the difference between you querying for the data directly and using a coordinator as
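To illustrate what "token aware" routing means, here is a toy sketch. It is not the driver's actual code: an MD5-based hash stands in for Cassandra's partitioner, each node gets a single evenly spaced token, and replication is ignored. All function names and node names are hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict

RING_SIZE = 2 ** 32  # toy token space, not Cassandra's real range

def toy_token(key):
    """Map a key to a token in [0, RING_SIZE) using MD5 (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def build_ring(nodes):
    """Give each node one evenly spaced token; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]

def owner(ring, key):
    """Return the node holding the first token >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, toy_token(key)) % len(ring)
    return ring[index][1]

def group_by_owner(ring, keys):
    """Group keys by owning node, as a token-aware client would route them."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner(ring, key)].append(key)
    return dict(groups)
```

A token-aware driver does the equivalent lookup per query and sends each request straight to a node that holds the data, which saves the extra hop through an unrelated coordinator.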

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
This is interesting, I didn't know that! It might make sense then to use select = + async + token aware, I will try to change my code. But would it be a recommended solution for these cases? Any other options? I still wonder if this is the right use case for Cassandra, to look for random keys in a

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async. The native driver reuses connections and intelligently manages the pool for you. It can also multiplex queries over a single connection. I am assuming