Yes, I am using the DataStax CQL drivers. It was good advice; thanks a lot, Jonathan.
[]s
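Below is a minimal sketch of the approach recommended in the quoted thread: one single-key `SELECT ... =` per value, fired asynchronously, with token-aware routing. It assumes the DataStax Python driver (`cassandra-driver`). The names `connect`, `lookup_entity_ids` and `LOOKUP_CQL` are hypothetical. The sketch uses a prepared statement because the driver can only route a request to a replica when it knows the partition key. A bound statement from `session.prepare()` carries that key. The plain `%s` string queries in the quoted code below do not, so a token-aware policy falls back to its wrapped policy for them.

```python
import logging

# Hypothetical single-partition lookup against the entity_lookup table below.
LOOKUP_CQL = "SELECT entity_id FROM entity_lookup WHERE name = ? AND value = ?"


def connect(contact_points, keyspace, local_dc):
    """Build a session with token-aware routing (needs cassandra-driver)."""
    # Imported lazily so lookup_entity_ids can be exercised without the driver.
    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    cluster = Cluster(
        contact_points=contact_points,
        # TokenAwarePolicy tries replicas that own a statement's partition key
        # first; the wrapped DC-aware policy orders hosts within local_dc and
        # is used on its own when a statement has no routing key.
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)),
    )
    return cluster.connect(keyspace)


def lookup_entity_ids(session, prepared, identifiers):
    """Run one async query per (name, value) pair instead of one IN query.

    identifiers maps a name to a list of values, as in the quoted code.
    """
    futures = []
    for name, values in identifiers.items():
        for value in values:
            # Each request targets exactly one partition (name, value).
            future = session.execute_async(prepared, (name, value))
            futures.append(((name, value), future))

    entity_ids = set()
    for key, future in futures:
        try:
            # result() blocks until this particular request has finished.
            rows = future.result()
        except Exception:
            logging.error("Lookup for %r failed", key)
            raise
        for row in rows:
            entity_ids.add(row.entity_id)
    return entity_ids
```

The following usage assumes a node reachable at 127.0.0.1: `session = connect(["127.0.0.1"], "identification1", "DC1")`. CQL folds the unquoted `Identification1` to lowercase. Then call `prepared = session.prepare(LOOKUP_CQL)` once and `lookup_entity_ids(session, prepared, identifiers)` per batch. As Jon notes in the thread, the driver multiplexes these requests over its pooled connections, so the extra queries do not mean extra connections. If the value lists get very long, the driver's `cassandra.concurrent.execute_concurrent_with_args` helper is one way to cap how many requests are in flight.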
2014-06-20 0:28 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> The only case in which it might be better to use an IN clause is if
> the entire query can be satisfied from that machine. Otherwise, go
> async.
>
> The native driver reuses connections and intelligently manages the
> pool for you. It can also multiplex queries over a single connection.
>
> I am assuming you're using one of the DataStax drivers for CQL, btw.
>
> Jon
>
> On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
> <marc...@s1mbi0se.com.br> wrote:
> > This is interesting, I didn't know that!
> > It might make sense, then, to use SELECT with = + async + token aware; I
> > will try to change my code.
> >
> > But would it be a "recommended solution" for these cases? Any other
> > options?
> >
> > I still wonder if this is the right use case for Cassandra: looking up
> > random keys in a huge cluster. After all, the number of connections to
> > Cassandra will still be huge, right? Wouldn't that be a problem?
> > Or does the driver reuse connections when you use async?
> >
> > []s
> >
> > 2014-06-19 22:16 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >
> >> If you use async and your driver is token aware, it will go to the
> >> proper node, rather than requiring the coordinator to do so.
> >>
> >> Realistically you're going to have a connection open to every server
> >> anyway. It's the difference between you querying for the data
> >> directly and using a coordinator as a proxy. It's faster to just ask
> >> the node with the data.
> >>
> >> On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
> >> <marc...@s1mbi0se.com.br> wrote:
> >> > But wouldn't using async queries be even worse than using SELECT IN?
> >> > The justification in the docs is that I could end up querying many
> >> > nodes, but with async queries I would still be doing that.
> >> >
> >> > Today, I use both async queries AND SELECT IN:
> >> >
> >> > SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)"
> >> >
> >> > futures = []
> >> > entity_ids = set()
> >> > for name, values in identifiers.items():
> >> >     # One IN query per name, with one %s placeholder per value.
> >> >     query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s'] * len(values)))
> >> >     args = [name] + values
> >> >     query_msg = query % tuple(args)  # only used for logging
> >> >     futures.append((query_msg, self.session.execute_async(query, args)))
> >> >
> >> > for query_msg, future in futures:
> >> >     try:
> >> >         rows = future.result(timeout=100000)
> >> >         for row in rows:
> >> >             entity_ids.add(row.entity_id)
> >> >     except Exception:
> >> >         logging.error("Query '%s' returned ERROR", query_msg)
> >> >         raise
> >> >
> >> > Using async with just SELECT ... = would mean that instead of 1 async
> >> > query (for example, in (0, 1, 2)), I would run several, one for each
> >> > value of the "values" array above.
> >> > In my head, this would mean more connections to Cassandra and the same
> >> > amount of work, right? What would be the advantage?
> >> >
> >> > []s
> >> >
> >> > 2014-06-19 22:01 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >> >
> >> >> Your other option is to fire off async queries. It's pretty
> >> >> straightforward with the Java or Python drivers.
> >> >>
> >> >> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
> >> >> <marc...@s1mbi0se.com.br> wrote:
> >> >> > I was taking a look at the Cassandra anti-patterns list:
> >> >> >
> >> >> > http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
> >> >> >
> >> >> > Among them is:
> >> >> >
> >> >> > "SELECT ... IN or index lookups
> >> >> >
> >> >> > SELECT ... IN and index lookups (formerly secondary indexes) should
> >> >> > be avoided except for specific scenarios.
> >> >> > See When not to use IN in SELECT and When not to use an index in
> >> >> > Indexing in CQL for Cassandra 2.0"
> >> >> >
> >> >> > And looking at the SELECT doc, I saw:
> >> >> >
> >> >> > "When not to use IN
> >> >> >
> >> >> > The recommendations about when not to use an index apply to using IN
> >> >> > in the WHERE clause. Under most conditions, using IN in the WHERE
> >> >> > clause is not recommended. Using IN can degrade performance because
> >> >> > usually many nodes must be queried. For example, in a single, local
> >> >> > data center cluster having 30 nodes, a replication factor of 3, and a
> >> >> > consistency level of LOCAL_QUORUM, a single key query goes out to two
> >> >> > nodes, but if the query uses the IN condition, the number of nodes
> >> >> > being queried are most likely even higher, up to 20 nodes depending on
> >> >> > where the keys fall in the token range."
> >> >> >
> >> >> > In my system, I have a column family called "entity_lookup":
> >> >> >
> >> >> > CREATE KEYSPACE IF NOT EXISTS Identification1
> >> >> >   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
> >> >> > USE Identification1;
> >> >> >
> >> >> > CREATE TABLE IF NOT EXISTS entity_lookup (
> >> >> >   name varchar,
> >> >> >   value varchar,
> >> >> >   entity_id uuid,
> >> >> >   PRIMARY KEY ((name, value), entity_id));
> >> >> >
> >> >> > And I use the following SELECT to query it:
> >> >> >
> >> >> > SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)
> >> >> >
> >> >> > Is this an anti-pattern?
> >> >> >
> >> >> > If not using SELECT IN, what other approach would you recommend for
> >> >> > lookups like that? I have several values I would like to search for in
> >> >> > Cassandra, and they might not be in the same partition, as above.
> >> >> >
> >> >> > Is Cassandra the wrong tool for lookups like that?
> >> >> >
> >> >> > Best regards,
> >> >> > Marcelo Valle.
> >> >>
> >> >> --
> >> >> Jon Haddad
> >> >> http://www.rustyrazorblade.com
> >> >> skype: rustyrazorblade
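As a rough check on the DataStax "When not to use IN" figures quoted above: LOCAL_QUORUM needs floor(RF / 2) + 1 replicas in the local data center. With RF = 3, a single-key read therefore goes to 2 nodes. The quoted text does not say how many keys its IN example uses. The helper names below are hypothetical, and the bound is a simplified worst case that assumes no two keys share a replica. Under that assumption, 10 keys is one case that reproduces the "up to 20 nodes" figure.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM/LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def max_nodes_contacted(num_keys, replication_factor, cluster_nodes):
    """Worst-case distinct nodes a single IN query reads from at LOCAL_QUORUM.

    Simplified model: every key's quorum lands on different nodes, capped
    by the number of nodes in the local data center.
    """
    return min(num_keys * quorum(replication_factor), cluster_nodes)


# The 30-node, RF = 3 cluster from the quoted docs.
for keys in (1, 10, 50):
    print(keys, "key(s) ->", max_nodes_contacted(keys, 3, 30), "nodes")
```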