Re: Best way to do a multi_get using CQL

Marcelo Elias Del Valle Fri, 20 Jun 2014 00:30:07 -0700

Yes, I am using the CQL datastax drivers.
It was good advice, thanks a lot Jonathan.
[]s



2014-06-20 0:28 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:

> The only case in which it might be better to use an IN clause is if
> the entire query can be satisfied from that machine.  Otherwise, go
> async.
>
> The native driver reuses connections and intelligently manages the
> pool for you.  It can also multiplex queries over a single connection.
>
> I am assuming you're using one of the datastax drivers for CQL, btw.
>
> Jon
>
> On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
> <marc...@s1mbi0se.com.br> wrote:
> > This is interesting, I didn't know that!
> > It might make sense then to use select = + async + token aware, I will try
> > to change my code.
> >
> > But would it be a "recommended solution" for these cases? Any other options?
> >
> > I still wonder if this is the right use case for Cassandra, to look for
> > random keys in a huge cluster. After all, the number of connections to
> > Cassandra will still be huge, right? Wouldn't that be a problem?
> > Or does the driver reuse the connection when you use async?
> >
> > []s
> >
> >
> > 2014-06-19 22:16 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >
> >> If you use async and your driver is token aware, it will go to the
> >> proper node, rather than requiring the coordinator to do so.
> >>
> >> Realistically you're going to have a connection open to every server
> >> anyways.  It's the difference between you querying for the data
> >> directly and using a coordinator as a proxy.  It's faster to just ask
> >> the node with the data.
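
A minimal sketch of the "select = + async" approach described above, using the entity_lookup table quoted further down in this thread. The function name `lookup_entity_ids` is hypothetical, and `session` is only assumed to behave like a DataStax Python driver Session (it needs `prepare` and `execute_async`). Using a prepared statement is my assumption here: it is what lets a token-aware policy derive the partition key (name, value) and route each query to a replica.

```python
def lookup_entity_ids(session, identifiers):
    """Fetch entity_ids with one async equality query per (name, value) pair.

    `session` is assumed to expose prepare(cql) and
    execute_async(statement, params), like the DataStax Python driver.
    `identifiers` maps each name to a list of values, as in the thread.
    """
    # Prepared once, reused for every key.
    select = session.prepare(
        "SELECT entity_id FROM entity_lookup WHERE name = ? AND value = ?"
    )

    # Fire every query before waiting on any of them.
    futures = []
    for name, values in identifiers.items():
        for value in values:
            futures.append(session.execute_async(select, (name, value)))

    # Collect the results; result() blocks until that query completes
    # and raises if the query failed.
    entity_ids = set()
    for future in futures:
        for row in future.result():
            entity_ids.add(row.entity_id)
    return entity_ids
```

With the real driver, the session would probably come from a Cluster configured with `TokenAwarePolicy(DCAwareRoundRobinPolicy())` from `cassandra.policies`. For very large key lists, `cassandra.concurrent.execute_concurrent_with_args` may be worth a look, since it caps the number of in-flight requests.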
> >>
> >> On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
> >> <marc...@s1mbi0se.com.br> wrote:
> >> > But wouldn't using async queries be even worse than using SELECT IN?
> >> > The justification in the docs is that I could query many nodes, but I
> >> > would still do that.
> >> >
> >> > Today, I use both async queries AND SELECT IN:
> >> >
> >> > SELECT_ENTITY_LOOKUP = ("SELECT entity_id FROM " + ENTITY_LOOKUP +
> >> >                         " WHERE name=%s and value in(%s)")
> >> >
> >> > for name, values in identifiers.items():
> >> >    query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s']*len(values)))
> >> >    args = [name] + values
> >> >    query_msg = query % tuple(args)
> >> >    futures.append((query_msg, self.session.execute_async(query, args)))
> >> >
> >> > for query_msg, future in futures:
> >> >    try:
> >> >       rows = future.result(timeout=100000)
> >> >       for row in rows:
> >> >         entity_ids.add(row.entity_id)
> >> >    except:
> >> >       logging.error("Query '%s' returned ERROR " % (query_msg))
> >> >       raise
> >> >
> >> > Using async just with select = would mean instead of 1 async query
> >> > (example: in (0, 1, 2)), I would do several, one for each value of
> >> > "values" array above.
> >> > In my head, this would mean more connections to Cassandra and the same
> >> > amount of work, right? What would be the advantage?
> >> >
> >> > []s
> >> >
> >> >
> >> >
> >> >
> >> > 2014-06-19 22:01 GMT-03:00 Jonathan Haddad <j...@jonhaddad.com>:
> >> >
> >> >> Your other option is to fire off async queries.  It's pretty
> >> >> straightforward w/ the java or python drivers.
> >> >>
> >> >> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
> >> >> <marc...@s1mbi0se.com.br> wrote:
> >> >> > I was taking a look at Cassandra anti-patterns list:
> >> >> >
> >> >> > http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
> >> >> >
> >> >> > Among them is:
> >> >> >
> >> >> > "SELECT ... IN or index lookups
> >> >> >
> >> >> > SELECT ... IN and index lookups (formerly secondary indexes) should be
> >> >> > avoided except for specific scenarios. See When not to use IN in SELECT
> >> >> > and When not to use an index in Indexing in CQL for Cassandra 2.0"
> >> >> >
> >> >> > And looking at the SELECT doc, I saw:
> >> >> >
> >> >> > "When not to use IN
> >> >> >
> >> >> > The recommendations about when not to use an index apply to using IN in
> >> >> > the WHERE clause. Under most conditions, using IN in the WHERE clause is
> >> >> > not recommended. Using IN can degrade performance because usually many
> >> >> > nodes must be queried. For example, in a single, local data center
> >> >> > cluster having 30 nodes, a replication factor of 3, and a consistency
> >> >> > level of LOCAL_QUORUM, a single key query goes out to two nodes, but if
> >> >> > the query uses the IN condition, the number of nodes being queried are
> >> >> > most likely even higher, up to 20 nodes depending on where the keys fall
> >> >> > in the token range."
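
The arithmetic behind the quoted figures can be worked through as a rough sketch. A quorum is floor(RF / 2) + 1 replicas, so with a replication factor of 3 a single-key read touches 2 nodes. The quoted docs do not say how many keys the IN list holds; the 10 keys below are my assumption, chosen because they reproduce the "up to 20 nodes" figure. The helper names are hypothetical, and the bound ignores the coordinator and any overlap between replica sets.

```python
def quorum(replication_factor):
    """Replicas that must answer a (LOCAL_)QUORUM read."""
    return replication_factor // 2 + 1


def max_replicas_read(num_keys, replication_factor, cluster_size):
    """Upper bound on replicas read when every key lands on different nodes."""
    return min(cluster_size, num_keys * quorum(replication_factor))


print(quorum(3))                     # 2
print(max_replicas_read(1, 3, 30))   # 2
print(max_replicas_read(10, 3, 30))  # 20
```

The total replica work is about the same whether those 10 keys go out as one IN query or as 10 separate queries. What changes is who fans the reads out: a single coordinator node for IN, or the client driver for separate async queries.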
> >> >> >
> >> >> > In my system, I have a column family called "entity_lookup":
> >> >> >
> >> >> > CREATE KEYSPACE IF NOT EXISTS Identification1
> >> >> >   WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
> >> >> >   'DC1' : 3 };
> >> >> > USE Identification1;
> >> >> >
> >> >> > CREATE TABLE IF NOT EXISTS entity_lookup (
> >> >> >   name varchar,
> >> >> >   value varchar,
> >> >> >   entity_id uuid,
> >> >> >   PRIMARY KEY ((name, value), entity_id));
> >> >> >
> >> >> > And I use the following select to query it:
> >> >> >
> >> >> > SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)
> >> >> >
> >> >> > Is this an anti-pattern?
> >> >> >
> >> >> > If not using SELECT IN, which other way would you recommend for lookups
> >> >> > like that? I have several values I would like to search for in Cassandra,
> >> >> > and they might not be in the same partition, as above.
> >> >> >
> >> >> > Is Cassandra the wrong tool for lookups like that?
> >> >> >
> >> >> > Best regards,
> >> >> > Marcelo Valle.
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jon Haddad
> >> >> http://www.rustyrazorblade.com
> >> >> skype: rustyrazorblade
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jon Haddad
> >> http://www.rustyrazorblade.com
> >> skype: rustyrazorblade
> >
> >
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
