Re: Initialising / maintaining a list of nodes in the cluster

2013-09-07 Thread Edward Capriolo
A Cassandra cluster must always use the same RPC port on every node (the default is 9160).

On Friday, September 6, 2013, Paul LeoNerd leon...@leonerd.org.uk wrote:
 I'm trying to work out how a client can best maintain a list of which
 nodes are available in the cluster, so it can keep connections to them.

 I understand the general idea is to query the system.peers table, and
 REGISTER an interest in TOPOLOGY_CHANGE and STATUS_CHANGE messages to
 watch for nodes being added/removed or becoming unavailable/available.
 So far so good.
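
For concreteness, here is a minimal sketch of that discovery step, assuming the
DataStax Python driver (cassandra-driver) and a hypothetical contact point at
10.0.0.1; the column names are the common system.peers / system.local ones, so
check them against your Cassandra version:

    from cassandra.cluster import Cluster  # pip install cassandra-driver

    # 10.0.0.1 stands in for any reachable node in the cluster
    cluster = Cluster(contact_points=['10.0.0.1'])
    session = cluster.connect()

    # system.peers lists every *other* node the contact point knows about,
    # keyed by IP address only - no port and no up/down flag
    for row in session.execute(
            "SELECT peer, data_center, rack, rpc_address FROM system.peers"):
        print(row.peer, row.data_center, row.rack, row.rpc_address)

    # system.local describes the node we are connected to
    for row in session.execute(
            "SELECT cluster_name, data_center, rack FROM system.local"):
        print(row.cluster_name, row.data_center, row.rack)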

 A few details of this seem a bit awkward though:

  * The system.peers table identifies peers only by their IP address,
not including the port number, whereas TOPOLOGY_CHANGE and
STATUS_CHANGE messages include a port.

What happens if more than one node instance is running at the same
IP address? How do I know which TCP port to use to speak CQL to a
given peer?

  * The system.peers table doesn't contain any information giving the
current availability status of the nodes, so I don't know if they
are initially up or down.

I can just presume all the known nodes are up until I try connecting
to them - in any case, Cassandra may know of nodes that my client
can't connect to for some reason, so I'd have to handle that case
anyway. But it feels like that hint should be there somewhere (see
the listener sketch after this list).

  * The system.peers table doesn't include the actual node I am
querying it on.

Most of the missing information does appear in the system.local
table, but not the address. The client does know /an/ address it has
connected to that node using, but how can I be sure that this
address is the one that will appear in the peers list on other
nodes? It's quite common for a server to have multiple addresses, so
it may be that I've connected to some address different to that
which the other nodes know it by.
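
As for up/down tracking, the native-protocol drivers REGISTER for STATUS_CHANGE
and TOPOLOGY_CHANGE events on your behalf and surface them through listener
callbacks. A hedged sketch using the same Python driver; the HostStateListener
hook and Host attributes below are that driver's API (check the version you
have), not part of the protocol itself:

    from cassandra.cluster import Cluster
    from cassandra.policies import HostStateListener

    class RingWatcher(HostStateListener):
        # Called as the driver processes STATUS_CHANGE / TOPOLOGY_CHANGE events
        def on_up(self, host):
            print("up", host)
        def on_down(self, host):
            print("down", host)
        def on_add(self, host):
            print("added", host)
        def on_remove(self, host):
            print("removed", host)

    cluster = Cluster(contact_points=['10.0.0.1'])   # hypothetical contact point
    cluster.register_listener(RingWatcher())
    session = cluster.connect()

    # The driver's own view of the ring, which does include the local node
    for host in cluster.metadata.all_hosts():
        print(host.address, host.is_up)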

 I'm quite new to Cassandra, so there's a chance I've overlooked
 something somewhere. Can anyone offer any comment or advice on these
 issues, or perhaps point me in the direction of some client code that
 manages to overcome them?

 Thanks,

 --
 Paul LeoNerd Evans

 leon...@leonerd.org.uk
 ICQ# 4135350   |  Registered Linux# 179460
 http://www.leonerd.org.uk/



Re: Calling all library maintainers

2010-11-05 Thread Edward Capriolo
On Fri, Nov 5, 2010 at 9:44 AM, Eric Evans eev...@rackspace.com wrote:
 On Fri, 2010-11-05 at 02:43 -0500, Stu Hood wrote:
  Java you serialize a type to a byte[] whereas with the query
  language you'd serialize to a string term
 
 The serializing to a byte[] part is what the RPC libraries exist for.
 With a string serialization format, you are setting all of your clients
 up to become string concatenation engines with an ad-hoc format defined
 by your spec: essentially, duplicating Avro and Thrift.

 I was referring to keys and column names and values which are typed as
 binary in both Avro and Thrift.

  TIMEUUID(timestamp)
 Note that this same approach is possible in Avro by adding a union type:
 it is not dependent on String serialization.

 How can a TimeUUIDType be expressed in Avro using a union?

  to serialize that to a string like
  10L, than it would be to pack a binary string in network-order
 I don't think you are giving client library devs enough credit: this
 only needs to be implemented once, and I'm sure they're capable.

 I was speaking to the relative difficulty in serializing a type using
 one method or another.  In other words, in Python it becomes:

 import struct; struct.pack('d', val)

 versus

 str(val)

 Both of which only need to be implemented once, of course.
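
To make the contrast concrete, a sketch of both paths for the long 10 discussed
above (the '>q' struct format and the 10L literal are illustrative choices, not
a fixed wire format or grammar):

    import struct

    value = 10

    # Binary path (what the RPC-style clients do today): an 8-byte,
    # big-endian / network-order long
    binary_term = struct.pack('>q', value)   # b'\x00\x00\x00\x00\x00\x00\x00\n'

    # String path (what a QL term would look like): just render the value
    string_term = '%dL' % value              # '10L'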

 -Original Message-
 From: Eric Evans eev...@rackspace.com
 Sent: Thursday, November 4, 2010 2:59pm
 To: client-dev@cassandra.apache.org
 Subject: Re: Calling all library maintainers

 On Thu, 2010-11-04 at 21:28 +0200, Ran Tavory wrote:
  A QL can shield clients from a class of changes, but OTOH will make
  clients have to compose the query strings, where with type safe
  libraries this job is somewhat easier. IMO in the near term
  introducing a query language will make client dev somewhat harder b/c
  of the (somewhat negligible) work of composing query strings and
  mostly b/c I don't expect the QL to be stable at v1 so still a moving
 target, but easier in the long term mainly due to the hope that
  the QL will stabilize.

 I think you could argue that it makes all of this easier.  Right now
 from Java you serialize a type to a byte[] whereas with the query
 language you'd serialize to a string term.  That's a bit more effort out
 of the gate for primitives like long for example, but consider the
 venerable TimeUUID that causes so much frustration.  I think it would be
 much easier to take a timestamp and construct a term like
 TIMEUUID(timestamp) (or whatever), especially since that would work
 identically across all clients.
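
To see why the TimeUUID case is the painful one on the binary path, here is a
hedged sketch of what a client has to do today to build a version-1 UUID from an
arbitrary timestamp (field math per RFC 4122; under the QL proposal this would
collapse to the single term TIMEUUID(timestamp)):

    import time
    import uuid

    def timeuuid_from_timestamp(ts):
        # 100-nanosecond ticks since the Gregorian epoch (1582-10-15)
        ticks = int(ts * 1e7) + 0x01B21DD213814000
        time_low = ticks & 0xFFFFFFFF
        time_mid = (ticks >> 32) & 0xFFFF
        time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # stamp in version 1
        clock_seq_hi = 0x80   # variant bits '10', clock sequence zeroed
        clock_seq_low = 0x00
        node = uuid.getnode()
        return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                                 clock_seq_hi, clock_seq_low, node))

    print(timeuuid_from_timestamp(time.time()))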

 And it's also worth pointing out that not all languages in use are
 statically typed, so even in the case of an int, or a long, it'd be
 easier (or as easy at least), to serialize that to a string like
 10L, than it would be to pack a binary string in network-order.

 As for not being stable, well, yeah it's going to need to bake a bit
 before being suitable for widespread use, but I raise it here not to
 encourage everyone to transition now, but so that you can help shape the
 outcome (if you're interested, of course).

  One other benefit of query languages is that they make tooling a
  little easier, one does not have to come up with a specific CLI
  interpreter or a web interface with a set of input fields, you just
  have to type your QL into a text box or a terminal like you do with
  sql.
  Long term I think I'm in for a QL (although I have to think about the
  syntax you suggested) but I don't expect it to benefit client devs in
  the near term even if it was ready today as an alternative to thrift.
 
  One small question: does this language tunnel through Avro or Thrift
  calls? (Is conn.execute() an Avro or Thrift call?)

 It's avro for the simple reason that that's still sort of an
 experimental code path and seemed a less controversial sandbox.  When the
 spec and implementation are complete, and if it gains suitable traction,
 I'd actually like to explore a customized transport and serialization.



 --
 Eric Evans
 eev...@rackspace.com



I still think the query language is a good idea but I have one
negative point about it.

One of the selling points of a simple data model and access language
was that there were never issues where a query planner refused to do
the query the optimal way the user desired. For example, a query
using ORDER and LIMIT would first order the dataset and then limit,
when the user wanted to limit and then order.
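
A toy illustration of that distinction, with a plain Python list standing in for
the result set:

    rows = [5, 3, 9, 1, 7]

    # order-then-limit, which is what an ORDER BY ... LIMIT planner does:
    # sort everything, then cut the result down
    order_then_limit = sorted(rows)[:2]    # [1, 3]

    # limit-then-order, which is what the user in this example wanted:
    # take the first two rows as stored, then sort just those
    limit_then_order = sorted(rows[:2])    # [3, 5]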

Also, without sounding cynical, I see SQL-ifying as catering to the
lower half. Take projecting columns from a row, for example. SQL-ish
is going to encourage people NOT to learn about SlicePredicate and to
attempt to get by using the SQL interface. They will not understand
how to take advantage of the data model and what it provides. With
0.7, where schema changes can happen on the fly, users are going to
have more freedom to create ColumnFamilies. Aided by their QL
interface and their predisposition to think SQL, they are going to
structure column families like SQL tables. They could end up with
unoptimized tables and the planner making non-optimal queries.

I somewhat feel a QL language would be like Cassandra training wheels.

Re: Calling all library maintainers

2010-11-05 Thread Edward Capriolo
On Fri, Nov 5, 2010 at 11:40 AM, Gary Dusbabek gdusba...@gmail.com wrote:

 I still think the query language is a good idea but I have one
 negative point about it.

 One of the selling points of a simple data model and access language
 was that there were never issues where a query planner refused to do
 the query the optimal way the user desired. For example, a query
 using ORDER and LIMIT would first order the dataset and then limit,
 when the user wanted to limit and then order.

 This would be a limitation of the expressiveness of the grammar or a
 failure on the part of the user and not really a problem with the
 query planner.  The QL statements so far are simple enough that there
 is basically One Way to perform the operation once they are on the
 server.  I don't see ambiguity seeping in, but I suppose this is
 possible.

 Also, without sounding cynical, I see SQL-ifying as catering to the
 lower half. Take projecting columns from a row, for example. SQL-ish
 is going to encourage people NOT to learn about SlicePredicate and to
 attempt to get by using the SQL interface. They will not understand
 how to take advantage of the data model and what it provides. With
 0.7, where schema changes can happen on the fly, users are going to
 have more freedom to create ColumnFamilies. Aided by their QL
 interface and their predisposition to think SQL, they are going to
 structure column families like SQL tables. They could end up with
 unoptimized tables and the planner making non-optimal queries.

 I somewhat feel a QL language would be like Cassandra training wheels.

 Valid concern, but a different debate.  I think we've all seen the
 effects of the "I did crap wrong, therefore Cassandra sucks" blog
 posts.  I hope Cassandra's signal can make it to the point where it
 is established enough that those posts don't contribute as much
 noise as they currently do.

 Looking out a year or so into the future, I think we'll be there.  At
 that point, features like a QL will help to make Cassandra more
 approachable.  But whether or not Cassandra should be more
 approachable, I think, is a different debate.

 Gary.



I believe it is the same debate. If a QL-based API is made, will it
really be able to replace the current API? Is the true target audience
everyone?

What do you think of the concept of not re-using SQL keywords? Having
a data access language is a good idea, but trying to make it look like
SQL is counter-intuitive to the data model and will leave people
wondering which functions from SQL they have and do not have.

Also, having a subset-of-SQL language can open you up to criticism -
MySQL not supporting sub-selects in older versions, etc. With Hive I do
get some sneers about the lack of IN support.

What about a data access language that does not share any SQL
keywords, custom-made for NoSQL?