Re: Initialising / maintaining a list of nodes in the cluster
A Cassandra cluster must always use the same RPC port (default 9160).

On Friday, September 6, 2013, Paul LeoNerd <leon...@leonerd.org.uk> wrote:
> I'm trying to work out how a client is best to maintain a list of what nodes are available in the cluster, for maintaining connections to. I understand the general idea is to query the system.peers table, and REGISTER an interest in TOPOLOGY_CHANGE and STATUS_CHANGE messages to watch for nodes being added/removed or becoming unavailable/available. So far so good. A few details of this seem a bit awkward though:
>
> * The system.peers table identifies peers only by their IP address, not including the port number, whereas TOPOLOGY_CHANGE and STATUS_CHANGE messages include a port. What happens if there is more than one copy of a node using the same IP address? How do I know which TCP port I can use to speak CQL to a given peer?
>
> * The system.peers table doesn't contain any information giving the current availability status of the nodes, so I don't know if they are initially up or down. I can just presume all the known nodes are up until I try connecting to them; in any case, it could be that Cassandra knows of the existence of nodes that for some reason my client can't connect to, so I'd have to handle this case anyway. But it feels like that hint should be there somewhere.
>
> * The system.peers table doesn't include the actual node I am querying it on. Most of the missing information does appear in the system.local table, but not the address. The client does know /an/ address it has connected to that node using, but how can I be sure that this address is the one that will appear in the peers list on other nodes? It's quite common for a server to have multiple addresses, so it may be that I've connected to some address different to that which the other nodes know it by.
>
> I'm quite new to Cassandra, so there's a chance I've overlooked something somewhere.
> Can anyone offer any comment or advice on these issues, or perhaps point me in the direction of some client code that manages to overcome them?
>
> Thanks,
>
> --
> Paul "LeoNerd" Evans
> leon...@leonerd.org.uk
> ICQ# 4135350 | Registered Linux# 179460
> http://www.leonerd.org.uk/
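The bookkeeping described above can be sketched in a few lines of Python. This is a hypothetical illustration, not real driver code: the `NodeRegistry` class and its method names are invented, though the event names (`TOPOLOGY_CHANGE`, `STATUS_CHANGE`, `NEW_NODE`, `REMOVED_NODE`, `UP`, `DOWN`) follow the CQL native protocol spec. It seeds the node list from a system.peers query and optimistically assumes every node is up until an event says otherwise, per the "presume all the known nodes are up" fallback above.

```python
# Hypothetical sketch: maintain a cluster node list from a system.peers
# snapshot plus native-protocol push events. Class/method names invented.

class NodeRegistry:
    def __init__(self, peer_addresses):
        # Seed from system.peers; assume UP until we learn otherwise.
        self.nodes = {addr: 'UP' for addr in peer_addresses}

    def handle_event(self, event_type, change, addr):
        """Apply one TOPOLOGY_CHANGE or STATUS_CHANGE event."""
        if event_type == 'TOPOLOGY_CHANGE':
            if change == 'NEW_NODE':
                self.nodes[addr] = 'UP'        # optimistic default
            elif change == 'REMOVED_NODE':
                self.nodes.pop(addr, None)
        elif event_type == 'STATUS_CHANGE':
            if addr in self.nodes:
                self.nodes[addr] = 'UP' if change == 'UP' else 'DOWN'

    def live_nodes(self):
        return sorted(a for a, s in self.nodes.items() if s == 'UP')

# Usage: two peers discovered, one goes down, a third joins.
reg = NodeRegistry(['10.0.0.1', '10.0.0.2'])
reg.handle_event('STATUS_CHANGE', 'DOWN', '10.0.0.2')
reg.handle_event('TOPOLOGY_CHANGE', 'NEW_NODE', '10.0.0.3')
print(reg.live_nodes())
```

Note the sketch keys nodes by address alone, so it inherits exactly the IP-without-port ambiguity the poster raises.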
Re: Calling all library maintainers
On Fri, Nov 5, 2010 at 9:44 AM, Eric Evans <eev...@rackspace.com> wrote:
> On Fri, 2010-11-05 at 02:43 -0500, Stu Hood wrote:
> > > Java you serialize a type to a byte[] whereas with the query language you'd serialize to a string term
> >
> > The serializing to a byte[] part is what the RPC libraries exist for. With a string serialization format, you are setting all of your clients up to become string concatenation engines with an ad-hoc format defined by your spec: essentially, duplicating Avro and Thrift.
>
> I was referring to keys and column names and values, which are typed as binary in both Avro and Thrift.
>
> > > TIMEUUID(timestamp)
> >
> > Note that this same approach is possible in Avro by adding a union type: it is not dependent on String serialization.
>
> How can a TimeUUIDType be expressed in Avro using a union?
>
> > > to serialize that to a string like 10L, than it would be to pack a binary string in network-order
> >
> > I don't think you are giving client library devs enough credit: this only needs to be implemented once, and I'm sure they're capable.
>
> I was speaking to the relative difficulty in serializing a type using one method or another. In other words, in Python it becomes: import struct; struct.pack('d', val) versus str(val). Both of which only need to be implemented once, of course.
>
> > -----Original Message-----
> > From: Eric Evans <eev...@rackspace.com>
> > Sent: Thursday, November 4, 2010 2:59pm
> > To: client-dev@cassandra.apache.org
> > Subject: Re: Calling all library maintainers
> >
> > > On Thu, 2010-11-04 at 21:28 +0200, Ran Tavory wrote:
> > > > A QL can shield clients from a class of changes, but OTOH will make clients have to compose the query strings, where with type-safe libraries this job is somewhat easier. IMO in the near term introducing a query language will make client dev somewhat harder, b/c of the (somewhat negligible) work of composing query strings and mostly b/c I don't expect the QL to be stable at v1 so still a moving target, but easier in the long term, mainly due to the hope that the QL will stabilize.
> > > I think you could argue that it makes all of this easier. Right now from Java you serialize a type to a byte[] whereas with the query language you'd serialize to a string term. That's a bit more effort out of the gate for primitives like long, for example, but consider the venerable TimeUUID that causes so much frustration. I think it would be much easier to take a timestamp and construct a term like TIMEUUID(timestamp) (or whatever), especially since that would work identically across all clients. And it's also worth pointing out that not all languages in use are statically typed, so even in the case of an int, or a long, it'd be easier (or as easy at least) to serialize that to a string like 10L than it would be to pack a binary string in network-order.
> > >
> > > As for not being stable, well, yeah, it's going to need to bake a bit before being suitable for widespread use, but I raise it here not to encourage everyone to transition now, but so that you can help shape the outcome (if you're interested, of course).
> > >
> > > > One other benefit of query languages is that they make tooling a little easier: one does not have to come up with a specific CLI interpreter or a web interface with a set of input fields, you just have to type your QL into a text box or a terminal, like you do with SQL. Long term I think I'm in for a QL (although I have to think about the syntax you suggested) but I don't expect it to benefit client devs in the near term even if it was ready today as an alternative to Thrift. One small question, does this language tunnel through avro or thrift calls? (Is conn.execute() an avro or thrift call?)
> > >
> > > It's avro, for the simple reason that that's still sort of an experimental code path and seemed a less controversial sandbox. When the spec and implementation are complete, and if it gains suitable traction, I'd actually like to explore a customized transport and serialization.
> --
> Eric Evans
> eev...@rackspace.com

I still think the query language is a good idea, but I have one negative point about it. One of the selling points of a simple data model and access language was that there were never issues where a query planner refused to do the query the optimal way the user desired. For example, a query using order and limit would first order the dataset and then limit, when the user wanted to limit and then order.

Also, without sounding cynical, I see SQL-ifying as catering to the lower half. Take projecting columns from a row, for example. SQL-ish is going to encourage people NOT to learn about SlicePredicate and to attempt to get by using the SQL interface. They will not understand how to take advantage of the data model and what it provides. With 0.7, where schema changes can happen on the fly, users are going to have more freedom to create ColumnFamilies. Aided by their QL interface and their predisposition to think SQL, they are going to structure column families like SQL tables. They could end up with unoptimized tables and the planner making non-optimal queries. I somewhat feel a QL would be like Cassandra training wheels.
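The serialization trade-off being argued above can be made concrete. A sketch in Python, with hedges: the `timeuuid_from_timestamp` helper is invented for illustration (it follows the RFC 4122 version-1 layout that a hypothetical TIMEUUID(timestamp) term would hide from clients, with clock sequence and node zeroed for the demo), while the `str` versus `struct.pack` contrast is the one from the thread, here shown for a long in network byte order.

```python
import struct
import uuid

val = 10
as_term = str(val)                  # string term for a QL statement: "10"
as_binary = struct.pack('>q', val)  # 8-byte big-endian (network-order) long

def timeuuid_from_timestamp(ts):
    """Build a version-1 UUID whose time component encodes `ts` (Unix
    seconds) -- the RFC 4122 bit-packing a TIMEUUID(timestamp) term
    would do server-side. Invented helper; clock_seq/node zeroed."""
    # 100-ns intervals since the Gregorian epoch, 1582-10-15.
    ns100 = int(ts * 10**7) + 0x01B21DD213814000
    time_low = ns100 & 0xFFFFFFFF
    time_mid = (ns100 >> 32) & 0xFFFF
    time_hi_version = ((ns100 >> 48) & 0x0FFF) | (1 << 12)  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x0))  # 0x80 = RFC 4122 variant

u = timeuuid_from_timestamp(0)      # the Unix epoch as a TimeUUID
print(as_term, as_binary.hex(), u)
```

Either side of the debate is a one-time cost per client library; the TimeUUID case is where the binary packing stops being a one-liner and a server-side term starts to look attractive.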
Re: Calling all library maintainers
On Fri, Nov 5, 2010 at 11:40 AM, Gary Dusbabek <gdusba...@gmail.com> wrote:
> > I still think the query language is a good idea, but I have one negative point about it. One of the selling points of a simple data model and access language was that there were never issues where a query planner refused to do the query the optimal way the user desired. For example, a query using order and limit would first order the dataset and then limit, when the user wanted to limit and then order.
>
> This would be a limitation of the expressiveness of the grammar, or a failure on the part of the user, and not really a problem with the query planner. The QL statements so far are simple enough that there is basically One Way to perform the operation once they are on the server. I don't see ambiguity seeping in, but I suppose this is possible.
>
> > Also, without sounding cynical, I see SQL-ifying as catering to the lower half. Take projecting columns from a row, for example. SQL-ish is going to encourage people NOT to learn about SlicePredicate and to attempt to get by using the SQL interface. They will not understand how to take advantage of the data model and what it provides. With 0.7, where schema changes can happen on the fly, users are going to have more freedom to create ColumnFamilies. Aided by their QL interface and their predisposition to think SQL, they are going to structure column families like SQL tables. They could end up with unoptimized tables and the planner making non-optimal queries. I somewhat feel a QL would be like Cassandra training wheels.
>
> Valid concern, but a different debate. I think we've all seen the effects of the "I did crap wrong, therefore Cassandra sucks" blog posts. I hope that Cassandra's signal can make it to the point where it is established enough that those posts don't contribute as much noise as they currently do. Looking out a year or so into the future, I think we'll be there. At that point, features like a QL will help to make Cassandra more approachable.
> But whether or not Cassandra should be more approachable, I think, is a different debate.
>
> Gary.

I believe it is the same debate. If a QL-based API is made, will it really be able to replace the current API? Is the true target audience everyone? What do you think of the concept of not re-using SQL keywords? Having a data access language is a good idea, but trying to make it look like SQL is counterintuitive to the data model and will leave people wondering what functions from SQL they have and do not have. Also, having a subset-of-SQL language opens you up to criticism: MySQL not supporting subselects in older versions, etc. With Hive I do get some sneers about the lack of IN support. What about a data access language that does not share any SQL keywords, custom-made for NoSQL?
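The order-then-limit point raised in this thread is easy to demonstrate in miniature: applying the two operations in opposite orders yields different results, which is why it matters whether the user or a planner decides the order.

```python
rows = [5, 1, 4, 2, 3]  # rows in storage order

# ORDER BY then LIMIT 2: sort everything, keep the two smallest values.
order_then_limit = sorted(rows)[:2]

# LIMIT 2 then ORDER BY: keep the first two rows as stored, then sort them.
limit_then_order = sorted(rows[:2])

print(order_then_limit)  # [1, 2]
print(limit_then_order)  # [1, 5]
```

A grammar that cannot express which of the two the user meant forces one interpretation on everyone, which is the expressiveness limitation Gary concedes above.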