Re: UUIDs whose alphanumeric order is the same as their chronological order
Having a physical location encoded in the UUID *increases* the chance of a collision, because it means fewer random bits. There definitely will be more than one UUID created in the same clock unit on the same machine! The same bits that you use to encode your few servers can be used for over 100 trillion random numbers!

As to ordering, if you wanted to use time-uuids, comparators that do give time-based ordering are trivial, and no slower than lexical sorting.

No slower isn't a good reason to use it! I am willing to take a (reasonable) time *penalty* to use lexically ordered UUIDs that will work both in Cassandra and Oracle (and which are human-readable - always good for debugging)! I am also willing to take a reasonable penalty to avoid using weird third-party code for generating UUIDs in the first place.

On Tue, Jun 22, 2010 at 10:05 PM, Tatu Saloranta tsalora...@gmail.com wrote:

On Tue, Jun 22, 2010 at 9:12 AM, David Boxenhorn da...@lookin2.com wrote:

A little bit of time fuzziness on the order of a few milliseconds is fine with me. This is user-generated data, so it only has to be time-ordered at the level that a user can perceive.

Ok, so mostly ordered. :-)

I have no worries about my solution working - I'm sure it will work. I just wonder if TimeUUIDType isn't superior for some reason that I don't know about. (TimeUUIDType seems so bad in so many ways that I wonder why anyone uses it. There must be some reason!)

I think that rationally thinking random-number based UUID is the best, provided one has a good random number generator. But there is something intuitive about rather using location + time-based alternative, based on tiny chance of collision that any (pseudo) random number based system has. So it just seems intuitive safer to use time-uuids, I think -- it isn't, it just feels that way. :-) Secondary reason is probably the ordering, and desire to stay standards compliant.

As to ordering, if you wanted to use time-uuids, comparators that do give time-based ordering are trivial, and no slower than lexical sorting. Java Uuid Generator (2.0) defaults to such comparator, as I agree that this makes more sense than whatever sorting you would otherwise get. It is unfortunate that clock chunks are ordered in weird way by uuid specification; there is no reason it couldn't have been made right way so that hex representation would sort nicely.

-+ Tatu +-
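For readers following the comparator point, here is a minimal sketch of what a time-ordering comparator could look like in plain Java. It assumes version 1 (time-based) UUIDs and relies on java.util.UUID.timestamp(), which throws UnsupportedOperationException for other versions. The class name TimeUuidOrder and the helper v1() are hypothetical and exist only for illustration; this is not the comparator that Java Uuid Generator ships.

```java
import java.util.Comparator;
import java.util.UUID;

final class TimeUuidOrder {
    private TimeUuidOrder() {}

    // Orders version-1 UUIDs by their embedded 60-bit timestamp.
    // Ties fall back to UUID's natural ordering so equal timestamps still sort stably.
    static final Comparator<UUID> BY_TIMESTAMP = (a, b) -> {
        int byTime = Long.compare(a.timestamp(), b.timestamp());
        return byTime != 0 ? byTime : a.compareTo(b);
    };

    // Hypothetical demo helper: packs a 60-bit timestamp into the version-1
    // field layout (time_low, then time_mid, then version nibble + time_hi).
    static UUID v1(long timestamp, long leastSigBits) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        return new UUID(msb, leastSigBits);
    }
}
```

Because the low 32 bits of the timestamp come first in the string form, two version 1 UUIDs built with v1(0xFFFFFFFFL, ...) and v1(0x100000000L, ...) sort one way as hex strings and the other way under BY_TIMESTAMP, which is the "weird" chunk ordering mentioned above.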
Re: UUIDs whose alphanumeric order is the same as their chronological order
Secondary reason is probably the ordering, and desire to stay standards compliant.

My UUIDs are standards-compliant. They are of type 4. The type is encoded in the format: xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx.
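The thread never shows the generator itself, so the following is only a sketch of one way to get a type 4 shaped UUID whose string order follows creation time. The layout is an assumption: a 48-bit millisecond timestamp fills the leading hex digits, the version nibble is 4, the variant bits are set, and everything else is random. The class and method names are hypothetical.

```java
import java.security.SecureRandom;
import java.util.UUID;

final class LexicalTimeUuid {
    private static final SecureRandom RANDOM = new SecureRandom();

    private LexicalTimeUuid() {}

    // Assumed layout: tttttttt-tttt-4rrr-[89ab]rrr-rrrrrrrrrrrr
    // (t = timestamp hex digits, r = random hex digits).
    static UUID create(long epochMillis) {
        long msb = (epochMillis & 0xFFFFFFFFFFFFL) << 16; // top 48 bits carry the time
        msb |= 0x4000L;                                    // version nibble = 4
        msb |= RANDOM.nextInt(1 << 12);                    // 12 random bits
        long lsb = RANDOM.nextLong();
        lsb &= 0x3FFFFFFFFFFFFFFFL;                        // clear the two variant bits
        lsb |= 0x8000000000000000L;                        // variant = binary 10
        return new UUID(msb, lsb);
    }

    static UUID create() {
        return create(System.currentTimeMillis());
    }
}
```

Under these assumptions, IDs created in different milliseconds sort chronologically as plain strings, while IDs created within the same millisecond fall in random order, which matches the "few milliseconds of fuzziness" tolerance described earlier in the thread. The cost is that only 74 bits remain random.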
Re: UUIDs whose alphanumeric order is the same as their chronological order
Tatu, I did read your comments - and I appreciate them very much! I want someone to argue with me (using good arguments) since what I'm doing *does* seem weird to me - because no one else is doing it.

What I mean by readable is that the sort order of my UUIDs is obvious to humans.

What I mean by weird code is mostly that it doesn't come with enough authority that I would trust it as a black-box more than my own code. For example, what happens when I want to port it to different kinds of machines? But another thing weird about it is the complexity (and I think low speed) of the algorithms I need in my *own* code to use it. Just look at it http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java !

On Wed, Jun 23, 2010 at 10:03 AM, Tatu Saloranta tsalora...@gmail.com wrote:

On Tue, Jun 22, 2010 at 11:54 PM, David Boxenhorn da...@lookin2.com wrote:

Having a physical location encoded in the UUID *increases* the chance of a collision, because it means fewer random bits. There definitely will be more than one UUID created in the same clock unit on the same machine! The same bits that you use to encode your few servers can be used for over 100 trillion random numbers!

You did not read what I wrote... I did not say it does, just that people feel as if it does.

As to ordering, if you wanted to use time-uuids, comparators that do give time-based ordering are trivial, and no slower than lexical sorting.

No slower isn't a good reason to use it! I am willing to take a (reasonable) time *penalty* to use lexically ordered UUIDs that will work both in Cassandra and Oracle (and which are human-readable - always good for debugging)!

Huh? These are plain old UUIDs, as readable (or not) as any. Comparator refers to java.util.Comparator (or Comparable for class itself). But fear not, I am not trying to change your mind, just pointing out that there is nothing magical about getting things to sort. Just that sorting by standard String representation is not the only collation order there is.

I am also willing to take a reasonable penalty to avoid using weird third-party code for generating UUIDs in the first place.

To each his own -- lots of people use weird code, and generally use a little bit less derogatory and patronizing terms when referring to such libraries. And it seems to me that you are perfectly happy writing your own unweird code to generate them instead. :-)

-+ Tatu +-
Re: nodetool loadbalance : Streams Continue on Non Acceptance of New Token
On Tue, Jun 22, 2010 at 20:16, Arya Goudarzi agouda...@gaiaonline.com wrote:

Hi,

Please confirm if this is an issue and should be reported or I am doing something wrong. I could not find anything relevant on JIRA.

Playing with 0.7 nightly (today's build), I set up a 3 node cluster this way:

- Added one node;
- Loaded default schema with RF 1 from YAML using JMX;
- Loaded 2M keys using py_stress;
- Bootstrapped a second node;
- Cleaned up the first node;
- Bootstrapped a third node;
- Cleaned up the second node;

I got the following ring:

Address       Status  Load       Range                                      Ring
                                 154293670372423273273390365393543806425
10.50.26.132  Up      518.63 MB  69164917636305877859094619660693892452     |--|
10.50.26.134  Up      234.8 MB   111685517405103688771527967027648896391    | |
10.50.26.133  Up      235.26 MB  154293670372423273273390365393543806425    |--|

Now I ran:

nodetool --host 10.50.26.132 loadbalance

It's been going for a while. I checked the streams:

nodetool --host 10.50.26.134 streams
Mode: Normal
Not sending any streams.
Streaming from: /10.50.26.132
Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]

nodetool --host 10.50.26.132 streams
Mode: Leaving: streaming data to other nodes
Streaming to: /10.50.26.134
/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
Not receiving any streams.

These have been going for the past 2 hours. I looked in the logs of the node with the .134 IP address and saw this:

INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132

A node will give this message when it sees another node (usually for the first time) that is trying to claim the same token but whose startup time is much earlier (i.e., this isn't a token replacement). It would follow that you would see this during a rebalance.

So, to my understanding from the wikis, loadbalance is supposed to decommission the node by sending its ranges to other nodes, and then bootstrap it again. It's been stuck in streaming for the past 2 hours and the size of the ring has not changed. The log on the first node says it started streaming hours ago:

INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000

Nothing more after this line. Am I doing something wrong?

If the output you get from `nodetool streams` isn't changing, then I'd say we have a bug. Your data sizes weren't that large--I'd expect 2 hrs would be more than enough time. I've created https://issues.apache.org/jira/browse/CASSANDRA-1221 to track this problem.

Gary.

Best Regards,
-Arya
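As background on why the ring in this thread is uneven before loadbalance runs, the sketch below computes the share of the token space each node owns from the three tokens in the ring listing. It assumes RandomPartitioner with a token space of size 2^127, which the thread does not state explicitly; the class and method names are hypothetical. Each node owns the range from the previous token (exclusive) up to its own token (inclusive), wrapping around at the end of the ring.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

final class RingOwnership {
    // Assumed size of the RandomPartitioner token space.
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    private RingOwnership() {}

    // tokens must be sorted ascending; result[i] is the fraction of the
    // ring owned by the node holding tokens[i].
    static double[] fractions(BigInteger[] tokens) {
        double[] out = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            BigInteger prev = tokens[(i + tokens.length - 1) % tokens.length];
            // mod() keeps the width non-negative when the range wraps past zero.
            BigInteger width = tokens[i].subtract(prev).mod(RING_SIZE);
            if (width.signum() == 0) {
                width = RING_SIZE; // a single node owns the whole ring
            }
            out[i] = new BigDecimal(width)
                    .divide(new BigDecimal(RING_SIZE), MathContext.DECIMAL64)
                    .doubleValue();
        }
        return out;
    }
}
```

Fed the three tokens from the ring output, this gives roughly one half for 10.50.26.132 and roughly one quarter each for the other two nodes, which lines up with the Load column (518.63 MB against about 235 MB each).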
Re: Uneven distribution using RP
On Tue, 2010-06-22 at 17:47 -0400, James Golick wrote: It's also flushing memtables really quickly for a particular CF. Like, really quickly. Like, one every minute. I increased the thresholds by 10x and it's still going fast. What is MemtableFlushAfterMinutes set to? -- Eric Evans eev...@rackspace.com
Timeout when cluster node fails/restarts
Hi,

I've currently set up a cluster of 11 nodes. When running a small application that uses Hector to read and write keys, and restarting one of the nodes (not the one the application is connected to), the application stalls, times out and reconnects. This takes roughly 10 seconds. When the node is marked as dead, the application seems to continue again. The application itself is only connecting to localhost on one of the nodes.

It may be worth mentioning that all nodes in the cluster are configured as seeds and have all other nodes configured as seeds as well. I'm not sure if this is causing the problem or if it's even related.

I'm using cassandra 0.6.2 and Hector 0.6.0-15 (latest github branch). What am I doing wrong here?

Regards,
Wouter
Re: hector or pelops
I switched to Pelops recently. No problems with it so far, and the code has become a little more compact.
cassandra_browser not in contrib
The python cassandra_browser is not in the contrib directory if I clone from git, but it is present if I checkout with svn. Is there typically a lag between svn trunk and git? Or is this intentional because the cassandra_browser is not going to be included going forward? Thanks Eben -- In science there are no 'depths'; there is surface everywhere. --Rudolph Carnap
Re: hector or pelops
As the developer of hector I can only speak in favor of my child of love and I haven't tried pelops so take the following with a grain of salt...

Hector sees wide adoption and has been coined the de-facto java client. It's been in use in production critical systems since version 0.5.0 by a few companies.

The development team is responsive and accepts patches from the community and is busy with new features and improvements all the time. There's a bug tracking system and all bugs are fixed very fast. There are two active mailing lists, one for the developers and one for the users: http://wiki.github.com/rantav/hector/mailing-lists (85 members)

The project is maintained on github (http://github.com/rantav/hector) and the process in all is very transparent and open to the community. Code is well tested with an embedded version of cassandra which I contributed back to the main cassandra repository, it runs a mvn and an ant build and all release versions are available at http://github.com/rantav/hector/downloads including source code.

We love contributions and want to make it as easy as possible to contribute back. I myself have made a few contributions to cassandra core so I'm well familiar with its internals, which doesn't hurt when you write a client...

...and finally the features (just the high level):

- connection pooling
- datacenter friendly
- high level API
- all public cassandra versions in the last 6 months
- failover
- simple LB
- extensive JMX
- well documented, many examples, wiki, mailing list, team of developers and contributors.

... and of course there's also thrift if you're into hacking on it...

On Wed, Jun 23, 2010 at 5:38 PM, Serdar Irmak sir...@protel.com.tr wrote:

Hi,

Which java client library do you recommend, hector or pelops, and why?

Best Regards,
http://www.protel.com.tr/
Re: 10 minute cassandra pause
Are you seeing any sort of log messages from Cassandra at all? On Wed, Jun 23, 2010 at 2:26 PM, Sean Bridges sean.brid...@gmail.com wrote: We were running a load test against a single 0.6.2 cassandra node. 24 hours into the test, Cassandra appeared to be nearly frozen for 10 minutes. Our write rate went to almost 0, and we had a large number of write timeouts. We weren't swapping or gc'ing at the time. It looks like the problems were caused by our memtables flushing after 24 hours (we have MemtableFlushAfterMinutes=1440). Some of our column families are written to infrequently so that they don't hit the flush thresholds in MemtableOperationsInMillions and MemtableThroughputInMB. After 24 hours we had ~3000 commit log files. Is this flushing causing Cassandra to become unresponsive? I would have thought Cassandra could flush in the background without blocking new writes. Thanks, Sean
Call for input of cassandra, thrift , hector, pelops example / sample / test code snippets
Hi all,

I have been researching the samples with some success, but it's taken a while. I am very keen on Cassandra and love the work that's been done; well done everyone involved.

I would like to get as many of the samples as I can organized into something that makes it easier to kick off with for people taking the road I am on. If people on this list have code snippets, full example apps, test apps, API test functions etc., I would like to hear about them please. My work is in Java so I really want to see those; the others are still of high interest as I will post them all out as I mention below.

Ideally I would like to get a small test container set up to allow people to poke and prod APIs and see what happens, but like most of us, time is the challenge. If I do not get that far I would at least post the findings to page(s) that people can continue to add to; maybe if successful it could then be consumed back into the Apache wiki...

If someone has already done this I would love to see the site. Let me know your thoughts, and better yet show me the code :-)

Regards
Gavan
Re: 10 minute cassandra pause
I see about 3000 lines of,

INFO [COMMIT-LOG-WRITER] 2010-06-23 16:40:29,107 CommitLog.java (line 412) Discarding obsolete commit log:CommitLogSegment(/data1/cass/commitlog/CommitLog-1277302220723.log)

Then, http://pastebin.com/YQA0mpRG

It's around 16:50 that cassandra writes stop timing out. Some writes are getting through during these 10 minutes, but they shouldn't be enough to cause the index memtables to flush.

Thanks,
Sean

On Wed, Jun 23, 2010 at 3:30 PM, Benjamin Black b...@b3k.us wrote:

Are you seeing any sort of log messages from Cassandra at all?
Re: forum application data model conversion
Any thoughts?

On Tue, Jun 22, 2010 at 2:13 PM, S Ahmed sahmed1...@gmail.com wrote:

Converting a Forum application to cassandra's data model.

Tables:

Posts [postID, threadID, userID, subject, body, created, lastmodified]

So this table contains the actual question subject and body. When a user logs in, they want to see a list of their questions, ordered by the last-modified date (to see if people responded to their question).

How would you do this best in Cassandra, seeing as the question/answer text is stored in another table? I know you could make a CF like:

userID { postID1, postID2, ...}

and somehow order it by last-modified, but then on the actual web page you would first have to query for the postIDs owned by the user, ordered by last-modified. THEN you would have to fetch the post data from the posts collection.

Is this the only way? I mean other than repeating the post subject+body in the user-to-postID index CF.
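To make the two-step read described in the question concrete, here is a small in-memory model of it. It is only a sketch under stated assumptions: plain Java maps stand in for the two column families, the index row uses the last-modified timestamp as the column name so the columns come back newest first, and all class and method names are hypothetical. It does not use the Cassandra or Thrift API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ForumIndexSketch {
    // Stand-in for the Posts column family: postID -> subject (body omitted).
    private final Map<String, String> posts = new HashMap<String, String>();
    // Stand-in for the index column family: row key = userID,
    // column name = lastModified (newest first), column value = postID.
    private final Map<String, NavigableMap<Long, String>> postsByUser =
            new HashMap<String, NavigableMap<Long, String>>();
    // Remembers each post's current index column so it can be moved on update.
    private final Map<String, Long> lastModifiedOf = new HashMap<String, Long>();

    // Simplification: two posts by one user with the same timestamp would
    // collide; a real column name would need the postID appended.
    void save(String userId, String postId, String subject, long lastModified) {
        posts.put(postId, subject);
        NavigableMap<Long, String> row = postsByUser.get(userId);
        if (row == null) {
            row = new TreeMap<Long, String>(Collections.<Long>reverseOrder());
            postsByUser.put(userId, row);
        }
        Long previous = lastModifiedOf.put(postId, lastModified);
        if (previous != null) {
            row.remove(previous); // drop the stale index column
        }
        row.put(lastModified, postId);
    }

    List<String> recentSubjects(String userId, int limit) {
        List<String> subjects = new ArrayList<String>();
        NavigableMap<Long, String> row = postsByUser.get(userId);
        if (row == null) {
            return subjects;
        }
        for (String postId : row.values()) {   // step 1: slice the index row
            if (subjects.size() == limit) {
                break;
            }
            subjects.add(posts.get(postId));    // step 2: fetch the post itself
        }
        return subjects;
    }
}
```

The alternative raised in the question, copying the subject into the index column value, would remove step 2 at the cost of rewriting that copy whenever the post changes.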
RE: Hector vs cassandra-java-client
Agreed, but at what cost? It's my understanding that the big deterrent is the lack of 3rd party dependencies in maven public repos (e.g. Thrift itself). The option would be to publish a public maven repo containing all dependencies, which ends up being more responsibility than the client developers want to accept. Any volunteers?

-Ken

To: user@cassandra.apache.org
From: bbo...@gmail.com
Subject: Re: Hector vs cassandra-java-client
Date: Tue, 22 Jun 2010 17:14:53 +0200

Dop Sun su...@dopsun.com writes:

Updated.

The first Cassandra client lib to make it into the Maven repositories will probably end up with a big audience. :-)

-Bjørn