Re: Working with legacy data via CQL
On 2014-11-11 19:40, Alex Popescu wrote:
> On Tuesday, November 11, 2014, Erik Forsberg forsb...@opera.com wrote:
>
> You'll have better chances to get an answer about the Python driver on its own mailing list
> https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

As I said, this also happens when using cqlsh:

cqlsh:test> SELECT column1,value from Users where key = a6b07340-047c-4d4c-9a02-1b59eabf611c and column1 = 'date_created';

 column1      | value
--------------+-------------------------------
 date_created | '\x00\x00\x00\x00Ta\xf3\xe0'

(1 rows)

Failed to decode value '\x00\x00\x00\x00Ta\xf3\xe0' (for column 'value') as text: 'utf8' codec can't decode byte 0xf3 in position 6: unexpected end of data

So let me rephrase: How do I work with data where the table has metadata that makes some columns differ from the main validation class? From cqlsh, or the Python driver, or any driver?

Thanks,
\EF
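The value cqlsh fails to render as text is exactly eight bytes long, which would be consistent with a 64-bit integer rather than a string. The following is a minimal sketch, assuming (the thread does not confirm this) that the legacy column holds a signed big-endian long containing seconds since the Unix epoch. `decode_legacy_long` is a hypothetical helper name, not part of any driver.

```python
import struct
from datetime import datetime, timezone

# The raw value exactly as cqlsh printed it in the message above.
RAW = b"\x00\x00\x00\x00Ta\xf3\xe0"


def decode_legacy_long(raw: bytes) -> int:
    """Interpret 8 raw bytes as a signed big-endian 64-bit integer."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes, got %d" % len(raw))
    return struct.unpack(">q", raw)[0]


seconds = decode_legacy_long(RAW)
# Assumption: the integer counts seconds since the Unix epoch.
created = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(seconds, created.isoformat())
```

Under that assumption the bytes decode to 1415705568, i.e. 2014-11-11 11:32:48 UTC, the same day the message was sent, which makes the interpretation plausible but does not prove it.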
Re: Better option to load data to cassandra
On our project we wrote our own custom batch to load the data into Cassandra the way we wanted.

-- Brice

On Tue, Nov 11, 2014 at 2:33 PM, srinivas rao pinnakasrin...@gmail.com wrote:
> Hi Alexey,
>
> I tried Sqoop and the DataStax COPY command. Are there any other options we can use?
>
> I have one more question: I have a table with a composite key as the row key, so how can I handle that with Sqoop or the COPY command while exporting?
>
> Thanks
> Srinivas
>
> On Tue, Nov 11, 2014 at 6:28 PM, Plotnik, Alexey aplot...@rhonda.ru wrote:
>> What have you tried?
>>
>> -- Original Message --
>> From: srinivas rao pinnakasrin...@gmail.com
>> To: Cassandra Users user@cassandra.apache.org
>> Sent: 11.11.2014 22:51:54
>> Subject: Better option to load data to cassandra
>>
>>> Hi Team,
>>> Please suggest better options for loading data from NoSQL to Cassandra.
>>> Thanks
>>> Srini
Re: nodetool repair stalled
Wouldn't it be a better idea to issue removenode on the crashed node, wipe the whole data directory (including system) and let it bootstrap cleanly, so that it's not part of the cluster while it gets back up to speed?

On Tue, Nov 11, 2014, 12:32 PM Robert Coli rc...@eventbrite.com wrote:
> On Tue, Nov 11, 2014 at 10:48 AM, venkat sam samvenkat...@outlook.com wrote:
>> I have a 5 node cluster. On one node, one of the data directory partitions crashed. After replacing the disk I restarted the Cassandra daemon and ran nodetool repair to restore the missing replicas. But nodetool repair gets stuck after syncing one of the column families.
>
> Yes, nodetool repair often hangs. Search through the archives, but the summary is:
>
> 1) try to repair CFs one at a time
> 2) it's worse with vnodes
> 3) try tuning the phi detector or network stream timeouts
>
> =Rob
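Rob's first suggestion, repairing column families one at a time, could be scripted roughly as follows. This is only a sketch: it builds the command lines rather than running them, and the keyspace and column family names are hypothetical placeholders.

```python
import shlex


def repair_commands(keyspace, column_families):
    """Build one `nodetool repair <keyspace> <cf>` command per column family."""
    return [
        "nodetool repair " + " ".join(shlex.quote(part) for part in (keyspace, cf))
        for cf in column_families
    ]


# Hypothetical keyspace and column family names, for illustration only.
for command in repair_commands("my_keyspace", ["users", "events"]):
    print(command)
```

Running the commands one after another makes it easier to see which column family a stalled repair was working on.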
Cassandra sort using updatable query
Hello all,

I have a data set with attributes content and year. I want to put them into a CF 'words' with attributes ('content', 'year', 'frequency'). The CF should support the following operations:

- The frequency attribute of a column can be updated, i.e. I can run a query like
  UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;
  where the WHERE clause contains content and year.
- It should support a select query like
  Select content from words where year = 2010 ORDER BY frequency DESC LIMIT 10;
  where the WHERE clause only has year and the results can be ordered by frequency.

Can this kind of requirement be fulfilled using Cassandra? What CF structure and indexing do I need to use here? What queries should I use to create the CF and the indexes?

Thank You!

-- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
Re: Cassandra sort using updatable query
With Cassandra you're going to want to model tables to meet the requirements of your queries, instead of building tables in 3NF and optimizing afterwards as you would in a relational database. For your optimized select query, your table (with a caveat, see below) could start out as:

    create table words (
        year int,
        frequency int,
        content text,
        primary key (year, frequency, content)
    );

You may want to maintain other tables as well for different types of select statements.

Your UPDATE statement above won't work; you'll have to DELETE and INSERT, since you can't change the value of a clustering column. If you don't know what your old frequency is ahead of time (to do the delete), you'll need to keep another table mapping (content, year) -> frequency.

Now, the tricky part here is that the above model will limit the total number of partitions you've got to the number of years you're working with, and will not scale as the cluster increases in size. Ideally you could bucket frequencies. If that feels like too much work (it's starting to for me), this may be better suited to something like Solr, Elasticsearch, or DSE (Cassandra + Solr).

Does that help?

Jon

On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna cdwijayarat...@gmail.com wrote:
> Hello all,
>
> I have a data set with attributes content and year. I want to put them into a CF 'words' with attributes ('content', 'year', 'frequency'). The CF should support the following operations:
>
> - The frequency attribute of a column can be updated, i.e. I can run a query like
>   UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;
>   where the WHERE clause contains content and year.
> - It should support a select query like
>   Select content from words where year = 2010 ORDER BY frequency DESC LIMIT 10;
>   where the WHERE clause only has year and the results can be ordered by frequency.
>
> Can this kind of requirement be fulfilled using Cassandra? What CF structure and indexing do I need to use here? What queries should I use to create the CF and the indexes?
>
> Thank You!
>
> -- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
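The DELETE-and-INSERT approach Jon describes could be sketched as below. The function only builds (statement, parameters) pairs and does not talk to a cluster. The `%s` placeholders follow the style the DataStax Python driver accepts for simple statements, and `word_frequency` is a hypothetical name for the lookup table mapping (content, year) to frequency; neither the name nor its schema comes from the thread.

```python
def frequency_update_statements(content, year, old_frequency, new_frequency):
    """Return (cql, params) pairs that move a word to a new frequency.

    frequency is a clustering column in `words`, so the old row is deleted
    and a new row inserted instead of being updated in place.
    """
    return [
        ("DELETE FROM words WHERE year = %s AND frequency = %s AND content = %s",
         (year, old_frequency, content)),
        ("INSERT INTO words (year, frequency, content) VALUES (%s, %s, %s)",
         (year, new_frequency, content)),
        # Hypothetical lookup table, assumed to be keyed by (content, year).
        ("UPDATE word_frequency SET frequency = %s WHERE content = %s AND year = %s",
         (new_frequency, content, year)),
    ]


for cql, params in frequency_update_statements("abc", 1990, 1, 2):
    print(cql, params)
```

As written, the three writes are independent, so a failure partway through would leave the two tables out of step; whether that matters depends on the application.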
Re: nodetool repair stalled
On Wed, Nov 12, 2014 at 6:50 AM, Eric Stevens migh...@gmail.com wrote:
> Wouldn't it be a better idea to issue removenode on the crashed node, wipe the whole data directory (including system) and let it bootstrap cleanly so that it's not part of the cluster while it gets back up to

Yes, with replace_node. I missed that the entire data dir was lost in my first response.

=Rob
Cassandra patterns/design for setting up a history/version/change log table?
Hi Guys,

Assume you have, for example, an "account" table and an "account_history" table which simply tracks older versions of what a person's account looked like whenever an administrator edits a customer account.

Given that we don't have the luxury of a safe transaction to update the account record, i.e. to do:

- select account details
- compare old account details with new account details
- if there are changes to the account
  - copy old account details to account_history table
  - update account

How do people deal with this in a multi data centre environment?

The closest thing I can think of is something like this on "save":

- insert new record into account_history table
- update record in account table
- every hour, look for duplicate rows in the account_history table and de-duplicate where someone did a save that did not change any fields on the account table.

My biggest problem with the above is: what happens if you want to bulk load a data file into your account table, and it, for example, contains 1 million records and only actually changes 100 account entries? For bulk loading you could probably resort to doing a "select before update" just to prevent 1 million pointless inserts into the account_history table, but that feels a bit yucky.

Some sort of Java stored procedure might help here, but surely this is a common enough use case that we shouldn't have to write custom Java code for Cassandra, right?

Thanks!
Jacob
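The "select before update" variant mentioned for bulk loading could look roughly like the sketch below. It works on plain dictionaries, so the actual read and write calls are left out, and `changed_fields` and `plan_save` are hypothetical helper names. The read-compare-write sequence is not atomic, which is exactly the limitation raised in the message.

```python
def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def plan_save(old, new):
    """List the writes a save should perform, as (table, row) pairs."""
    if not changed_fields(old, new):
        return []  # nothing changed: skip both the history row and the update
    # Copy the old version into the history table, then write the new version.
    return [("account_history", dict(old)), ("account", dict(new))]


# Hypothetical account rows, for illustration only.
current = {"id": 1, "email": "a@example.com"}
print(plan_save(current, {"id": 1, "email": "a@example.com"}))
print(plan_save(current, {"id": 1, "email": "b@example.com"}))
```

For a bulk load, filtering rows through a check like this before writing would avoid history entries for records that did not change, at the cost of one read per record.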
Programmatic Cassandra version detection/extraction
Hi,

Is there a way to detect which version of Cassandra one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info?

Here's the use case: SPM monitors Cassandra (http://sematext.com/spm/), but Cassandra MBeans and metrics have changed or may change over time. How will the SPM agent know which MBeans to look for, which metrics to extract, and how to interpret the values it extracts without knowing which version of Cassandra it's monitoring?

It could try probing for some known MBeans and deduce the Cassandra version from that, but that feels a little sloppy. Ideally, we'd be able to grab the version from some MBean and, based on that, extract the metrics we know are exposed in that version of Cassandra.

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/
Re: Programmatic Cassandra version detection/extraction
On 11/12/2014 04:44 PM, Otis Gospodnetic wrote:
> Is there a way to detect which version of Cassandra one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info?

I'm not sure if there are other methods, but this should always work:

SELECT release_version from system.local;

-- Michael
Re: Programmatic Cassandra version detection/extraction
On 11/12/2014 04:58 PM, Michael Shuler wrote:
> On 11/12/2014 04:44 PM, Otis Gospodnetic wrote:
>> Is there a way to detect which version of Cassandra one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info?
>
> I'm not sure if there are other methods, but this should always work:
>
> SELECT release_version from system.local;

I asked the devs about where I might find the version in JMX and got the hint that I could cheat and look at `nodetool gossipinfo`. It looks like RELEASE_VERSION is reported as a field in org.apache.cassandra.net FailureDetector AllEndpointStates.

-- Michael
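Once the version string is available, for instance from the `release_version` query above, a monitoring agent could turn it into something comparable. The sketch below assumes the string starts with dotted numbers such as `2.1.2`, possibly followed by a suffix. The profile names and the 2.1 cut-off are hypothetical and chosen purely for illustration.

```python
import re


def parse_release_version(version):
    """Turn a string such as '2.1.2' into a tuple of ints, here (2, 1, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def metrics_profile(version):
    """Pick a set of metrics to collect based on the parsed version."""
    # Hypothetical profile names and cut-off; not taken from Cassandra itself.
    if parse_release_version(version) >= (2, 1, 0):
        return "profile_2_1_plus"
    return "profile_legacy"


print(parse_release_version("2.1.2"), metrics_profile("2.1.2"))
```

Comparing tuples rather than raw strings avoids mistakes such as treating "2.10" as older than "2.9".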
Re: Better option to load data to cassandra
Sstableloader works well for large tables if you want to move data from Cassandra to Cassandra. This works if both C* clusters are on the same version.

Sstable2json and json2sstable are another alternative.

On Nov 11, 2014 4:53 AM, srinivas rao pinnakasrin...@gmail.com wrote:
> Hi Team,
> Please suggest better options for loading data from NoSQL to Cassandra.
> Thanks
> Srini
Re: nodetool repair stalled
Hi Eric,

The data are stored in JBOD. Only one of the disks crashed; the other 3 disks still hold the old data. That's why I didn't clean the whole node and do a fresh restart.

Thanks Rob. Will try that way.

From: Eric Stevens
Sent: Wednesday, November 12, 2014 8:21 PM
To: user@cassandra.apache.org

> Wouldn't it be a better idea to issue removenode on the crashed node, wipe the whole data directory (including system) and let it bootstrap cleanly, so that it's not part of the cluster while it gets back up to speed?
>
> On Tue, Nov 11, 2014, 12:32 PM Robert Coli rc...@eventbrite.com wrote:
>> On Tue, Nov 11, 2014 at 10:48 AM, venkat sam samvenkat...@outlook.com wrote:
>>> I have a 5 node cluster. On one node, one of the data directory partitions crashed. After replacing the disk I restarted the Cassandra daemon and ran nodetool repair to restore the missing replicas. But nodetool repair gets stuck after syncing one of the column families.
>>
>> Yes, nodetool repair often hangs. Search through the archives, but the summary is:
>>
>> 1) try to repair CFs one at a time
>> 2) it's worse with vnodes
>> 3) try tuning the phi detector or network stream timeouts
>>
>> =Rob