Re: Working with legacy data via CQL
On 2014-11-15 01:24, Tyler Hobbs wrote:
> What version of cassandra did you originally create the column family
> in? Have you made any schema changes to it through cql or cassandra-cli,
> or has it always been exactly the same?

Oh, that's a tough question given that the cluster has been around since
2011. The CF was probably created in Cassandra 0.7 or 0.8 via thrift calls
from pycassa, and I don't think there have been any schema changes to it
since.

Thanks,
\EF

On Wed, Nov 12, 2014 at 2:06 AM, Erik Forsberg <forsb...@opera.com> wrote:
> On 2014-11-11 19:40, Alex Popescu wrote:
>> You'll have better chances to get an answer about the Python driver on
>> its own mailing list:
>> https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user
>
> As I said, this also happens when using cqlsh:
>
> cqlsh:test> SELECT column1, value FROM "Users"
>         ... WHERE key = a6b07340-047c-4d4c-9a02-1b59eabf611c
>         ... AND column1 = 'date_created';
>
>  column1      | value
> --------------+------------------------------
>  date_created | '\x00\x00\x00\x00Ta\xf3\xe0'
>
> (1 rows)
>
> Failed to decode value '\x00\x00\x00\x00Ta\xf3\xe0' (for column 'value')
> as text: 'utf8' codec can't decode byte 0xf3 in position 6: unexpected
> end of data
>
> So let me rephrase: how do I work with data where the table has metadata
> that makes some columns differ from the main validation class? From
> cqlsh, or the python driver, or any driver?
>
> Thanks,
> \EF

--
Tyler Hobbs
DataStax http://datastax.com/
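For reference, bytes like the ones cqlsh prints above can be decoded
client-side with Python's struct and uuid modules. This is only a
workaround sketch, not a driver feature, and it assumes you have obtained
the raw cell bytes; a thrift LongType cell is a big-endian signed 64-bit
integer, and a LexicalUUIDType cell is just the 16 raw bytes of a UUID:

```python
import struct
import uuid
from datetime import datetime

# The 'date_created' bytes exactly as cqlsh printed them above.
# LongType is a big-endian signed 64-bit integer: format '>q' in struct.
raw = b'\x00\x00\x00\x00Ta\xf3\xe0'
epoch_seconds = struct.unpack('>q', raw)[0]
print(epoch_seconds)                             # 1415705568
print(datetime.utcfromtimestamp(epoch_seconds))  # 2014-11-11 11:32:48

# A LexicalUUIDType cell is simply the 16 raw bytes of the UUID.
def decode_lexical_uuid(raw_bytes):
    return uuid.UUID(bytes=raw_bytes)
```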
Re: Repair completes successfully but data is still inconsistent
On 14 Nov 2014, at 18:44, André Cruz <andre.c...@co.sapo.pt> wrote:
> On 14 Nov 2014, at 18:29, Michael Shuler <mich...@pbandjelly.org> wrote:
>> On 11/14/2014 12:12 PM, André Cruz wrote:
>>> Some extra info. I checked the backups, and on the 8th of November all
>>> 3 replicas had the tombstone of the deleted column. So:
>>>
>>> 1 November     - column is deleted; gc_grace_period is 10 days
>>> 8 November     - all 3 replicas have the tombstone
>>> 13/14 November - column/tombstone is gone on 2 replicas, 3rd replica
>>>                  has the original value (!), with the original
>>>                  timestamp…
>>
>> After seeing your first post, this is helpful info. I'm curious what
>> the logs show between the 8th-13th, particularly around the 10th-11th :)
>
> Which logs in particular, just the ones from the 3rd machine which has
> the zombie column? What should I be looking for? :)

I have checked the logs of the 3 replicas for that period and nothing
really jumps out. Still, repairs have been running daily, the log reports
that the CF is synced, and as of this moment one of the replicas still
returns the zombie column, so they don't agree on the data.

André
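For context on why the dates above line up the way they do: a tombstone
only becomes eligible for purging by compaction once gc_grace has elapsed,
and once one replica compacts it away while another still holds the
pre-delete value, the deletion can no longer be propagated. A minimal
sketch of the eligibility rule (the helper name is made up; the dates are
from the report above):

```python
from datetime import datetime, timedelta

GC_GRACE = timedelta(days=10)  # the reporter's gc_grace setting

def purge_eligible(deletion_time, now, gc_grace=GC_GRACE):
    """A tombstone may be dropped by compaction once gc_grace has passed."""
    return now >= deletion_time + gc_grace

deleted = datetime(2014, 11, 1)
print(purge_eligible(deleted, datetime(2014, 11, 8)))   # False: tombstone must survive
print(purge_eligible(deleted, datetime(2014, 11, 13)))  # True: compaction may drop it
```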
Re: Working with legacy data via CQL
On 2014-11-17 09:56, Erik Forsberg wrote:
> On 2014-11-15 01:24, Tyler Hobbs wrote:
>> What version of cassandra did you originally create the column family
>> in? Have you made any schema changes to it through cql or cassandra-cli,
>> or has it always been exactly the same?
>
> Oh, that's a tough question given that the cluster has been around since
> 2011. The CF was probably created in Cassandra 0.7 or 0.8 via thrift
> calls from pycassa, and I don't think there have been any schema changes
> to it since.

Actually, I don't think it matters. I created a minimal repeatable example
in Python (see below). Running it against a 2.0.11 server, creating a
fresh keyspace and CF, inserting some data with thrift/pycassa, and then
trying to read the columns that have a validation class different from the
default, both the python-driver and cqlsh bail out.

cqlsh output after running the script below:

cqlsh:badcql> select * from "Users" where column1 = 'default_account_id' ALLOW FILTERING;
value \xf9\x8bu}!\xe9C\xbb\xa7=\xd0\x8a\xff';\xe5 (in col 'value') can't be deserialized as text: 'utf8' codec can't decode byte 0xf9 in position 0: invalid start byte

cqlsh:badcql> select * from "Users" where column1 = 'date_created' ALLOW FILTERING;
value '\x00\x00\x00\x00Ti\xe0\xbe' (in col 'value') can't be deserialized as text: 'utf8' codec can't decode bytes in position 6-7: unexpected end of data

So the question remains: how do I work with this data from cqlsh and/or
the python driver?
Thanks,
\EF

--repeatable example--

#!/usr/bin/env python
# Run this in a virtualenv with pycassa and cassandra-driver installed via pip
import calendar
import time
import traceback
from uuid import uuid4

import pycassa

keyspace = 'badcql'

sysmanager = pycassa.system_manager.SystemManager('localhost')
sysmanager.create_keyspace(keyspace, strategy_options={'replication_factor': '1'})
sysmanager.create_column_family(keyspace, 'Users',
    key_validation_class=pycassa.system_manager.LEXICAL_UUID_TYPE,
    comparator_type=pycassa.system_manager.ASCII_TYPE,
    default_validation_class=pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, 'Users', 'username', pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, 'Users', 'email', pycassa.system_manager.UTF8_TYPE)
sysmanager.alter_column(keyspace, 'Users', 'default_account_id',
    pycassa.system_manager.LEXICAL_UUID_TYPE)
sysmanager.create_index(keyspace, 'Users', 'active', pycassa.system_manager.INT_TYPE)
sysmanager.alter_column(keyspace, 'Users', 'date_created', pycassa.system_manager.LONG_TYPE)

pool = pycassa.pool.ConnectionPool(keyspace, ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Users')
user_uuid = uuid4()
cf.insert(user_uuid, {'username': 'test_username',
                      'auth_method': 'ldap',
                      'email': 't...@example.com',
                      'active': 1,
                      'date_created': long(calendar.timegm(time.gmtime())),
                      'default_account_id': uuid4()})

from cassandra.cluster import Cluster
cassandra_cluster = Cluster(['localhost'])
cassandra_session = cassandra_cluster.connect(keyspace)

# These columns use the default validation class (UTF8Type) and decode fine.
print "username", cassandra_session.execute(
    'SELECT value from "Users" where key = %s and column1 = %s',
    (user_uuid, 'username'))
print "email", cassandra_session.execute(
    'SELECT value from "Users" where key = %s and column1 = %s',
    (user_uuid, 'email'))

# Each of the following columns has a non-default validation class; the
# driver tries to decode the value as UTF-8 and fails, so we reconnect
# after each attempt.
try:
    print "default_account_id", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'default_account_id'))
except Exception as e:
    print "Exception trying to get default_account_id", traceback.format_exc()

cassandra_session = cassandra_cluster.connect(keyspace)
try:
    print "active", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'active'))
except Exception as e:
    print "Exception trying to get active", traceback.format_exc()

cassandra_session = cassandra_cluster.connect(keyspace)
try:
    print "date_created", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'date_created'))
except Exception as e:
    print "Exception trying to get date_created", traceback.format_exc()

-- end of example --
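Since CQL only sees the table's default validator (UTF8Type here), one
workaround sketch is a client-side map from column name to decoder,
applied to the raw cell bytes. The DECODERS map and decode_cell helper
below are hypothetical names, and actually getting raw bytes out of the
driver may first require changing the column metadata to blob:

```python
import struct
import uuid

# Client-side decoders for the per-column validators the thrift schema
# declares (names match the script above); CQL cannot see these.
DECODERS = {
    'date_created': lambda raw: struct.unpack('>q', raw)[0],        # LongType
    'default_account_id': lambda raw: uuid.UUID(bytes=raw),         # LexicalUUIDType
    'active': lambda raw: int.from_bytes(raw, 'big', signed=True),  # IntegerType (varint)
}

def decode_cell(column1, raw_value):
    """Decode a raw cell, falling back to UTF-8 for unlisted columns."""
    decoder = DECODERS.get(column1)
    return decoder(raw_value) if decoder else raw_value.decode('utf-8')

# The 'date_created' bytes from the cqlsh error above, and a varint 1:
print(decode_cell('date_created', b'\x00\x00\x00\x00Ti\xe0\xbe'))  # 1416224958
print(decode_cell('active', b'\x01'))                              # 1
```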
Deduplicating data on a node (RF=1)
Hey all,

For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
We are moving away from this ASAP. In the meantime, while adding a node we
recently hit a Stream Failed error (http://pastie.org/9725846). Cassandra
restarted and seemingly began streaming again from zero, without removing
the failed stream's data. With bootstrapping and initial compactions
finished, that node now appears to hold duplicate data, at almost exactly
2x the expected disk usage. CQL returns correct results, but we depend on
the ability to read the SSTable files directly (hence also RF=1).

Would anyone have suggestions on a good way to resolve this?

Thanks,
Alain
IF NOT EXISTS on UPDATE statements?
There's still a lot of weirdness in CQL.

For example, you can do an INSERT with an UPDATE… which I'm generally fine
with; it kind of makes sense.

However, with INSERT you can do IF NOT EXISTS… but you can't do the same
thing on UPDATE. So I foolishly wrote all my code assuming that
INSERT/UPDATE were orthogonal, but they're not.

You can still do IF on UPDATE, though… but it's not possible to do
IF mycolumn IS NULL. So is there a way to mimic IF NOT EXISTS on UPDATE,
or is this just a bug?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile:
https://plus.google.com/102718274791889610666/posts
Re: IF NOT EXISTS on UPDATE statements?
> So I foolishly wrote all my code assuming that INSERT/UPDATE were
> orthogonal, but now they're not

There are some subtle differences. INSERT will create marker columns;
UPDATE won't touch/modify them. What are marker columns? Some insights
here:
http://www.slideshare.net/doanduyhai/cassandra-introduction-40711134/87

> you can still do IF on UPDATE though… but it's not possible to do
> IF mycolumn IS NULL

IF mycolumn = null should work.

On Mon, Nov 17, 2014 at 10:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> There's still a lot of weirdness in CQL. [...] So is there a way to
> mimic IF NOT EXISTS on UPDATE, or is this just a bug?
Re: Force purging of tombstones
On Mon, Nov 17, 2014 at 1:40 PM, Ken Hancock <ken.hanc...@schange.com> wrote:
> You can use the JMX forceUserDefinedCompaction operation to compact each
> SSTable individually.
> https://github.com/hancockks/cassandra-compact-cf

I don't recall why I think this, but I think cleanup now also discards
expired tombstones, and is easier to use from nodetool than
UserDefinedCompaction.

=Rob
Re: Force purging of tombstones
Doesn't repair also get rid of tombstones?

Rahul Neelakantan

On Nov 17, 2014, at 5:53 PM, Robert Coli <rc...@eventbrite.com> wrote:
> I don't recall why I think this, but I think cleanup now also discards
> expired tombstones, and is easier to use from nodetool than
> UserDefinedCompaction.
Re: IF NOT EXISTS on UPDATE statements?
>> you can still do IF on UPDATE though… but it's not possible to do
>> IF mycolumn IS NULL
>
> IF mycolumn = null should work

Alas… it doesn't :-/

--
Founder/CEO Spinn3r.com
Re: IF NOT EXISTS on UPDATE statements?
Just tested with C* 2.1.1:

cqlsh:test> CREATE TABLE simple(id int PRIMARY KEY, val text);
cqlsh:test> INSERT INTO simple (id) VALUES (1);
cqlsh:test> SELECT * FROM simple;

 id | val
----+------
  1 | null

(1 rows)

cqlsh:test> UPDATE simple SET val = 'new val' WHERE id = 1 IF val = null;

 [applied]
-----------
      True

cqlsh:test> SELECT * FROM simple;

 id | val
----+---------
  1 | new val

(1 rows)

On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton <bur...@spinn3r.com> wrote:
>> IF mycolumn = null should work
>
> Alas… it doesn't :-/
Re: Force purging of tombstones
On Mon, Nov 17, 2014 at 3:10 PM, Rahul Neelakantan <ra...@rahul.be> wrote:
> Doesn't repair also get rid of tombstones?

Repair is a non-destructive activity, and therefore cannot purge
tombstones.

=Rob
Re: IF NOT EXISTS on UPDATE statements?
Oh yes, that will work because a value is already there. I'm talking about
the case where the value does not exist; otherwise I'd have to insert a
null first.

On Mon, Nov 17, 2014 at 3:30 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
> Just tested with C* 2.1.1:
>
> cqlsh:test> UPDATE simple SET val = 'new val' WHERE id = 1 IF val = null;
>
>  [applied]
> -----------
>       True

--
Founder/CEO Spinn3r.com
Re: Repair completes successfully but data is still inconsistent
On 11/17/2014 05:22 AM, André Cruz wrote:
> I have checked the logs of the 3 replicas for that period and nothing
> really jumps out. Still, repairs have been running daily, the log
> reports that the CF is synced, and as of this moment one of the replicas
> still returns the zombie column, so they don't agree on the data.

I'm at a bit of a loss. Readers, is `nodetool scrub` going to be helpful
here, or are there any other suggestions of things to look for?

André, does `nodetool gossipinfo` show all the nodes in schema agreement?

--
Michael
Re: Deduplicating data on a node (RF=1)
On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:
> For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
> [...] That node now has what seems to be duplicate data, with almost
> exactly 2x the expected disk usage. [...]
> Would anyone have suggestions on a good way to resolve this?

Start over fresh, deleting *all* the data, and bootstrap the node again?

--
Michael
Trying to build Cassandra for FreeBSD 10.1
I've successfully built 2.1.2 for FreeBSD, but the JVM crashes upon
start-up. Here's the snippet from the top of the log file (attached):

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000802422655, pid=76732, tid=34384931840
#

Any hints on how to get 2.1.2 working on FreeBSD are appreciated.
Re: Deduplicating data on a node (RF=1)
If he deletes all the data with RF=1, won't he have data loss?

On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler <mich...@pbandjelly.org> wrote:
> On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:
>> For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
>> [...]
>
> Start over fresh, deleting *all* the data, and bootstrap the node again?
Re: Trying to build Cassandra for FreeBSD 10.1
On 11/17/2014 07:19 PM, William Arbaugh wrote:
> I've successfully built 2.1.2 for FreeBSD, but the JVM crashes upon
> start-up. [...]
> Any hints on how to get 2.1.2 working on FreeBSD are appreciated.

Not a very helpful hint, other than the fact that you are not alone :)

https://issues.apache.org/jira/browse/CASSANDRA-8325

--
Michael
Re: Deduplicating data on a node (RF=1)
On 11/17/2014 07:20 PM, Jonathan Haddad wrote:
> If he deletes all the data with RF=1, won't he have data loss?

Of course; ignore my quick answer, Alain.

--
Michael
Re: Trying to build Cassandra for FreeBSD 10.1
The only thing I can see from looking at the exception (I didn't
disassemble the code from hex) is that the "peer" value in the
RefCountedMemory object is probably 0.

Given that Unsafe.allocateMemory should not return 0 even on allocation
failure (which should throw OOM), though you should add a log statement to
the Memory class to check that, I'd suggest logging to see if anyone is
calling SSTableReader.releaseSummary, which could set the peer to 0.

On Nov 17, 2014, at 7:30 PM, Michael Shuler <mich...@pbandjelly.org> wrote:
> Not a very helpful hint, other than the fact that you are not alone :)
> https://issues.apache.org/jira/browse/CASSANDRA-8325
Re: Deduplicating data on a node (RF=1)
If the new node never formally joined the cluster (streaming never
completed, so it never entered UN state), shouldn't that node be safe to
scrub and start over again? It shouldn't be taking primary writes while
it's bootstrapping, should it?

On Mon Nov 17 2014 at 6:34:04 PM Michael Shuler <mich...@pbandjelly.org> wrote:
> On 11/17/2014 07:20 PM, Jonathan Haddad wrote:
>> If he deletes all the data with RF=1, won't he have data loss?
>
> Of course; ignore my quick answer, Alain.