Re: Working with legacy data via CQL

2014-11-17 Thread Erik Forsberg
On 2014-11-15 01:24, Tyler Hobbs wrote:
 What version of cassandra did you originally create the column family
 in?  Have you made any schema changes to it through cql or
 cassandra-cli, or has it always been exactly the same?

Oh, that's a tough question given that the cluster has been around since
2011. So the CF was probably created in Cassandra 0.7 or 0.8 via thrift
calls from pycassa, and I don't think there have been any schema changes
to it since.

Thanks,
\EF

 
 On Wed, Nov 12, 2014 at 2:06 AM, Erik Forsberg forsb...@opera.com wrote:
 
 On 2014-11-11 19:40, Alex Popescu wrote:
  On Tuesday, November 11, 2014, Erik Forsberg forsb...@opera.com wrote:
 
 
  You'll have better chances to get an answer about the Python driver on
  its own mailing
  list  
 https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user
 
 As I said, this also happens when using cqlsh:
 
 cqlsh:test> SELECT column1, value from "Users" where key =
 a6b07340-047c-4d4c-9a02-1b59eabf611c and column1 = 'date_created';
 
  column1      | value
 --------------+------------------------------
  date_created | '\x00\x00\x00\x00Ta\xf3\xe0'
 
 (1 rows)
 
 Failed to decode value '\x00\x00\x00\x00Ta\xf3\xe0' (for column 'value')
 as text: 'utf8' codec can't decode byte 0xf3 in position 6: unexpected
 end of data
 
 So let me rephrase: How do I work with data where the table has metadata
 that makes some columns differ from the main validation class? From
 cqlsh, or the python driver, or any driver?
 
 Thanks,
 \EF
 
 
 
 
 -- 
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Repair completes successfully but data is still inconsistent

2014-11-17 Thread André Cruz
On 14 Nov 2014, at 18:44, André Cruz andre.c...@co.sapo.pt wrote:
 
 On 14 Nov 2014, at 18:29, Michael Shuler mich...@pbandjelly.org wrote:
 
 On 11/14/2014 12:12 PM, André Cruz wrote:
 Some extra info. I checked the backups and on the 8th of November, all 3 
 replicas had the tombstone of the deleted column. So:
 
 1 November - column is deleted - gc_grace_period is 10 days
 8 November - all 3 replicas have tombstone
 13/14 November - column/tombstone is gone on 2 replicas, 3rd replica has 
 the original value (!), with the original timestamp…
 
 After seeing your first post, this is helpful info. I'm curious what the 
 logs show between the 8th-13th, particularly around the 10th-11th :)
 
 Which logs in particular, just the ones from the 3rd machine which has the 
 zombie column? What should I be looking for? :)

I have checked the logs of the 3 replicas for that period and nothing really 
jumps out. Still, repairs have been running daily, the log reports that the CF 
is synced, and as of this moment one of the replicas still returns the zombie 
column, so they don't agree on the data.
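
Incidentally, the dates line up exactly with the gc_grace window. A quick
sketch of the arithmetic (the 10-day grace period is the one mentioned
earlier in the thread):

```python
from datetime import datetime, timedelta

# Dates from the thread: column deleted 1 November, gc_grace_period 10 days.
deleted_at = datetime(2014, 11, 1)
gc_grace = timedelta(days=10)

# From deleted_at + gc_grace onward, compaction is allowed to drop the
# tombstone. A replica whose old sstable never got compacted against the
# tombstone can then resurrect the pre-delete value.
purgeable_from = deleted_at + gc_grace
print(purgeable_from.date())  # 2014-11-11
```

That purge date (11 November) sits right between "all replicas have the
tombstone" (8 November) and "tombstone gone, original value back on one
replica" (13/14 November).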

André

Re: Working with legacy data via CQL

2014-11-17 Thread Erik Forsberg
On 2014-11-17 09:56, Erik Forsberg wrote:
 On 2014-11-15 01:24, Tyler Hobbs wrote:
 What version of cassandra did you originally create the column family
 in?  Have you made any schema changes to it through cql or
 cassandra-cli, or has it always been exactly the same?
 
 Oh that's a tough question given that the cluster has been around since
 2011. So CF was probably created in Cassandra 0.7 or 0.8 via thrift
 calls from pycassa, and I don't think there has been any schema changes
 to it since.

Actually, I don't think it matters. I created a minimal reproducible
Python script (see below). Running it against a 2.0.11 server (it creates
a fresh keyspace and CF, inserts some data via thrift/pycassa, then tries
to read back the columns whose validation class differs from the default),
both the python-driver and cqlsh bail out.

cqlsh example after running the below script:

cqlsh:badcql> select * from "Users" where column1 = 'default_account_id'
ALLOW FILTERING;

value \xf9\x8bu}!\xe9C\xbb\xa7=\xd0\x8a\xff';\xe5 (in col 'value')
can't be deserialized as text: 'utf8' codec can't decode byte 0xf9 in
position 0: invalid start byte

cqlsh:badcql> select * from "Users" where column1 = 'date_created' ALLOW
FILTERING;

value '\x00\x00\x00\x00Ti\xe0\xbe' (in col 'value') can't be
deserialized as text: 'utf8' codec can't decode bytes in position 6-7:
unexpected end of data


So the question remains - how do I work with this data from cqlsh and /
or the python driver?

Thanks,
\EF

--repeatable example--
#!/usr/bin/env python

# Run this in a virtualenv with pycassa and cassandra-driver installed via pip
import pycassa
import calendar
import traceback
import time
from uuid import uuid4

keyspace = "badcql"

sysmanager = pycassa.system_manager.SystemManager("localhost")
sysmanager.create_keyspace(keyspace,
                           strategy_options={'replication_factor': '1'})
sysmanager.create_column_family(
    keyspace, "Users",
    key_validation_class=pycassa.system_manager.LEXICAL_UUID_TYPE,
    comparator_type=pycassa.system_manager.ASCII_TYPE,
    default_validation_class=pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, "Users", "username",
                        pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, "Users", "email",
                        pycassa.system_manager.UTF8_TYPE)
sysmanager.alter_column(keyspace, "Users", "default_account_id",
                        pycassa.system_manager.LEXICAL_UUID_TYPE)
sysmanager.create_index(keyspace, "Users", "active",
                        pycassa.system_manager.INT_TYPE)
sysmanager.alter_column(keyspace, "Users", "date_created",
                        pycassa.system_manager.LONG_TYPE)

pool = pycassa.pool.ConnectionPool(keyspace, ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, "Users")

user_uuid = uuid4()

cf.insert(user_uuid, {'username': 'test_username',
                      'auth_method': 'ldap',
                      'email': 't...@example.com',
                      'active': 1,
                      'date_created': long(calendar.timegm(time.gmtime())),
                      'default_account_id': uuid4()})

from cassandra.cluster import Cluster
cassandra_cluster = Cluster(["localhost"])
cassandra_session = cassandra_cluster.connect(keyspace)
print "username", cassandra_session.execute(
    'SELECT value from "Users" where key = %s and column1 = %s',
    (user_uuid, 'username'))
print "email", cassandra_session.execute(
    'SELECT value from "Users" where key = %s and column1 = %s',
    (user_uuid, 'email'))

try:
    print "default_account_id", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'default_account_id'))
except Exception:
    print "Exception trying to get default_account_id", traceback.format_exc()
    cassandra_session = cassandra_cluster.connect(keyspace)

try:
    print "active", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'active'))
except Exception:
    print "Exception trying to get active", traceback.format_exc()
    cassandra_session = cassandra_cluster.connect(keyspace)

try:
    print "date_created", cassandra_session.execute(
        'SELECT value from "Users" where key = %s and column1 = %s',
        (user_uuid, 'date_created'))
except Exception:
    print "Exception trying to get date_created", traceback.format_exc()
-- end of example --
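
In the meantime, the failing values can at least be recovered client-side
by reinterpreting the raw bytes. A sketch, assuming the stored bytes follow
LongType's big-endian signed 64-bit encoding; the byte string is the one
from the cqlsh error output above:

```python
import datetime
import struct

# Bytes cqlsh refused to decode as UTF-8 for the LongType-validated
# 'date_created' column (copied from the error message above).
raw = b'\x00\x00\x00\x00Ti\xe0\xbe'

# LongType stores a big-endian signed 64-bit integer; here it holds the
# epoch seconds written by the pycassa insert.
seconds = struct.unpack('>q', raw)[0]
print(seconds)                                      # 1416224958
print(datetime.datetime.utcfromtimestamp(seconds))  # 2014-11-17 11:49:18
```

The same unpacking applies to any column whose validation class differs
from the table's default, as long as you know which type each column
actually uses.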


Deduplicating data on a node (RF=1)

2014-11-17 Thread Alain Vandendorpe
Hey all,

For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
This is being moved away from ASAP. In the meantime, adding a node recently
failed with a Stream Failed error (http://pastie.org/9725846). Cassandra
restarted and seemingly began streaming again from zero, without having
removed the failed stream's data.

With bootstrapping and initial compactions finished, that node now has what
seems to be duplicate data, with almost exactly 2x the expected disk usage.
CQL returns correct results, but we depend on the ability to directly read
the SSTable files (hence also RF=1).

Would anyone have suggestions on a good way to resolve this?

Thanks,
Alain


IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
There’s still a lot of weirdness in CQL.

For example, you can do an INSERT with an UPDATE ... which I’m generally
fine with.  Kind of makes sense.

However, with INSERT you can do IF NOT EXISTS.

… but you can’t do the same thing on UPDATE.

So I foolishly wrote all my code assuming that INSERT/UPDATE were
orthogonal, but now they’re not.

you can still do IF on UPDATE though… but it’s not possible to do IF
mycolumn IS NULL

.. so is there a way to mimic IF NOT EXISTS on UPDATE or is this just a bug?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread DuyHai Doan
So I foolishly wrote all my code assuming that INSERT/UPDATE were
orthogonal, but now they’re not

 There are some subtle differences.

 INSERT will create marker columns, UPDATE won't touch/modify them. What
are marker columns? Some insights here:
http://www.slideshare.net/doanduyhai/cassandra-introduction-40711134/87
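
The distinction can be sketched with a toy model (hypothetical code, not
Cassandra's actual storage implementation): INSERT writes a row marker
cell alongside the named columns, so the row stays visible even when every
regular column is null, while UPDATE writes only the cells it names.

```python
# Toy model: a "row" is a dict of cells; a row is visible if any cell exists.
ROW_MARKER = '__row_marker__'

table = {}

def cql_insert(key, cells):
    row = table.setdefault(key, {})
    row[ROW_MARKER] = True          # INSERT always writes a row marker
    row.update(cells)

def cql_update(key, cells):
    row = table.setdefault(key, {})
    row.update(cells)               # UPDATE writes only the named cells
    if not row:
        del table[key]              # nothing written -> no visible row

def visible(key):
    return bool(table.get(key))

cql_insert(1, {})                   # INSERT with no regular columns
print(visible(1))   # True  (the marker alone keeps the row visible)

cql_update(2, {})                   # UPDATE that sets nothing
print(visible(2))   # False (no marker, no cells, no row)

cql_update(3, {'val': 'x'})         # UPDATE that sets a cell
print(visible(3))   # True  (the cell itself is live)
```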

you can still do IF on UPDATE though… but it’s not possible to do IF
mycolumn IS NULL -- IF mycolumn = null should work



On Mon, Nov 17, 2014 at 10:52 PM, Kevin Burton bur...@spinn3r.com wrote:

 There’s still a lot of weirdness in CQL.

 For example, you can do an INSERT with an UPDATE .. .which I’m generally
 fine with.  Kind of make sense.

 However, with INSERT you can do IF NOT EXISTS.

 … but you can’t do the same thing on UPDATE.

 So I foolishly wrote all my code assuming that INSERT/UPDATE were
 orthogonal, but now they’re not.

 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL

 .. so is there a way to mimic IF NOT EXISTS on UPDATE or is this just a
 bug?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: Force purging of tombstones

2014-11-17 Thread Robert Coli
On Mon, Nov 17, 2014 at 1:40 PM, Ken Hancock ken.hanc...@schange.com
wrote:

 You can use the JMX forceUserDefinedCompaction operation to compact each
 SSTable individually.

 https://github.com/hancockks/cassandra-compact-cf


I don't recall why I think this, but I think cleanup now also discards
expired tombstones, and is easier to use from nodetool than
UserDefinedCompaction.

=Rob


Re: Force purging of tombstones

2014-11-17 Thread Rahul Neelakantan
Doesn't repair also get rid of tombstones?

Rahul Neelakantan

 On Nov 17, 2014, at 5:53 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Mon, Nov 17, 2014 at 1:40 PM, Ken Hancock ken.hanc...@schange.com wrote:
 You can use the JMX forceUserDefinedCompaction operation to compact each 
 SSTable individually.
 
 https://github.com/hancockks/cassandra-compact-cf
 
 I don't recall why I think this, but I think cleanup now also discards 
 expired tombstones, and is easier to use from nodetool than 
 UserDefinedCompaction.
 
 =Rob
  


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL -- If mycolumn = null should work


Alas.. it doesn’t :-/

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread DuyHai Doan
Just tested with C* 2.1.1

cqlsh:test> CREATE TABLE simple(id int PRIMARY KEY, val text);
cqlsh:test> INSERT INTO simple (id) VALUES (1);
cqlsh:test> SELECT * FROM simple ;

 id | val
----+------
  1 | null

(1 rows)

cqlsh:test> UPDATE simple SET val = 'new val' WHERE id=1 *IF val = null*;

 [applied]
-----------
      True

cqlsh:test> SELECT * FROM simple ;

 id | val
----+---------
  1 | new val

(1 rows)

On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton bur...@spinn3r.com wrote:


 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL -- If mycolumn = null should work


 Alas.. it doesn’t :-/

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: Force purging of tombstones

2014-11-17 Thread Robert Coli
On Mon, Nov 17, 2014 at 3:10 PM, Rahul Neelakantan ra...@rahul.be wrote:

 Doesn't repair also get rid of tombstones?


Repair is a non-destructive activity, and therefore cannot purge tombstones.

=Rob


Re: IF NOT EXISTS on UPDATE statements?

2014-11-17 Thread Kevin Burton
Oh yes.  That will work because the row is already there. I’m talking about
the case where the row does not exist at all. Otherwise I’d have to insert
a null first.

On Mon, Nov 17, 2014 at 3:30 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Just tested with C* 2.1.1

 cqlsh:test CREATE TABLE simple(id int PRIMARY KEY, val text);
 cqlsh:test INSERT INTO simple (id) VALUES (1);
 cqlsh:test SELECT * FROM simple ;

  id | val
 +--
   1 | null

 (1 rows)

 cqlsh:test UPDATE simple SET val = 'new val' WHERE id=1 *IF val = null*;

  [applied]
 ---
   True

 cqlsh:test SELECT * FROM simple ;

  id | val
 +-
   1 | new val

 (1 rows)

 On Tue, Nov 18, 2014 at 12:12 AM, Kevin Burton bur...@spinn3r.com wrote:


 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL -- If mycolumn = null should work


 Alas.. it doesn’t :-/

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Repair completes successfully but data is still inconsistent

2014-11-17 Thread Michael Shuler

On 11/17/2014 05:22 AM, André Cruz wrote:

I have checked the logs of the 3 replicas for that period and nothing
really jumps out. Still, repairs have been running daily, the log
reports that the CF is synced, and as of this moment one of the
replicas still returns the zombie column so they don’t agree on the
data.


I'm at a bit of a loss. Readers, is `nodetool scrub` going to be helpful 
here, or any other suggestions of things to look for?


André, does `nodetool gossipinfo` show all the nodes in schema agreement?

--
Michael


Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler

On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:

Hey all,

For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
This is being moved away from ASAP. In the meantime, adding a node
recently encountered a Stream Failed error (http://pastie.org/9725846).
Cassandra restarted and it seemingly restarted streaming from zero,
without having removed the failed stream's data.

With bootstrapping and initial compactions finished that node now has
what seems to be duplicate data, with almost exactly 2x the expected
disk usage. CQL returns correct results but we depend on the ability to
directly read the SSTable files (hence also RF=1.)

Would anyone have suggestions on a good way to resolve this?


Start over fresh, deleting *all* the data, and bootstrap the node again?

--
Michael



Trying to build Cassandra for FreeBSD 10.1

2014-11-17 Thread William Arbaugh
I've successfully built 2.1.2 for FreeBSD, but the JVM crashes upon start-up.

Here's the snippet from the top of the log file (attached)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000802422655, pid=76732, tid=34384931840
#

Any hints on how to get 2.1.2 working on FreeBSD are appreciated.



hs_err_pid76732.log
Description: Binary data


Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Jonathan Haddad
If he deletes all the data with RF=1, won't he have data loss?

On Mon Nov 17 2014 at 5:14:23 PM Michael Shuler mich...@pbandjelly.org
wrote:

 On 11/17/2014 02:04 PM, Alain Vandendorpe wrote:
  Hey all,
 
  For legacy reasons we're living with Cassandra 2.0.10 in an RF=1 setup.
  This is being moved away from ASAP. In the meantime, adding a node
  recently encountered a Stream Failed error (http://pastie.org/9725846).
  Cassandra restarted and it seemingly restarted streaming from zero,
  without having removed the failed stream's data.
 
  With bootstrapping and initial compactions finished that node now has
  what seems to be duplicate data, with almost exactly 2x the expected
  disk usage. CQL returns correct results but we depend on the ability to
  directly read the SSTable files (hence also RF=1.)
 
  Would anyone have suggestions on a good way to resolve this?

 Start over fresh, deleting *all* the data, and bootstrap the node again?

 --
 Michael




Re: Trying to build Cassandra for FreeBSD 10.1

2014-11-17 Thread Michael Shuler

On 11/17/2014 07:19 PM, William Arbaugh wrote:

I've successfully built 2.1.2 for FreeBSD, but the JVM crashes upon start-up.

Here's the snippet from the top of the log file (attached)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000802422655, pid=76732, tid=34384931840
#

Any hints on how to get 2.1.2 working on FreeBSD is appreciated.



Not a very helpful hint, other than the fact you are not alone  :)

https://issues.apache.org/jira/browse/CASSANDRA-8325

--
Michael


Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Michael Shuler

On 11/17/2014 07:20 PM, Jonathan Haddad wrote:

If he deletes all the data with RF=1, won't he have data loss?


Of course, ignore my quick answer, Alain.

--
Michael



Re: Trying to build Cassandra for FreeBSD 10.1

2014-11-17 Thread graham sanderson
The only thing I can see from looking at the exception (I didn't disassemble 
the code from hex) is that the “peer” value in the RefCountedMemory object is 
probably 0.

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(which should throw OOM), though you should add a log statement to the Memory 
class to check that, I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0.

 On Nov 17, 2014, at 7:30 PM, Michael Shuler mich...@pbandjelly.org wrote:
 
 On 11/17/2014 07:19 PM, William Arbaugh wrote:
 I've successfully built 2.1.2 for FreeBSD, but the JVM crashes upon start-up.
 
 Here's the snippet from the top of the log file (attached)
 
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x000802422655, pid=76732, tid=34384931840
 #
 
 Any hints on how to get 2.1.2 working on FreeBSD is appreciated.
 
 
 Not a very helpful hint, other than the fact you are not alone  :)
 
 https://issues.apache.org/jira/browse/CASSANDRA-8325
 
 -- 
 Michael





Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Eric Stevens
If the new node never formally joined the cluster (streaming never
completed, it never entered UN state), shouldn't that node be safe to scrub
and start over again?  It shouldn't be taking primary writes while it's
bootstrapping, should it?

On Mon Nov 17 2014 at 6:34:04 PM Michael Shuler mich...@pbandjelly.org
wrote:

 On 11/17/2014 07:20 PM, Jonathan Haddad wrote:
  If he deletes all the data with RF=1, won't he have data loss?

 Of course, ignore my quick answer, Alain.

 --
 Michael