Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-20 Thread Aaron Morton
“between 1.2.6 and 2.0.6 the setInputRange(startToken, endToken) is not working” Can you confirm or disprove? My reading of the code is that it will consider the part of each token range (from vnodes or initial tokens) that overlaps with the provided token range. I’ve already got one
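The overlap behaviour described above can be sketched as a plain range intersection. This is a minimal illustration, not the ColumnFamilyInputFormat code. It assumes Murmur3-style `long` tokens, Cassandra's `(start, end]` range convention, ranges that do not wrap around the ring, and Java 16+ (for records). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: clips node-owned token ranges to a requested range (non-wrapping ranges only). */
public class TokenRangeOverlap {

    /** A token range (start, end]; assumes start < end, i.e. no wrap-around. */
    public record Range(long start, long end) {
        public Range {
            if (start >= end) {
                throw new IllegalArgumentException("wrapping or empty range: " + start + ", " + end);
            }
        }
    }

    /** Returns the overlap of two ranges, or null if they share no tokens. */
    public static Range intersect(Range a, Range b) {
        long start = Math.max(a.start(), b.start());
        long end = Math.min(a.end(), b.end());
        return start < end ? new Range(start, end) : null;
    }

    /** Keeps only the part of each owned range (vnode or initial token) that falls inside the requested range. */
    public static List<Range> clip(List<Range> owned, Range requested) {
        List<Range> result = new ArrayList<>();
        for (Range r : owned) {
            Range overlap = intersect(r, requested);
            if (overlap != null) {
                result.add(overlap);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Range> owned = List.of(new Range(0, 100), new Range(100, 200), new Range(200, 300));
        // Requesting (50, 250] keeps (50, 100], (100, 200] and (200, 250].
        System.out.println(clip(owned, new Range(50, 250)));
    }
}
```

A real implementation also has to split ranges that wrap past the minimum token before intersecting; that case is deliberately left out here.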

Re: CQL 3 and wide rows

2014-05-20 Thread Aaron Morton
In a CQL 3 table the only **column** names are the ones defined in the table; in the example below there are three column names: CREATE TABLE keyspace.widerow ( row_key text, wide_row_column text, data_column text, PRIMARY KEY (row_key, wide_row_column)); Check out, for example,

Re: CQL 3 and wide rows

2014-05-20 Thread Jack Krupansky
To keep the terminology clear, your “row_key” is actually the “partition key”, and “wide_row_column” is actually a “clustering column”, and the combination of your row_key and wide_row_column is a “compound primary key”. -- Jack Krupansky From: Aaron Morton Sent: Tuesday, May 20, 2014 3:06 AM
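To make the terminology concrete, here is a toy in-memory model of the `widerow` table above. It is a hypothetical class and not Cassandra's storage engine. The partition key selects a partition, and within that partition the CQL rows are kept sorted by the clustering column; this layout is what the old CLI called a "wide row". It assumes text keys compared in Java `String` order, which for ASCII keys roughly matches a `text` clustering column. `put` overwrites an existing cell, mirroring CQL's upsert behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of PRIMARY KEY (row_key, wide_row_column): partition key, then sorted clustering column. */
public class WideRowModel {
    // Partition key -> (clustering column -> data_column), clustering columns kept in sorted order.
    private final Map<String, NavigableMap<String, String>> partitions = new HashMap<>();

    /** Similar in spirit to INSERT INTO widerow (row_key, wide_row_column, data_column) VALUES (...). */
    public void insert(String rowKey, String wideRowColumn, String dataColumn) {
        partitions.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(wideRowColumn, dataColumn);
    }

    /** All CQL rows of one partition, in clustering order (empty if the partition does not exist). */
    public NavigableMap<String, String> partition(String rowKey) {
        return partitions.getOrDefault(rowKey, new TreeMap<>());
    }

    /** A clustering slice, like WHERE row_key = ? AND wide_row_column >= ? AND wide_row_column < ?. */
    public NavigableMap<String, String> slice(String rowKey, String from, String to) {
        return partition(rowKey).subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        WideRowModel table = new WideRowModel();
        table.insert("k1", "c", "third");
        table.insert("k1", "a", "first");
        table.insert("k1", "b", "second");
        table.insert("k2", "a", "other partition");
        System.out.println(table.partition("k1").keySet()); // prints [a, b, c]
        System.out.println(table.slice("k1", "a", "c").values()); // prints [first, second]
    }
}
```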

Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
yes :) On 20 May 2014, at 14:24, Jack Krupansky j...@basetechnology.com wrote: To keep the terminology clear, your “row_key” is actually the “partition key”, and “wide_row_column” is actually a “clustering column”, and the combination of your row_key and wide_row_column is a “compound

Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Hi Aaron, Thanks for the answer! Let's consider the following CLI code: for(int i = 0 ; i < 10_000_000 ; i++) { set[‘rowKey1’][‘myCol::i’] = UUID.randomUUID(); } The code above will create a single row that contains 10^7 columns sorted by ‘i’. This will work fine, and this is the wide row to my

Disable FS journaling

2014-05-20 Thread Paulo Ricardo Motta Gomes
Hello, Has anyone disabled file system journaling on Cassandra nodes? Does it make any difference on write performance? Cheers, -- *Paulo Motta* Chaordic | *Platform*

Re: Disable FS journaling

2014-05-20 Thread Samir Faci
I'm not sure you'd be gaining much by doing this. This probably depends on which file system you're referring to when you say journaling. There are a few of them around; you could opt to use ext2 instead of ext3/4 in the Unix world. A quick Google search linked me to this:

Re: CQL 3 and wide rows

2014-05-20 Thread Nate McCall
Something like this might work: cqlsh:my_keyspace> CREATE TABLE my_widerow ( ... id text, ... my_col timeuuid, ... PRIMARY KEY (id, my_col) ... ) WITH caching='KEYS_ONLY' AND ... compaction={'class':

Re: Disable FS journaling

2014-05-20 Thread Michael Shuler
On 05/20/2014 09:54 AM, Samir Faci wrote: I'm not sure you'd be gaining much by doing this. This is probably dependent on the file system you're referring to when you say journaling. There's a few of them around, You could opt to use ext2 instead of ext3/4 in the unix world. A quick google

Re: Disable FS journaling

2014-05-20 Thread Paulo Ricardo Motta Gomes
Thanks for the links! Forgot to mention, we're using XFS here, as suggested by the Cassandra wiki. But I just double-checked, and it's apparently not possible to disable journaling on XFS. One of our sysadmins just suggested disabling journaling, since it's mostly for recovery purposes, and Cassandra

Re: CQL 3 and wide rows

2014-05-20 Thread Maciej Miklas
Thank you, Nate, now I understand it! This is a real improvement compared to the CLI :) Regards, Maciej On 20 May 2014, at 17:16, Nate McCall n...@thelastpickle.com wrote: Something like this might work: cqlsh:my_keyspace> CREATE TABLE my_widerow ( ... id text,

Re: Disable FS journaling

2014-05-20 Thread Terje Marthinussen
Having the journal enabled is faster for almost all operations. Recovery here is more about saving you from waiting half an hour for a traditional full file system check. Feel free to wait if you want, though! :) Regards, Terje On 21 May 2014, at 01:11, Paulo Ricardo Motta Gomes

Re: Disable FS journaling

2014-05-20 Thread Paulo Ricardo Motta Gomes
On Tue, May 20, 2014 at 1:24 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Journal enabled is faster on almost all operations. Good to know, thanks! Recovery here is more about saving you from waiting 1/2 hour from a traditional full file system check. On an EC2 environment you

Re: Disable FS journaling

2014-05-20 Thread Kevin Burton
My gut says you won't see much of a performance boost, especially if you're on SSDs, as the journal isn't going to be hindered by random write speed. Also, I *believe* you will lose filesystem metadata too… which Cassandra doesn't protect you from. On Tue, May 20, 2014 at 9:30 AM, Paulo Ricardo

Re: Best partition type for Cassandra with JBOD

2014-05-20 Thread Kevin Burton
This has not been my experience… In my benchmarks over the years, noatime has mattered. However, I might not have been as scientifically motivated to falsify the noatime hypothesis… specifically, I might just have accidentally fallen prey to confirmation bias and assumed that noatime mattered and then moved

Re: Ec2 Network I/O

2014-05-20 Thread Ben Bromhead
Also, once you've got your phi_convict_threshold sorted, if you see these again check: http://status.aws.amazon.com/ AWS does occasionally have the odd increased latency issue / outage. Ben Bromhead, Instaclustr On 19/05/2014, at 1:15
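For context on what phi_convict_threshold controls: Cassandra's failure detector is a phi accrual detector. The sketch below is a simplification. It assumes heartbeat inter-arrival times are exponentially distributed with the observed mean, which, as far as I know, is the approximation Cassandra's FailureDetector makes. Under that assumption, phi = -log10(exp(-t / mean)) = t / (mean · ln 10). The class and method names are hypothetical, and the 1 s mean interval in `main` is an assumed value. 8 is shown because it is the cassandra.yaml default.

```java
/** Simplified phi accrual calculation, assuming exponentially distributed heartbeat gaps. */
public class PhiAccrual {

    /**
     * phi = -log10(P(next heartbeat arrives later than t)) with P = exp(-t / mean),
     * which simplifies to t / (mean * ln 10).
     */
    public static double phi(double millisSinceLastHeartbeat, double meanIntervalMillis) {
        return millisSinceLastHeartbeat / (meanIntervalMillis * Math.log(10.0));
    }

    /** A node is convicted (marked down) once phi exceeds the configured threshold. */
    public static boolean convicted(double millisSinceLastHeartbeat, double meanIntervalMillis, double threshold) {
        return phi(millisSinceLastHeartbeat, meanIntervalMillis) > threshold;
    }

    /** Silence tolerated before conviction: solves phi(t) = threshold for t. */
    public static double silenceBeforeConviction(double meanIntervalMillis, double threshold) {
        return threshold * meanIntervalMillis * Math.log(10.0);
    }

    public static void main(String[] args) {
        double mean = 1000.0; // assumed mean heartbeat interval of ~1 s
        for (double threshold : new double[] {8.0, 12.0}) {
            System.out.printf("threshold %.0f -> convict after ~%.0f ms of silence%n",
                    threshold, silenceBeforeConviction(mean, threshold));
        }
    }
}
```

Under this model the tolerated silence grows linearly with the threshold. That is why the threshold is commonly raised on networks with jittery latency such as EC2: nodes are marked down less eagerly, at the cost of detecting real failures later.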

CassandraStorage loader generating 2x as many records?

2014-05-20 Thread Kevin Burton
This has to be a bug, either that or I'm insane. Here's my table in Cassandra: CREATE TABLE test_source ( id int , primary key(id) ); INSERT INTO test_source (ID) VALUES(1); INSERT INTO test_source (ID) VALUES(2); INSERT INTO test_source (ID) VALUES(3); INSERT INTO test_source (ID)

Is the tarball for a given release in a Maven repository somewhere?

2014-05-20 Thread Clint Kelly
Hi all, I am using the maven assembly plugin to build a project that contains a development environment for a project that we've built at work on top of Cassandra. I'd like this development environment to include the latest release of Cassandra. Is there a maven repo anywhere that contains an

RE: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-20 Thread Anton Brazhnyk
I went with the recommendations to create my own input format or backport the 2.0.7 code, and it works now. To be more specific: AbstractColumnFamilyInputFormat.getSplits(JobContext) handled only the case of an ordered partitioner with key-based ranges. It did convert keys to tokens and used

Memory issue

2014-05-20 Thread opensaf dev
Hi guys, I am trying to run Cassandra on CentOS as a user X other than root or cassandra. When I run it as user cassandra, it starts and runs fine. But when I run it as user X, I get the error below once Cassandra starts, and the system freezes completely. *Insufficient memlock settings:* WARN

RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Hi, You have to define limits for the user. Here is an example for the user cassandra:
# cat /etc/security/limits.d/cassandra.conf
cassandra - memlock unlimited
cassandra - nofile 10
best, Romain opensaf dev opensaf...@gmail.com wrote on 21/05/2014 06:59:05: From:

RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Well... you have already changed the limits ;-) Keep in mind that changes to the limits.conf file will not affect processes that are already running. opensaf dev opensaf...@gmail.com wrote on 21/05/2014 06:59:05: From: opensaf dev opensaf...@gmail.com To: user@cassandra.apache.org, Date