Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Jeremy Hanna
you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though. On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote: Hi, I was recently trying Hadoop job + cassandra-all 0.8.10 again and the

Re: Issue with nodetool clearsnapshot

2012-03-06 Thread aaron morton
1)Since you mentioned hard links, I would like to add that our data directory itself is a sym-link. Could that be causing an issue ? Seems unlikely. I restarted the node and it went about deleting the files and the disk space has been released. Can this be done using nodetool, and without

Old data coming alive after adding node

2012-03-06 Thread Stefan Reek
Hi, We were running a 3-node cluster of cassandra 0.6.13 with RF=3. After we added a fourth node, keeping RF=3, some old data appeared in the database. As far as I understand this can only happen if nodetool repair wasn't run for more than GCGraceSeconds. Our GCGraceSeconds is set to the
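For context on the rule described in this message, here is a minimal sketch (not from the thread) of the check it implies: deleted data can only reappear if a replica goes unrepaired for longer than GCGraceSeconds. The function name and the example dates are hypothetical. 864000 seconds (10 days) is assumed here as the stock default, since the poster's actual value is cut off in the snippet.

```python
from datetime import datetime, timedelta

# Assumed default (10 days); the poster's real GCGraceSeconds is not shown.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_overdue(last_repair, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if more than gc_grace_seconds have passed since the last repair."""
    return (now - last_repair) > timedelta(seconds=gc_grace_seconds)


# Hypothetical dates: a node last repaired on 1 Feb, checked on 6 Mar.
print(repair_overdue(datetime(2012, 2, 1), datetime(2012, 3, 6)))
```

A check like this only flags the risk window. It says nothing about whether a given repair actually completed on every replica.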

Re: Secondary indexes don't go away after metadata change

2012-03-06 Thread aaron morton
When the new node comes online the history of schema changes is streamed to it. I've not looked at the code but it could be that schema migrations are creating indexes that are then deleted from the schema but not from the DB itself. Does that fit your scenario? When the new node comes

Re: running two rings on the same subnet

2012-03-06 Thread Tamar Fraenkel
I have some more info: after a couple of hours of running, the problematic node again went to 100% CPU and I had to reboot it. The last lines from the log show it did GC: INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line 122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is

Re: Old data coming alive after adding node

2012-03-06 Thread aaron morton
After we added a fourth node, keeping RF=3, some old data appeared in the database. What CL are you working at ? (Should not matter too much with repair working, just asking) We don't run compact on the nodes explicitly as I understand that running repair will trigger a major compaction.

Re: running two rings on the same subnet

2012-03-06 Thread aaron morton
You do not have enough memory allocated to the JVM and are suffering from excessive GC as a result. There are some tuning things you can try, but 480MB is not enough. 1GB would be a better start, 2 better than that. Consider using https://github.com/pcmanus/ccm for testing multiple instances

Re: running two rings on the same subnet

2012-03-06 Thread Tamar Fraenkel
Aaron, Thanks for your response. I was afraid this was the issue. Can you give me some direction regarding the fine tuning of my VMs? I would like to explore that option some more. Thanks! *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel:

Re: Old data coming alive after adding node

2012-03-06 Thread Stefan Reek
Hi Aaron, Thanks for the quick reply. All our writes/deletes are done with CL.QUORUM. Our reads are done with CL.ONE. Although the reads that confirmed the old data were done with CL.QUORUM. According to https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 0.6.6 has the

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
Hi Florent, I don't change the server version, it is the Cassandra 0.8.10. I change just the version of cassandra-all in pom.xml of the mapreduce job. I have the 'rpc_address: 0.0.0.0' in cassandra.yaml, because I want cassandra to bind RPC to all interfaces. Regards, P. On Tue, Mar 6, 2012

RE: Mutation Dropped Messages

2012-03-06 Thread Tiwari, Dushyant
1. One node is running at 8G rest on 10G - same config 2. Nodetool - Status State Load Owns Token 162563731948587347959549934419333022646 Up Normal 107.79 MB 25.00%

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
Excuse me, I had not understood. So, for me, the problem comes from the change in the ColumnFamilyInputFormat class between 0.8.7 and 0.8.10, where the splits are created (0.8.7 uses endpoints and 0.8.10 uses rpc_endpoints). With your config, the splits fail, so Hadoop doesn't run a Map task on

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
I remember a bug in the ColumnFamilyInputFormat class in 0.8.10. It was a test rpc_endpoints == 0.0.0.0 in place of rpc_endpoint.equals(0.0.0.0); maybe it can help you. On 6 March 2012 at 12:18, Florent Lefillâtre flefi...@gmail.com wrote: Excuse me, I had not understood. So, for me, the problem

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
I've tried cassandra-all 0.8.10 with the rpc_endpoints == 0.0.0.0 bug fixed, but the result is the same: there are still tasks over 1000%. The only change is that there are real host names instead of 0.0.0.0 in the debug output. Reconfiguring the whole cluster is not possible, I can't test the

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
I've added a debug message in the CFRR.getProgress() and I can't find it in the debug output. Seems like the getProgress() has not been called at all; Regards, P. On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: you may be running into this -

Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Hello, Truncate uses RPC timeout, which is in my case set to 10 seconds (I want even less) and it's not enough. I've seen in sources TODO for this case. What I found is that truncate starts a flush of all memtables for all CFs, not only for the CF to be truncated. When there are a lot of CFs to be

Repairing nodes when two schema versions appear

2012-03-06 Thread Tharindu Mathew
Hi, I try to add column families programmatically and end up with 2 schema versions in the Cassandra cluster. Using Cassandra 0.7. Is there a way to bring this back to normal (to one schema version) through the cli or through the API? -- Regards, Tharindu blog: http://mackiemathew.com/

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
CFRR.getProgress() is called by child mapper tasks on each TaskTracker node, so the log must appear in ${hadoop_log_dir}/attempt_201202081707_0001_m_00_0/syslog (or something like this) on the TaskTrackers, not in the client job logs. Are you sure you are looking at the right log file? I say that because in your

Re: running two rings on the same subnet

2012-03-06 Thread aaron morton
Reduce these settings for the CF: row_cache (disable it), key_cache (disable it). Increase these settings for the CF: bloom_filter_fp_chance. Reduce these settings in cassandra.yaml: flush_largest_memtables_at, memtable_flush_queue_size, sliced_buffer_size_in_kb, in_memory_compaction_limit_in_mb

Re: Schema change causes exception when adding data

2012-03-06 Thread Jeremiah Jordan
That is the best one I have found. On 03/01/2012 03:12 PM, Tharindu Mathew wrote: There are 2. I'd like to wait till there is one, when I insert the value. Going through the code, calling client.describe_schema_versions() seems to give a good answer to this. And I discovered that if I wait
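The waiting approach mentioned in this message could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: `wait_for_schema_agreement` and its parameters are hypothetical names, and `describe_schema_versions` is passed in as a plain callable standing in for the Thrift client method, which returns a map of schema version to the hosts reporting it. Treating an "UNREACHABLE" key as a non-version is also an assumption about that map.

```python
import time

# Assumed key under which nodes that could not be contacted are reported.
UNREACHABLE = "UNREACHABLE"


def wait_for_schema_agreement(describe_schema_versions, timeout=30.0,
                              interval=1.0, sleep=time.sleep):
    """Poll until all reachable nodes report a single schema version.

    describe_schema_versions: zero-argument callable returning
    {schema_version: [host, ...]} (a stand-in for the client call).
    Returns the agreed version, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        versions = describe_schema_versions()
        live = [v for v in versions if v != UNREACHABLE]
        if len(live) == 1:
            return live[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("schema versions still disagree: %r" % sorted(live))
        sleep(interval)
```

Passing the client call in as a callable keeps the sketch independent of any particular client library. In real use it would be the bound method of an open connection.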

Re: Schema change causes exception when adding data

2012-03-06 Thread Tamar Fraenkel
Hi! Maybe I didn't understand, but if you use Hector's addColumnFamily(CF, true); it should wait for schema agreement. Will that solve your problem? Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54

Re: Old data coming alive after adding node

2012-03-06 Thread aaron morton
All our writes/deletes are done with CL.QUORUM. Our reads are done with CL.ONE. Although the reads that confirmed the old data were done with CL.QUORUM. According to https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 0.6.6 has the same patch for

Re: Mutation Dropped Messages

2012-03-06 Thread aaron morton
1. One node is running at 8G rest on 10G – same config Make them all the same. 2. Nodetool – Even though the token ranges are not balanced, the load looks a little odd. Have you moved tokens ? Did you do a cleanup ? You'll need to look at the node that is dropping messages (not

Re: Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread aaron morton
Truncate uses RPC timeout, which is in my case set to 10 seconds (I want even less) and it's not enough. I've seen in sources TODO for this case. created https://issues.apache.org/jira/browse/CASSANDRA-4006 Is it possible to flush only required CF for truncate, not all? This could improve

Re: Repairing nodes when two schema versions appear

2012-03-06 Thread aaron morton
Go to one of the nodes, stop it and delete the Migrations and Schema files in the system keyspace. When you restart the node it will stream the migrations from the other nodes. Note that if the node is UP and accepting traffic it may log errors about missing CFs during this time. Cheers

RE: Secondary indexes don't go away after metadata change

2012-03-06 Thread Frisch, Michael
Sure enough it does. Looking back in the logs when the node was first coming online I can see it applying migrations and submitting index builds on indexes that are deleted in the newest version of the schema. This may be a silly question but shouldn't it just apply the most recent version of

Re: Issue with nodetool clearsnapshot

2012-03-06 Thread B R
Thanks a lot, Aaron. Our cluster is much more stable now. We'll look at upgrading to 1.x in the coming weeks. On Tue, Mar 6, 2012 at 2:33 PM, aaron morton aa...@thelastpickle.com wrote: 1)Since you mentioned hard links, I would like to add that our data directory itself is a sym-link. Could that be

key sorting question

2012-03-06 Thread Tamar Fraenkel
Hi! I am currently experimenting with Cassandra 1.0.7, but while reading http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 something caught my eye: Cassandra orders version 1 UUIDs (http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address.29) by their time component

Re: key sorting question

2012-03-06 Thread Dave Brosius
With random partitioner, the rows are sorted by the hashes of the keys, so for all intents and purposes, not sorted. This comment below really is talking about how columns are sorted, and yes when time uuids are used, they are sorted by the time component, as a time
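To illustrate the point about row order under the random partitioner, here is a small sketch (not from the thread). It assumes the token is the absolute value of the key's MD5 digest read as a signed big-endian integer, which approximates what RandomPartitioner does. The function names and sample keys are hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Approximate token: abs() of the MD5 digest as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def ring_order(keys):
    """Return keys in the order their tokens would place them on the ring."""
    return sorted(keys, key=random_partitioner_token)


# Hypothetical row keys; the printed order follows the hashes, not the key text.
keys = [b"user:1", b"user:2", b"user:3", b"user:4"]
for key in ring_order(keys):
    print(key.decode(), random_partitioner_token(key))
```

Because the order depends only on the hash, keys that sort next to each other as strings will generally not be neighbours on the ring. That is why range scans over row keys are not meaningful with this partitioner.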

Re: key sorting question

2012-03-06 Thread Tamar Fraenkel
Thanks. *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Wed, Mar 7, 2012 at 8:55 AM, Dave Brosius dbros...@mebigfatguy.com wrote: With random partitioner, the rows are

Re: Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Thank you. To sum up, to free up and discard a commit log - flush all. So higher timeout for truncate will/should work. 2012/3/6 aaron morton aa...@thelastpickle.com Truncate uses RPC timeout, which is in my case set to 10 seconds (I want even less) and it's not enough. I've seen in sources