Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
Hum... Seems like it could be useful in a case like this to have a mode where a result is always returned (if possible), but with a flag saying whether the consistency level was met, or to what level it was met (the number of nodes answering, for instance)? Terje On Tue, Apr 19, 2011 at 1:13 AM, Jonathan

Re: Multi-DC Deployment

2011-04-19 Thread Adrian Cockcroft
If you want to use local quorum for a distributed setup, it doesn't make sense to have less than RF=3 local and remote. Three copies at both ends will give you high availability. Only one copy of the data is sent over the wide area link (with recent versions). There is no need to use mirrored or
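
As an illustration of the three-copies-per-DC setup described above, a keyspace definition along these lines gives RF=3 in each datacenter. This is a sketch against the Cassandra 0.7 Thrift API; the keyspace name and the datacenter names "DC1"/"DC2" are placeholders and must match what your snitch reports.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;
import org.apache.cassandra.thrift.KsDef;

public class MultiDcKeyspace {
    // Sketch: define a keyspace with 3 replicas in each of two datacenters.
    // "DC1"/"DC2" must match the datacenter names your snitch reports.
    public static void create(Cassandra.Client client) throws Exception {
        KsDef ks = new KsDef();
        ks.setName("MyKeyspace");
        ks.setStrategy_class("org.apache.cassandra.locator.NetworkTopologyStrategy");
        Map<String, String> options = new HashMap<String, String>();
        options.put("DC1", "3");
        options.put("DC2", "3");
        ks.setStrategy_options(options);
        // 0.7's KsDef still carries a replication_factor field; with NTS the
        // per-DC options above are what actually govern placement.
        ks.setReplication_factor(6);
        ks.setCf_defs(Collections.<CfDef>emptyList());
        client.system_add_keyspace(ks);
    }
}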

Re: How to warm up a cold node

2011-04-19 Thread Héctor Izquierdo Seliva
Shouldn't the dynamic snitch take into account response times and send fewer requests to a slow node? It seems that at node startup only a handful of requests arrive at the node and it keeps up well, but then there's a moment where there's more than it can handle with a cold cache and it starts dropping

CQL transport (was: CQL DELETE statement)

2011-04-19 Thread Ted Zlatanov
On Tue, 19 Apr 2011 00:21:44 +0100 Courtney Robinson sa...@live.co.uk wrote: CR Cool... Okay, the plan is to eventually not use thrift underneath, CR for the CQL stuff, right? Once this is done and the new transport is CR in place, or even while designing the new transport, is this not CR

AW: AW: Two versions of schema

2011-04-19 Thread Roland Gude
Yeah, it happens from time to time that schema changes don't work correctly even if everything seems to be fine. But it's always repairable with the described procedure, so I think having the operator available is a must. Drain is a nodetool command. The node flushes data and stops

Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
Hi everyone. I've configured memtable_operations = 0.02 in one of my column families and started deleting keys. I have already deleted 54k, but there hasn't been any flush of the memtable. Memory keeps piling up and eventually nodes start to do stop-the-world GCs. Is this the way this is supposed

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
Ok, I've read about gc_grace_seconds, but I'm not sure I understand it fully. Until gc_grace_seconds have passed and there is a compaction, do the tombstones live in memory? I have to delete 100 million rows and my insert rate is very low, so I don't have a lot of compactions. What should I do in

Re: AW: AW: Two versions of schema

2011-04-19 Thread mcasandra
What would be the procedure in this case? Run drain on the node that is disagreeing? But is it enough to run just drain, or do you suggest drain + rm system files?

Re: Problems with subcolumn retrieval after upgrade from 0.6 to 0.7

2011-04-19 Thread Abraham Sanderson
Ok, I set up a unit test for the supercolumns which seem to have problems; I posted a few examples below. As I mentioned, the retrieved bytes for the name and value appear to have additional data; in previous tests the buffer's position, mark, and limit have been verified, and when I call

Re: How to warm up a cold node

2011-04-19 Thread aaron morton
The dynamic snitch only reduces the chance that a node is used in a read operation; it depends on the RF, the CL for the operation, the partitioner and possibly the network topology. Dropping read messages is ok, so long as your operation completes at the requested CL. Are you using either a

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
I think there may be an issue here: we are counting the number of columns in the operation, but when deleting an entire row we do not have a column count. Can you let us know what version you are using and how you are doing the delete? Thanks Aaron On 20 Apr 2011, at 04:21, Héctor Izquierdo

Re: How to warm up a cold node

2011-04-19 Thread Héctor Izquierdo Seliva
On Wed, 20-04-2011 at 07:59 +1200, aaron morton wrote: The dynamic snitch only reduces the chance that a node is used in a read operation; it depends on the RF, the CL for the operation, the partitioner and possibly the network topology. Dropping read messages is ok, so long as your

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
On Wed, 20-04-2011 at 08:16 +1200, aaron morton wrote: I think there may be an issue here: we are counting the number of columns in the operation, but when deleting an entire row we do not have a column count. Can you let us know what version you are using and how you are doing the

Re: Tombstones and memtable_operations

2011-04-19 Thread shimi
You can use memtable_flush_after_mins instead of the cron Shimi 2011/4/19 Héctor Izquierdo Seliva izquie...@strands.com On Wed, 20-04-2011 at 08:16 +1200, aaron morton wrote: I think there may be an issue here: we are counting the number of columns in the operation. When deleting an
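
For reference, a sketch of setting that attribute through the 0.7 Thrift API (keyspace and column family names are placeholders; in practice you would fetch the existing CfDef via describe_keyspace first so the update doesn't reset other attributes):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.CfDef;

public class FlushAfterMins {
    // Sketch: make the memtable flush at least every 60 minutes even if
    // few operations have accumulated. In practice, fetch the existing
    // CfDef first so other attributes are not reset to defaults.
    public static void update(Cassandra.Client client) throws Exception {
        client.set_keyspace("MyKeyspace");
        CfDef cf = new CfDef("MyKeyspace", "MyColumnFamily");
        cf.setMemtable_flush_after_mins(60);
        cf.setMemtable_operations_in_millions(0.02); // the threshold from this thread
        client.system_update_column_family(cf);
    }
}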

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
On Tue, 19-04-2011 at 23:33 +0300, shimi wrote: You can use memtable_flush_after_mins instead of the cron Shimi Good point! I'll try that. Wouldn't it be better to count a delete as a one-column operation so it contributes to flush-by-operations? 2011/4/19 Héctor Izquierdo Seliva

Re: Problems with subcolumn retrieval after upgrade from 0.6 to 0.7

2011-04-19 Thread aaron morton
Can you provide a little more info on what I'm seeing here? When the name is shown for the column, are you showing me the entire byte buffer for the name or just up to the limit? Aaron On 20 Apr 2011, at 05:49, Abraham Sanderson wrote: Ok, I set up a unit test for the supercolumns which seem to have

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
How do you do the deletes? Aaron On 20 Apr 2011, at 08:39, Héctor Izquierdo Seliva wrote: On Tue, 19-04-2011 at 23:33 +0300, shimi wrote: You can use memtable_flush_after_mins instead of the cron Shimi Good point! I'll try that. Wouldn't it be better to count a delete as a

Re: Tombstones and memtable_operations

2011-04-19 Thread Héctor Izquierdo Seliva
I posted it a couple of messages back, but here it is again: I'm using 0.7.4. I have a file with all the row keys I have to delete (around 100 million) and I just go through the file and issue deletes through pelops. Should I manually issue flushes with a cron every x time?
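
Pelops wraps the Thrift API, so the delete presumably ends up as something like the following raw 0.7 Thrift call (a sketch; names are placeholders). A ColumnPath naming only the column family tombstones the whole row, which is the no-column-count case Aaron mentions.

import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class RowDeleter {
    // Sketch: tombstone an entire row via the raw 0.7 Thrift API. A
    // ColumnPath with only the column family set (no super column or
    // column) deletes the whole row -- the case with no column count.
    public static void deleteRow(Cassandra.Client client, byte[] key)
            throws Exception {
        ColumnPath path = new ColumnPath("MyColumnFamily");
        long timestampMicros = System.currentTimeMillis() * 1000;
        client.remove(ByteBuffer.wrap(key), path, timestampMicros,
                ConsistencyLevel.QUORUM);
    }
}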

Re: Problems with subcolumn retrieval after upgrade from 0.6 to 0.7

2011-04-19 Thread aaron morton
Can you show what comes back from calling Column.getName()? Aaron On 20 Apr 2011, at 09:00, aaron morton wrote: Can you provide a little more info on what I'm seeing here? When the name is shown for the column, are you showing me the entire byte buffer for the name or just up to the limit?
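
One common source of "additional data" here: calling array() on a ByteBuffer returns the entire backing array, not just the window between position and limit (a column name buffer is often a slice of a larger Thrift frame). A sketch of extracting only the valid bytes:

import java.nio.ByteBuffer;

public class BufferBytes {
    // Sketch: copy only the bytes between position and limit.
    // ByteBuffer.array() would return the whole backing array, which can
    // include neighbouring data when the buffer is a slice of a frame.
    public static byte[] toBytes(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() keeps the caller's position intact
        return bytes;
    }
}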

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
Yes, I saw that. I wanted to know what "issue deletes through pelops" means so I can work out what command it's sending to cassandra and hopefully not waste my time looking in the wrong place. Aaron On 20 Apr 2011, at 09:04, Héctor Izquierdo Seliva wrote: I posted it a couple of messages

pig + hadoop

2011-04-19 Thread pob
Hello, I did the cluster configuration per http://wiki.apache.org/cassandra/HadoopSupport. When I run pig example-script.pig -x local, everything is fine and I get correct results. The problem occurs with -x mapreduce. I'm getting these errors: 2011-04-20 01:24:21,791 [main] ERROR

Re: pycassa + celery

2011-04-19 Thread pob
Hello, yeah, the bug was in my code because I use CL.ONE (so sometimes I got incomplete data). Thanks. 2011/4/14 aaron morton aa...@thelastpickle.com This is going to be a bug in your code, so it's a bit tricky to know but... How / when is the email added to the DB? What does the rawEmail

Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread Oleg Tsvinev
Earlier I posted the same message to the hector-users list. Guys, I'm a bit puzzled today. I'm using the just-released Hector 0.7.0-29 (thank you, Nate!) and Cassandra 0.7.4 and getting the exception below, marked as (1) Exception. When I dig into the Cassandra source code below, marked as (2) Cassandra

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread William Oberman
I had a similar error today when I tried using LOCAL_QUORUM without having a properly configured NetworkTopologyStrategy. QUORUM worked fine, however. will On Tue, Apr 19, 2011 at 8:52 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Earlier I posted the same message to the hector-users list.

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread Oleg Tsvinev
I'm puzzled because the code does not even check for LOCAL_QUORUM before throwing the exception. Indeed, I did not configure NetworkTopologyStrategy. Are you saying that it works after configuring it? On Tue, Apr 19, 2011 at 6:04 PM, William Oberman ober...@civicscience.com wrote: I had a similar error

Re: Tombstones and memtable_operations

2011-04-19 Thread aaron morton
That's what I was looking for, thanks. At first glance the behaviour looks inconsistent: we count the number of columns in the delete mutation, but when deleting a row the column count is zero. I'll try to take a look later. In the meantime you can force a memtable flush via JConsole; navigate
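
The JConsole step can also be scripted; here is a sketch of invoking the same StorageService MBean operation that nodetool flush uses, assuming 0.7's default JMX port 8080 and placeholder keyspace/column family names:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceFlush {
    // Sketch: force a memtable flush over JMX, the programmatic
    // equivalent of the JConsole navigation described above.
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            mbs.invoke(ss, "forceTableFlush",
                    new Object[] { "MyKeyspace", new String[] { "MyColumnFamily" } },
                    new String[] { "java.lang.String", "[Ljava.lang.String;" });
        } finally {
            connector.close();
        }
    }
}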

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread William Oberman
Good point, should have read your message (and the code) more closely! Sent from my iPhone On Apr 19, 2011, at 9:16 PM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: I'm puzzled because code does not even check for LOCAL_QUORUM before throwing exception. Indeed I did not configure

Re: pig + hadoop

2011-04-19 Thread aaron morton
Am guessing but here goes. Looks like the cassandra RPC port is not set; did you follow these steps in contrib/pig/README.txt? Finally, set the following as environment variables (uppercase, underscored), or as Hadoop configuration variables (lowercase, dotted): * PIG_RPC_PORT or

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread aaron morton
You need to be using NTS. When NetworkTopologyStrategy is used it overrides the AbstractReplicationStrategy.getWriteResponseHandler() function in your stack and returns either a DatacenterWriteResponseHandler for LOCAL_QUORUM or a DatacenterSyncWriteResponseHandler for EACH_QUORUM. They are
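
Once the keyspace is on NetworkTopologyStrategy, wiring LOCAL_QUORUM into Hector looks roughly like this (a sketch against the Hector 0.7-era API; the cluster and keyspace names are placeholders):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class LocalQuorumSetup {
    // Sketch: default all reads and writes to LOCAL_QUORUM. Only
    // meaningful once the keyspace uses NetworkTopologyStrategy.
    public static Keyspace keyspace() {
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "localhost:9160");
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);
        return HFactory.createKeyspace("MyKeyspace", cluster, policy);
    }
}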

Re: pig + hadoop

2011-04-19 Thread pob
Hey Aaron, I read it, and all 3 env variables were exported. The results are the same. Best, P 2011/4/20 aaron morton aa...@thelastpickle.com Am guessing but here goes. Looks like the cassandra RPC port is not set; did you follow these steps in contrib/pig/README.txt? Finally, set the

Re: pig + hadoop

2011-04-19 Thread pob
ad 2: it works with -x local, so there can't be an issue between Pig and the DB (Cassandra). I'm using pig-0.8 from the official site + hadoop-0.20.2 from the official site. thx 2011/4/20 aaron morton aa...@thelastpickle.com Am guessing but here goes. Looks like the cassandra RPC port is not set; did you follow

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread Oleg Tsvinev
Ah, OK. Thank you Aaron, I'll try that. On Tue, Apr 19, 2011 at 6:39 PM, aaron morton aa...@thelastpickle.com wrote: You need to be using NTS. When NetworkTopologySetting is used it overrides the AbstractReplicationStrategy.getWriteResponseHandler() function in your stack and returns a

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
If you have RF=3 in both datacenters, it could be debated whether there is any point in using the built-in replication in Cassandra at all vs. feeding the data to both datacenters and getting two 100% isolated Cassandra instances that cannot replicate sstable corruptions between each other. My point is

Re: pig + hadoop

2011-04-19 Thread pob
That's from the jobtracker: 2011-04-20 03:36:39,519 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201104200331_0002_m_00 2011-04-20 03:36:42,521 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201104200331_0002_m_00_3:

Re: Cassandra 0.7.4 and LOCAL_QUORUM Consistency level

2011-04-19 Thread Jonathan Ellis
It doesn't make a lot of sense in general to allow those w/ non-NTS, but it should be possible (e.g. if you've manually interleaved nodes with ONTS so you know how many replicas are in each DC). Patch attached to https://issues.apache.org/jira/browse/CASSANDRA-2516 On Tue, Apr 19, 2011 at 8:39

Re: pig + hadoop

2011-04-19 Thread pob
and one more thing... 2011-04-20 04:09:23,412 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201104200406_0001/attempt_201104200406_0001_m_02_0/output/file.out in any of the configured local directories

RE: pig + hadoop

2011-04-19 Thread Jeffrey Wang
Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a while before I added that. -Jeffrey From: pob [mailto:peterob...@gmail.com] Sent: Tuesday, April 19, 2011 6:42 PM To: user@cassandra.apache.org Subject: Re: pig + hadoop Hey Aaron, I read it, and all 3 env

Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
Oh yeah - that's what's going on. What I do is: on the machine that I run the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory, and in my mapred-site.xml file found there I set the three variables. I don't use environment variables when I run against a cluster. On

Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
Just as an example:

<property>
  <name>cassandra.thrift.address</name>
  <value>10.12.34.56</value>
</property>
<property>
  <name>cassandra.thrift.port</name>
  <value>9160</value>
</property>
<property>
  <name>cassandra.partitioner.class</name>
  <value>org.apache.cassandra.dht.RandomPartitioner</value>