UnreachableNodes

2012-10-18 Thread Rene Kochen
I have a four node EC2 cluster.

Three machines show via nodetool ring that all machines are UP.
One machine shows via nodetool ring that one machine is DOWN.

If I take a closer look at the machine reporting the other machine as down, I
see the following:

- StorageService.UnreachableNodes = 10.49.9.109
- FailureDetector.SimpleStates: 10.49.9.109 = UP

So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see in
the logging that there is communication between 10.49.9.109 and the machine
reporting it as down.

How or when is a node removed from the UnreachableNodes list and reported
as UP again via nodetool ring?

I use Cassandra 1.0.11

Thanks!

Rene


Re: UnreachableNodes

2012-10-18 Thread aaron morton
You can double-check that the node reporting 9.109 as down can telnet to port 7000 
on 9.109. 

Then I would restart 9.109 with -Dcassandra.load_ring_state=false added as a 
JVM param in cassandra-env.sh. 
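
For reference, a minimal sketch of both steps (the IP is the one from this 
thread; the cassandra-env.sh line is an assumption about where you would add 
the flag, and should be removed again after the restart):

# from the node reporting 9.109 as down -- does the storage port respond?
telnet 10.49.9.109 7000

# added to cassandra-env.sh on 9.109 for one restart only
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"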

If it still shows as down, can you post the output of nodetool gossipinfo from 
9.109 and from the node that sees 9.109 as down. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/10/2012, at 8:45 PM, Rene Kochen rene.koc...@schange.com wrote:

 I have a four node EC2 cluster.
 
 Three machines show via nodetool ring that all machines are UP.
 One machine shows via nodetool ring that one machine is DOWN.
 
 If I take a closer look at the machine reporting the other machine as down, I see 
 the following:
 
 - StorageService.UnreachableNodes = 10.49.9.109
 - FailureDetector.SimpleStates: 10.49.9.109 = UP
 
 So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see in 
 the logging that there is communication between 10.49.9.109 and the machine 
 reporting it as down.
 
 How or when is a node removed from the UnreachableNodes list and reported as 
 UP again via nodetool ring?
 
 I use Cassandra 1.0.11
 
 Thanks!
 
 Rene
 



Re: Cassandra nodes loaded unequally

2012-10-18 Thread aaron morton
At times of high load check the CPU % for the java service running C* to 
confirm C* is the source of load. 

If the load is generated from C* check the logs (or use OpsCentre / other 
monitoring) to see if it correlated to compaction, or Garbage Collection or 
repair or high throughput. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/10/2012, at 12:22 PM, Ben Kaehne ben.kae...@sirca.org.au wrote:

 Nothing unusual.
 
 All servers are exactly the same. Nothing unusual in the log files. Is there 
 any level of logging that I should be turning on?
 
 Regards,
 
 On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh ailin...@gmail.com wrote:
 With your environment (3 nodes, RF=3) it is very difficult to get
 uneven load. Each node receives the same number of read/write
 requests. Probably something is wrong on low level, OS or VM. Do you
 see anything unusual in log files?
 
 Andrey
 
 On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne ben.kae...@sirca.org.au wrote:
  Not connecting to the same node every time. Using Hector to ensure an even
  distribution of connections across the cluster.
 
  Regards,
 
  On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss bto...@gmail.com wrote:
 
  are you connecting to the same node every time?  if so, spread out
  your connections across the ring
 
  On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov azo...@griddynamics.com
  wrote:
   Hi Ben,
  
   I suggest you compare the number of queries for each node. Maybe the
   problem is on the client side.
   You can do that using JMX:
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,ReadCount
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,WriteCount
  
   Also I suggest to check output of nodetool compactionstats.
  
   --
   Alexey
  
  
 
 
 
 
  --
  -Ben
 
 
 
 -- 
 -Ben



Re: run repair on each node or every R nodes?

2012-10-18 Thread aaron morton
Without -pr the repair works on all token ranges the node is a replica for. 

With -pr it only repairs data in the token range it is assigned. In your case, 
when you ran it on node 0 with RF 3, the token range for node 0 was repaired on 
nodes 0, 1 and 2. The other token ranges on nodes 0, 1 and 2 were not repaired. 
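
For example (keyspace name illustrative), running it on every node in turn 
repairs each primary range exactly once:

nodetool -h <node_ip> repair -pr MyKeyspace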

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/10/2012, at 5:15 AM, Andrey Ilinykh ailin...@gmail.com wrote:

 
 In my mind it does make sense, and what you're saying is correct. But I read
 that it was better to run repair in each node with a -pr option.
 
 Alain
 
 Yes, it's correct. By running repair -pr on each node you repair the whole
 cluster without duplicating work.
 
 Andrey



Re: Missing non composite column

2012-10-18 Thread aaron morton
 Yes, I understand that. The reason why I am asking is, with this I need to split
 them to get the actual column name using : as a separator.
The : is an artefact of the cassandra-cli, not something you will have to 
deal with via the thrift API. Internally we do not store the values with : 
separators. 

Any idiomatic API will take care of parsing the raw wire format, see the 
pycassa example here...

http://pycassa.github.com/pycassa/assorted/composite_types.html
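
For the curious, a rough sketch of the wire layout (my reading of 
CompositeType's encoding -- treat the details as an assumption, not a spec): 
each component is a 2-byte big-endian length, the component bytes, and a 
1-byte end-of-component marker.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative decoder for the composite layout sketched above.
public class CompositeSketch {
    static List<ByteBuffer> components(ByteBuffer name) {
        List<ByteBuffer> parts = new ArrayList<ByteBuffer>();
        ByteBuffer b = name.duplicate();
        while (b.hasRemaining()) {
            int length = b.getShort() & 0xFFFF; // 2-byte big-endian length
            ByteBuffer part = b.slice();
            part.limit(length);                 // the component bytes
            parts.add(part);
            b.position(b.position() + length);
            b.get();                            // 1-byte end-of-component
        }
        return parts;
    }
}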

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/10/2012, at 2:58 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 Yes, I understand that. The reason why I am asking is, with this I need to split
 them to get the actual column name using : as a separator.
 Though I have not tried it yet, I am wondering: if a column name is like
 alliance:movement, then how does it compute it?
 
 You've lost me, sorry.
 
 --
 Sylvain
 
 
 
 On Wed, Oct 17, 2012 at 1:04 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:
 
 On Wed, Oct 17, 2012 at 3:17 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:
 column name will be 2012-07-24:2:alliance_involvement or
 alliance_involvement?
 
 The former. Though let's clarify that
 2012-07-24:2:alliance_involvement is the string representation of a
 composite name (i.e. one compatible with CompositeType) for display by
 the cli. What you will get is a composite containing 3 components, the
 first will be the string '2012-07-24', the second one will be the int
 2 and the last one will be the string 'alliance_involvement'.
 
 --
 Sylvain
 
 
 -Vivek
 
 On Tue, Oct 16, 2012 at 10:25 PM, Sylvain Lebresne
 sylv...@datastax.com
 wrote:
 
 On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra mishra.v...@gmail.com
 wrote:
 Thanks Sylvain. I missed it. If I try to access these via the thrift API,
 what will be the column names?
 
 I'm not sure I understand the question. The cli output is pretty much
 what you get via the thrift API.
 
 --
 Sylvain
 
 
 
 



Re: Astyanax empty column check

2012-10-18 Thread aaron morton
Very slim reason to link to my favourite Joe Celko 
(http://en.wikipedia.org/wiki/Joe_Celko) quote:


'LOL! My wife is an ordained Soto Zen priest. I would say after 30 years 
together, I'd go with her. She is the only person who understood NULLs 
immediately.'
http://www.simple-talk.com/opinion/geek-of-the-week/geek-of-the-week-joe-celko/


 a. A row which has only a key without columns
 b. No such row in the database.
From the point of view of the API a row with zero *live* columns is the same as 
a row that does not exist. 

A row may exist on disk, but be made up of non-live columns. These are a 
combination of expired TTL columns and columns overwritten by (row or column) 
tombstones. Eventually a row with 0 live columns and 0 non-live columns will 
be compacted and purged from disk.
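
To make that concrete, a minimal Astyanax-style sketch (assuming keyspace, 
CF_USERS and rowKey are already set up; method names are from the 1.x API as 
I recall it, so treat them as illustrative). Both cases land in the same branch:

// Zero live columns and "row not present" are indistinguishable here.
ColumnList<String> columns = keyspace.prepareQuery(CF_USERS)
        .getKey(rowKey)
        .execute()
        .getResult();
boolean rowEffectivelyExists = !columns.isEmpty();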

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/10/2012, at 1:34 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 What specifically are you trying to achieve?  The business requirement might 
 help as there are other ways of solving it such that you do not need to know 
 the difference.
 
 Dean
 
 From: Xu Renjie xrjxrjxrj...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, October 17, 2012 4:48 AM
 To: user@cassandra.apache.org
 Subject: Re: Astyanax empty column check
 Subject: Re: Astyanax empty column check
 
 So what you mean is essentially there is *no* way to differentiate them, because 
 they appear the same?
 
 On Wed, Oct 17, 2012 at 5:58 PM, rohit bhatia rohit2...@gmail.com wrote:
 See
 http://stackoverflow.com/questions/8072253/is-there-a-difference-between-an-empty-key-and-a-key-that-doesnt-exist
 
 If you attempt to retrieve an entire row and it returns a result with
 no columns, it effectively means that row does not exist.
 
 Essentially a row without columns doesn't exist (except those with tombstones).
 On Wed, Oct 17, 2012 at 2:17 PM, Xu Renjie xrjxrjxrj...@gmail.com wrote:
 Sorry for the version, I am using 1.0.1 Astyanax.
 
 
 On Wed, Oct 17, 2012 at 4:44 PM, Xu Renjie xrjxrjxrj...@gmail.com wrote:
 
 hello guys,
   I am currently using Astyanax as a client (new to Astyanax). But I am
 not clear on how to differentiate the following 2 situations:
 a. A row which has only a key without columns
 b. No such row in the database.
 
 Since when I use RowQuery to query Cassandra with a given key, both of the
 above situations will return a ColumnList with size 0. I also didn't find
 another API that can handle this.
 Do you have any better way for this? Thanks in advance.
 Cheers,
 Xu
 
 
 



Re: RF update

2012-10-18 Thread aaron morton
 Follow up question: Is it safe to abort the compactions happening after node 
 repair?
It is always safe to abort a compaction. The purpose of compaction is to 
replicate the current truth in a more compact format. It does not modify data, 
it just creates new files. The worst case would be killing it between the time 
the new files are marked as non-temp and the time the old files are deleted. 
That would result in wasted disk space, but the truth in the system would not 
change. 

 
  Question: These additional compactions seem redundant since there are no 
  reads or writes on the cluster after the first major compaction 
  (immediately after the data load), is that right?

Repair transfers a portion of the -Data.db component from potentially multiple 
SSTables. This may result in multiple new SSTables being created on the 
receiving node. Once the files are created they are processed in a similar way 
to when a memtable is flushed, and so compaction kicks in.

 And if so, what can we do to avoid them? We are currently waiting multiple 
 days.

The fact that compaction is taking so long is odd. Have you checked the logs 
for GC problems? If you are running an SSD-backed instance and have turned off 
compaction throttling, the high IO throughput can result in mucho garbage. 
Faster is not always better. 

To improve your situation consider:

* disabling compaction by setting min_compaction_threshold and 
max_compaction_threshold to 0 via schema or nodetool (see the sketch below)
* disabling durable_writes to disable the commit log during the bulk load. 
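
A sketch of the first option (keyspace and CF names illustrative; remember to 
restore the old thresholds once the load and repair are done):

nodetool -h <node_ip> setcompactionthreshold MyKeyspace MyCF 0 0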

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/10/2012, at 11:55 PM, Matthias Broecheler m...@matthiasb.com wrote:

 Follow up question: Is it safe to abort the compactions happening after node 
 repair?
 
 On Mon, Oct 15, 2012 at 6:32 PM, Will Martin w...@voodoolunchbox.com wrote:
+1. It doesn't make sense that the xfr compactions are heavy unless they are 
translating the file. This could be a protocol mismatch; however, I would 
expect the requirements for node-level compaction and wire compaction to be 
pretty different.
 On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:
 
  Hey,
 
  we are writing a lot of data into a cassandra cluster for a batch loading 
  use case. We cannot use the sstable batch loader, so in order to speed up 
  the loading process we are using RF=1 while the data is loading. After the 
  load is complete, we want to increase the RF. For that, we are updating the 
  RF in the schema and then run the node repair tool on each cassandra 
  instance to stream the data over. However, we are noticing that this 
  process is slowed down by a lot of compactions (the actually streaming of 
  data only takes a couple of minutes).
 
  Cassandra is already running a major compaction after the data loading 
   process has completed. But then there seem to be two more compactions (one 
   on the sender and one on the receiver) happening, and those take a very long 
   time even on the AWS high-I/O instance with no compaction throttling.
 
  Question: These additional compactions seem redundant since there are no 
  reads or writes on the cluster after the first major compaction 
  (immediately after the data load), is that right? And if so, what can we do 
  to avoid them? We are currently waiting multiple days.
 
  Thank you very much for your help,
  Matthias
 
 
 
 
 
 -- 
 Matthias Broecheler, PhD
 http://www.matthiasb.com
 E-Mail: m...@matthiasb.com



how to get column type?

2012-10-18 Thread Hagos, A.S.
Hi all,
I am wondering if there is a way to know the column type of an already stored 
value in Cassandra.
My specific case is getting a column value for a known column name whose type 
is unknown.

greetings 
Ambes


Re: potential data loss in Cassandra 1.1.0 .. 1.1.4

2012-10-18 Thread Alain RODRIGUEZ
Hi Jonathan.

We are currently running the datastax AMI on amazon. Cassandra is in
version 1.1.2.

I guess that the datastax repo (deb
http://debian.datastax.com/community stable main) will be updated
directly to 1.1.6?

Replaying already-flushed data a second time is harmless -- except
for counters.
 So, to avoid replaying flushed counter data, we recommend performing drain
when shutting down the pre-1.1.6 C* prior to upgrade.
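
Concretely, that recommendation is (host illustrative):

nodetool -h <host> drain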

I'm afraid I'll forget to drain my nodes before my next update or update +
expand.

Could you ask your team to add this specific warning to your documentation,
like here: http://www.datastax.com/docs/1.1/install/expand_ami (we usually
update to the latest stable release before expanding) or here:
http://www.datastax.com/docs/1.1/install/upgrading, or in any other place
where this could be useful?

Having counters replayed would lead to a big mess in our app, and I guess there
are more people in our situation who could save a lot of time and money with
up-to-date documentation.

Anyway, thank you for this bug fix and this warning.

Alain

2012/10/17 Jonathan Ellis jbel...@gmail.com

 I wanted to call out a particularly important bug for those who aren't
 in the habit of reading CHANGES.

 Summary: the bug was fixed in 1.1.5, with a follow-on fix in 1.1.6
 that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
 1.0.x or earlier directly to 1.1.5, you're okay as far as this is
 concerned.  But if you used an earlier 1.1 release, you should upgrade
 to 1.1.6.

 Explanation:

 A rewrite of the commitlog code for 1.1.0 used Java's nanotime api to
 generate commitlog segment IDs.  This could cause data loss in the
 event of a power failure, since we assume commitlog IDs are strictly
 increasing in our replay logic.  Simplified, the replay logic looks like
 this:

 1. Take the most recent flush time X for each columnfamily
 2. Replay all activity in the commitlog that occurred after X

 The problem is that nanotime gets effectively a new random seed after
 a reboot.  If the new seed is substantially below the old one, any new
 commitlog segments will never be after the pre-reboot flush
 timestamps.  Subsequently, restarting Cassandra will not replay any
 unflushed updates.

 We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
 didn't realize the implications for replay timestamps until later
 (CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
 time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
 will result in replaying the entire commitlog, including data that may
 have already been flushed.

 Replaying already-flushed data a second time is harmless -- except for
 counters.  So, to avoid replaying flushed counter data, we recommend
 performing drain when shutting down the pre-1.1.6 C* prior to upgrade.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: how to get column type?

2012-10-18 Thread Hiller, Dean
This is specifically why Cassandra and even PlayOrm are going the
direction of partial schemas.  Everything in cassandra in raw form is
just bytes.  If you don't tell it the types, it doesn't know how to
translate it. PlayOrm and other ORM layers are the same way, though in
these NoSQL ORMs you typically have a schema that is sort of like this:

if (colName.equals("name"))
    return String.class;
else if (colName.equals("age"))
    return Integer.class;

So column values are typed such that a command-line tool like PlayOrm's can
query and know how to translate the results. Any parts of the schema that are
not known are just returned in hex.
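
As a rough illustration of that idea (hypothetical names, not PlayOrm's actual 
API):

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper: translate a raw column value using a partial schema.
public class ColumnTranslator {
    static Object translate(String colName, ByteBuffer raw) {
        ByteBuffer b = raw.duplicate();
        if (colName.equals("name"))
            return Charset.forName("UTF-8").decode(b).toString();
        else if (colName.equals("age"))
            return b.getInt();
        // Unknown parts of the schema fall back to hex, as described above.
        StringBuilder hex = new StringBuilder();
        while (b.hasRemaining())
            hex.append(String.format("%02x", b.get()));
        return hex.toString();
    }
}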

So schemaless is cool, but sometimes it is a big pain as well.

Dean

On 10/18/12 6:24 AM, Hagos, A.S. a.s.ha...@tue.nl wrote:

Hi all,
I am wondering if there is a way to know the column type of an already
stored value in Cassandra.
My specific case is getting a column value for a known column name whose
type is unknown.

greetings 
Ambes



replaced node keeps returning in gossip

2012-10-18 Thread Thomas van Neerijnen
Hi all

I'm running Cassandra 1.0.11 on Ubuntu 11.10.

I've got a ghost node which keeps showing up on my ring.

A node living on IP 10.16.128.210 and token 0 died and had to be replaced.
I replaced it with a new node, IP 10.16.128.197 and again token 0 with a
-Dcassandra.replace_token=0 at startup. This all went well but now I'm
seeing the following weirdness constantly reported in the log files around
the ring:

 INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632)
FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
 INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) Node
/10.16.128.210 is now part of the cluster
 INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804)
InetAddress /10.16.128.210 is now UP
 INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line
1017) Nodes /10.16.128.210 and /10.16.128.197 have the same token 0.
Ignoring /10.16.128.210
 INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818)
InetAddress /10.16.128.210 is now dead.
 INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632)
FatClient /10.16.128.210 has been silent for 3ms, removing from gossip


Re: UnreachableNodes

2012-10-18 Thread Rene Kochen
Thanks Aaron,

Telnet works (in both directions).

After a normal (i.e. without discarding ring state) restart of the node
reporting the other one as down, the ring shows up correctly again. So a node
restart fixes the incorrect state.

I see this error occasionally.

I will further investigate and post more details when it happens again.

2012/10/18 aaron morton aa...@thelastpickle.com

 You can double-check that the node reporting 9.109 as down can telnet to port
 7000 on 9.109.

 Then I would restart 9.109 with -Dcassandra.load_ring_state=false added as
 a JVM param in cassandra-env.sh.

 If it still shows as down, can you post the output of nodetool gossipinfo
 from 9.109 and from the node that sees 9.109 as down.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/10/2012, at 8:45 PM, Rene Kochen rene.koc...@schange.com wrote:

 I have a four node EC2 cluster.

 Three machines show via nodetool ring that all machines are UP.
 One machine shows via nodetool ring that one machine is DOWN.

 If I take a closer look at the machine reporting the other machine as down, I
 see the following:

 - StorageService.UnreachableNodes = 10.49.9.109
 - FailureDetector.SimpleStates: 10.49.9.109 = UP

 So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see
 in the logging that there is communication between 10.49.9.109 and the
 machine reporting it as down.

 How or when is a node removed from the UnreachableNodes list and reported
 as UP again via nodetool ring?

 I use Cassandra 1.0.11

 Thanks!

 Rene





Re: Why my Cassandra is compacting like mad

2012-10-18 Thread Bryan
I think I am seeing the same issue, but it doesn't seem to be related to the 
schema_columns. I understand that repair is supposed to be intensive, but this 
is bringing the associated machine to its knees, to the point that logging in 
to the machine takes a very, very long time and requests are no longer served 
(load avg ~2000.0). Is this normal? Is this a symptom of the machine not 
compacting enough during normal operation (minor compactions)? Thoughts?

Cassandra 1.1.5, 12 node application cluster connected to a smaller analytics 
cluster
The analytics cluster was repairing, but it seemed to swamp one of the nodes on 
12-node cluster.
Java 1.6_u5, CentOS
40 - 80 GB on each node (too much?)
Main CF has 4 indexes
standard configs, no multithreaded compaction, 16 mb compaction throughput

Speaking of the throughput, reading the cassandra.yaml file made me think that 
the throughput is not set correctly, but I'm not sure how to calculate the 
ideal value. Should I only consider the actual data size inserted, or should I 
use a single-node load figure / uptime_seconds as a guess (assuming constant 
load)?
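
For what it's worth, the throttle can be inspected and adjusted at runtime 
(value in MB/s; 16 is the default referred to above, and 32 is only an 
example):

nodetool -h <host> compactionstats
nodetool -h <host> setcompactionthroughput 32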

Thanks,

Bryan

Re: http://www.mail-archive.com/user@cassandra.apache.org/msg25561.html

constant CMS GC using CPU time

2012-10-18 Thread Bryan Talbot
In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11
(64-bit), the nodes are often getting stuck in state where CMS
collections of the old space are constantly running.

The JVM configuration is using the standard settings in cassandra-env --
relevant settings are included below.  The max heap is currently set to 5
GB with 800MB for new size.  I don't believe that the cluster is overly
busy and seems to be performing well enough other than this issue.  When
nodes get into this state they never seem to leave it (by freeing up old
space memory) without restarting cassandra.  They typically enter this
state while running nodetool repair -pr but once they start doing this,
restarting them only fixes it for a couple of hours.

Compactions are completing and are generally not queued up.  All CF are
using STCS.  The busiest CF consumes about 100GB of space on disk, is write
heavy, and all columns have a TTL of 3 days.  Overall, there are 41 CF
including those used for system keyspace and secondary indexes.  The number
of SSTables per node currently varies from 185-212.

Other than frequent log warnings from GCInspector ("Heap is 0.xxx full...")
and StorageService ("Flushing CFS(...) to relieve memory pressure"), there
are no other log entries to indicate there is a problem.

Does the memory needed vary depending on the amount of data stored?  If so,
how can I predict how much jvm space is needed?  I don't want to make the
heap too large as that's bad too.  Maybe there's a memory leak related to
compaction that doesn't allow meta-data to be purged?


-Bryan


12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer
cache.
$ free -m
             total       used       free     shared    buffers     cached
Mem:         12001      11870        131          0          4       5778
-/+ buffers/cache:       6087       5914
Swap:            0          0          0


jvm settings in cassandra-env
MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"


jstat shows about 12 full collections per minute with old heap usage
constantly over 75% so CMS is always over the
CMSInitiatingOccupancyFraction threshold.
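
As a rough sanity check using the numbers above: the old generation is about 
5 GB heap - 800 MB new = ~4.2 GB, so CMSInitiatingOccupancyFraction=75 puts 
the CMS trigger at roughly 3.2 GB of old-gen occupancy.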

$ jstat -gcutil -t 22917 5000 4
Timestamp         S0     S1     E      O      P      YGC     YGCT     FGC      FGCT       GCT
   132063.0  34.70   0.00  26.03  82.29  59.88   21580  506.887   17523  3078.941  3585.829
   132068.0  34.70   0.00  50.02  81.23  59.88   21580  506.887   17524  3079.220  3586.107
   132073.1   0.00  24.92  46.87  81.41  59.88   21581  506.932   17525  3079.583  3586.515
   132078.1   0.00  24.92  64.71  81.40  59.88   21581  506.932   17527  3079.853  3586.785


Other hosts not currently experiencing the high CPU load have a heap less
than .75 full.

$ jstat -gcutil -t 6063 5000 4
Timestamp         S0     S1     E      O      P      YGC      YGCT     FGC      FGCT       GCT
   520731.6   0.00  12.70  36.37  71.33  59.26   46453  1688.809   14785  2130.779  3819.588
   520736.5   0.00  12.70  53.25  71.33  59.26   46453  1688.809   14785  2130.779  3819.588
   520741.5   0.00  12.70  68.92  71.33  59.26   46453  1688.809   14785  2130.779  3819.588
   520746.5   0.00  12.70  83.11  71.33  59.26   46453  1688.809   14785  2130.779  3819.588


Hinted Handoff runs every ten minutes

2012-10-18 Thread Stephen Pierce
I installed Cassandra on three nodes. I then ran a test suite against them to 
generate load. The test suite is designed to generate the same type of load 
that we plan to have in production. As one of many tests, I reset one of the 
nodes to check the failure/recovery modes.  Cassandra worked just fine.

I stopped the load generation, and got distracted with some other 
project/problem. A few days later, I noticed something strange on one of the 
nodes. On this node hinted handoff starts every ten minutes, and while it seems 
to finish without any errors, it will be started again in ten minutes. None of 
the nodes has any traffic, and hasn't for several days. I checked the logs, and 
this goes back to the initial failure/recovery testing:

INFO [HintedHandoff:1] 2012-10-18 10:19:26,618 HintedHandOffManager.java (line 
294) Started hinted handoff for token: 113427455640312821154458202477256070484 
with IP: /192.168.128.136
INFO [HintedHandoff:1] 2012-10-18 10:19:26,779 HintedHandOffManager.java (line 
390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136
INFO [HintedHandoff:1] 2012-10-18 10:29:26,622 HintedHandOffManager.java (line 
294) Started hinted handoff for token: 113427455640312821154458202477256070484 
with IP: /192.168.128.136
INFO [HintedHandoff:1] 2012-10-18 10:29:26,735 HintedHandOffManager.java (line 
390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136
INFO [HintedHandoff:1] 2012-10-18 10:39:26,624 HintedHandOffManager.java (line 
294) Started hinted handoff for token: 113427455640312821154458202477256070484 
with IP: /192.168.128.136
INFO [HintedHandoff:1] 2012-10-18 10:39:26,751 HintedHandOffManager.java (line 
390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136

The other nodes are happy and don't show this behavior. All the test data is 
readable, and everything is fine, but I'm curious why hinted handoff is running 
on one node all the time.

I searched the bug database, and I found a bug that seems to have the same 
symptoms:
https://issues.apache.org/jira/browse/CASSANDRA-3733
Although it's been marked fixed in 0.6, this describes my problem exactly.

I'm running Cassandra 1.1.5 from Datastax on Centos 6.0:
http://rpm.datastax.com/community/noarch/apache-cassandra11-1.1.5-1.noarch.rpm

Is anyone else seeing this behavior? What can I do to provide more information?

Steve



Re: Hinted Handoff runs every ten minutes

2012-10-18 Thread David Daeschler
Hi Steve,

Also confirming this. After having a node go down on Cassandra 1.0.8
there seems to be hinted handoff between two of our 4 nodes every 10
minutes. Our setup also shows 0 rows. It does not appear to have any
effect on the operation of the ring, just fills up the log files.

- David



On Thu, Oct 18, 2012 at 2:10 PM, Stephen Pierce spie...@verifyle.com wrote:
 I installed Cassandra on three nodes. I then ran a test suite against them
 to generate load. The test suite is designed to generate the same type of
 load that we plan to have in production. As one of many tests, I reset one
 of the nodes to check the failure/recovery modes.  Cassandra worked just
 fine.



 I stopped the load generation, and got distracted with some other
 project/problem. A few days later, I noticed something strange on one of the
 nodes. On this node hinted handoff starts every ten minutes, and while it
 seems to finish without any errors, it will be started again in ten minutes.
 None of the nodes has any traffic, and hasn’t for several days. I checked
 the logs, and this goes back to the initial failure/recovery testing:



 INFO [HintedHandoff:1] 2012-10-18 10:19:26,618 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136

 INFO [HintedHandoff:1] 2012-10-18 10:19:26,779 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136

 INFO [HintedHandoff:1] 2012-10-18 10:29:26,622 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136

 INFO [HintedHandoff:1] 2012-10-18 10:29:26,735 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136

 INFO [HintedHandoff:1] 2012-10-18 10:39:26,624 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136

 INFO [HintedHandoff:1] 2012-10-18 10:39:26,751 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136



 The other nodes are happy and don’t show this behavior. All the test data is
 readable, and everything is fine, but I’m curious why hinted handoff is
 running on one node all the time.



 I searched the bug database, and I found a bug that seems to have the same
 symptoms:

 https://issues.apache.org/jira/browse/CASSANDRA-3733

 Although it’s been marked fixed in 0.6, this describes my problem exactly.



 I’m running Cassandra 1.1.5 from Datastax on Centos 6.0:

 http://rpm.datastax.com/community/noarch/apache-cassandra11-1.1.5-1.noarch.rpm



 Is anyone else seeing this behavior? What can I do to provide more
 information?



 Steve





-- 
David Daeschler


hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
Hello, everybody!
I'm thinking about running hadoop jobs on the top of the cassandra
cluster. My understanding is - hadoop jobs read data from local nodes
only. Does it mean the consistency level is always ONE?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Jean-Nicolas Boulay Desjardins
Why don't you look into Brisk:
http://www.datastax.com/docs/0.8/brisk/about_brisk

On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello, everybody!
 I'm thinking about running hadoop jobs on the top of the cassandra
 cluster. My understanding is - hadoop jobs read data from local nodes
 only. Does it mean the consistency level is always ONE?

 Thank you,
   Andrey



Re: hadoop consistency level

2012-10-18 Thread William Oberman
A recent thread made it sound like Brisk was no longer a datastax supported
thing (it's DataStax Enterprise, or DSE, now):
http://www.mail-archive.com/user@cassandra.apache.org/msg24921.html

In particular this response:
http://www.mail-archive.com/user@cassandra.apache.org/msg25061.html

On Thu, Oct 18, 2012 at 2:49 PM, Jean-Nicolas Boulay Desjardins 
jnbdzjn...@gmail.com wrote:

 Why don't you look into Brisk:
 http://www.datastax.com/docs/0.8/brisk/about_brisk


 On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh ailin...@gmail.comwrote:

 Hello, everybody!
 I'm thinking about running hadoop jobs on the top of the cassandra
 cluster. My understanding is - hadoop jobs read data from local nodes
 only. Does it mean the consistency level is always ONE?

 Thank you,
   Andrey





Re: hadoop consistency level

2012-10-18 Thread Michael Kjellman
Unless you have Brisk (however as far as I know there was one fork that got it 
working on 1.0 but nothing for 1.1 and is not being actively maintained by 
Datastax) or go with CFS (which comes with DSE) you are not guaranteed all data 
is on that hadoop node. You can take a look at the forks if interested here: 
https://github.com/riptano/brisk/network but I'd personally be afraid to put my 
eggs in a basket that is certainly not super supported anymore.


job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM"); should 
get you started.
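
A minimal sketch of the job wiring (Cassandra 1.1-era classes; the keyspace, 
CF and cluster address are illustrative):

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraMr {
    public static Job configure() throws Exception {
        Job job = new Job(new Configuration(), "cassandra-mr");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        Configuration conf = job.getConfiguration();
        ConfigHelper.setInputInitialAddress(conf, "10.0.0.1");
        ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(conf, "MyKeyspace", "MyCF");
        conf.set("cassandra.consistencylevel.read", "QUORUM");
        return job;
    }
}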


Best,

michael


From: Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, October 18, 2012 11:49 AM
To: user@cassandra.apache.org
Subject: Re: hadoop consistency level
Subject: Re: hadoop consistency level

Why don't you look into Brisk:  
http://www.datastax.com/docs/0.8/brisk/about_brisk

On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh ailin...@gmail.com wrote:
Hello, everybody!
I'm thinking about running hadoop jobs on the top of the cassandra
cluster. My understanding is - hadoop jobs read data from local nodes
only. Does it mean the consistency level is always ONE?

Thank you,
  Andrey






Re: hadoop consistency level

2012-10-18 Thread Jean-Nicolas Boulay Desjardins
I am surprised that it was abandoned this way. So if I want to use
Brisk on Cassandra 1.1 I have to use the DataStax Enterprise service...

On Thu, Oct 18, 2012 at 3:00 PM, Michael Kjellman
mkjell...@barracuda.com wrote:

 Unless you have Brisk (however as far as I know there was one fork that
 got it working on 1.0 but nothing for 1.1 and is not being actively
 maintained by Datastax) or go with CFS (which comes with DSE) you are not
 guaranteed all data is on that hadoop node. You can take a look at the forks
 if interested here: https://github.com/riptano/brisk/network but I'd
 personally be afraid to put my eggs in a basket that is certainly not super
 supported anymore.

 job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
 should get you started.


 Best,

 michael



 From: Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, October 18, 2012 11:49 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: hadoop consistency level

 Why don't you look into Brisk:
 http://www.datastax.com/docs/0.8/brisk/about_brisk

 On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 Hello, everybody!
 I'm thinking about running hadoop jobs on the top of the cassandra
 cluster. My understanding is - hadoop jobs read data from local nodes
 only. Does it mean the consistency level is always ONE?

 Thank you,
   Andrey





Re: hadoop consistency level

2012-10-18 Thread Michael Kjellman
Honestly, I think what they did re Brisk development is fair. They left
the code for any of us in the community to improve it and make it
compatible with newer versions and they need to make money as a company as
well. They already contribute so much to the Cassandra community in
general and they are certainly not trying to stop people from continuing
to develop Brisk. Hadoop jobs that input and output to Cassandra will also
work without it as well. If you need the features of CFS and don’t want to
maintain HDFS then yes you'll have to pay for DSE.

If you are having issues with data not on the particular node that you are
reading from with Hadoop I'd go ahead and set the consistency level in
your job configuration as I recommended previously. Note there is also a
cassandra.consistencylevel.write setting if you are using either the
ColumnFamilyOutputFormat or BulkOutputFormat classes.

In terms of performance I have a MR job that reads in 30 million rows with
QUORUM consistency on 1.1.6 with RandomPartitioner and the mapper takes
about 11 minutes across 3 Hadoop nodes (our Cassandra cluster is obviously
larger but we haven't fully scaled out our Hadoop cluster yet). Hardware
is 2 7200 rpm drives + SSD for the commit log, 32GB of RAM, and 12 cores
per node. Hope this helps.

Best,
michael

On 10/18/12 12:24 PM, Jean-Nicolas Boulay Desjardins
jnbdzjn...@gmail.com wrote:

I am surprised that it was abandoned this way. So if I want to use
Brisk on Cassandra 1.1 I have to use the DataStax Enterprise service...

On Thu, Oct 18, 2012 at 3:00 PM, Michael Kjellman
mkjell...@barracuda.com wrote:

 Unless you have Brisk (however as far as I know there was one fork that
 got it working on 1.0 but nothing for 1.1 and is not being actively
 maintained by Datastax) or go with CFS (which comes with DSE) you are
not
 guaranteed all data is on that hadoop node. You can take a look at the
forks
 if interested here: https://github.com/riptano/brisk/network but I'd
 personally be afraid to put my eggs in a basket that is certainly not
super
 supported anymore.

 job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
 should get you started.


 Best,

 michael



 From: Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, October 18, 2012 11:49 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: hadoop consistency level

 Why don't you look into Brisk:
 http://www.datastax.com/docs/0.8/brisk/about_brisk

 On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh ailin...@gmail.com
 wrote:

 Hello, everybody!
 I'm thinking about running hadoop jobs on the top of the cassandra
 cluster. My understanding is - hadoop jobs read data from local nodes
 only. Does it mean the consistency level is always ONE?

 Thank you,
   Andrey









Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 12:00 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Unless you have Brisk (however as far as I know there was one fork that got
 it working on 1.0 but nothing for 1.1 and is not being actively maintained
 by Datastax) or go with CFS (which comes with DSE) you are not guaranteed
 all data is on that hadoop node. You can take a look at the forks if
 interested here: https://github.com/riptano/brisk/network but I'd personally
 be afraid to put my eggs in a basket that is certainly not super supported
 anymore.

 job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
 should get you started.
This is what I don't understand. With QUORUM you read data from at
least two nodes. If so, you don't benefit from data locality. What's
the point of using hadoop? I can run an application on any machine(s) and
iterate through the column family. What is the difference?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Michael Kjellman
Well there is *some* data locality, it's just not guaranteed. My
understanding (and someone correct me if I'm wrong) is that
ColumnFamilyInputFormat implements InputSplit and the getLocations()
method.

http://hadoop.apache.org/docs/mapreduce/current/api/org/apache/hadoop/mapreduce/InputSplit.html

ColumnFamilySplit.java contains logic that does its best to determine which
node contains the data for that mapper.

But obviously it isn't guaranteed that all the data will be on that node.

Also, for the sake of completeness, we have RF=3 on the Keyspace in
question.

On 10/18/12 1:15 PM, Andrey Ilinykh ailin...@gmail.com wrote:

On Thu, Oct 18, 2012 at 12:00 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Unless you have Brisk (however as far as I know there was one fork that
got
 it working on 1.0 but nothing for 1.1 and is not being actively
maintained
 by Datastax) or go with CFS (which comes with DSE) you are not
guaranteed
 all data is on that hadoop node. You can take a look at the forks if
 interested here: https://github.com/riptano/brisk/network but I'd
personally
 be afraid to put my eggs in a basket that is certainly not super
supported
 anymore.

  job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
  should get you started.
This is what I don't understand. With QUORUM you read data from at
least two nodes. If so, you don't benefit from data locality. What's
the point of using hadoop? I can run an application on any machine(s) and
iterate through the column family. What is the difference?

Thank you,
  Andrey






Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:24 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Well there is *some* data locality, it's just not guaranteed. My
 understanding (and someone correct me if I'm wrong) is that
 ColumnFamilyInputFormat implements InputSplit and the getLocations()
 method.

 http://hadoop.apache.org/docs/mapreduce/current/api/org/apache/hadoop/mapreduce/InputSplit.html

 ColumnFamilySplit.java contains logic to do it's best to determine what
 node that particular hadoop node contains the data for that mapper.

But there is no guarantee the local data is in sync with other nodes, which
means you have CL ONE. If you want CL QUORUM you have to make a remote call,
no matter whether the data is local or not.


Re: hadoop consistency level

2012-10-18 Thread Michael Kjellman
Not sure I understand your question (if there is one..)

You are more than welcome to do CL ONE and assuming you have hadoop nodes
in the right places on your ring things could work out very nicely. If you
need to guarantee that you have all the data in your job then you'll need
to use QUORUM.

If you don't specify a CL in your job config it will default to ONE (at
least that's what my read of the ConfigHelper source for 1.1.6 shows)

On 10/18/12 1:29 PM, Andrey Ilinykh ailin...@gmail.com wrote:

On Thu, Oct 18, 2012 at 1:24 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Well there is *some* data locality, it's just not guaranteed. My
 understanding (and someone correct me if I'm wrong) is that
 ColumnFamilyInputFormat implements InputSplit and the getLocations()
 method.

 
http://hadoop.apache.org/docs/mapreduce/current/api/org/apache/hadoop/mapreduce/InputSplit.html

 ColumnFamilySplit.java contains logic to do it's best to determine what
 node that particular hadoop node contains the data for that mapper.

But there is no guarantee the local data is in sync with other nodes, which
means you have CL ONE. If you want CL QUORUM you have to make a remote call,
no matter whether the data is local or not.






Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
 Not sure I understand your question (if there is one..)

 You are more than welcome to do CL ONE and assuming you have hadoop nodes
 in the right places on your ring things could work out very nicely. If you
 need to guarantee that you have all the data in your job then you'll need
 to use QUORUM.

 If you don't specify a CL in your job config it will default to ONE (at
 least that's what my read of the ConfigHelper source for 1.1.6 shows)

I have two questions.
1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
it correct?
2. With CL QUORUM cassandra reads data from all replicas. In this case
Hadoop doesn't give me any  benefits. Application running outside the
cluster has the same performance. Is it correct?

Thank you,
  Andrey


Re: hadoop consistency level

2012-10-18 Thread Bryan Talbot
I believe that reading with CL.ONE will still cause read repair to be run
(in the background) 'read_repair_chance' of the time.

-Bryan


On Thu, Oct 18, 2012 at 1:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
 mkjell...@barracuda.com wrote:
  Not sure I understand your question (if there is one..)
 
  You are more than welcome to do CL ONE and assuming you have hadoop nodes
  in the right places on your ring things could work out very nicely. If
 you
  need to guarantee that you have all the data in your job then you'll need
  to use QUORUM.
 
  If you don't specify a CL in your job config it will default to ONE (at
  least that's what my read of the ConfigHelper source for 1.1.6 shows)
 
 I have two questions.
 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
 it correct?
 2. With CL QUORUM cassandra reads data from all replicas. In this case
 Hadoop doesn't give me any  benefits. Application running outside the
 cluster has the same performance. Is it correct?

 Thank you,
   Andrey



Re: hadoop consistency level

2012-10-18 Thread Jeremy Hanna

On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
 mkjell...@barracuda.com wrote:
 Not sure I understand your question (if there is one..)
 
 You are more than welcome to do CL ONE and assuming you have hadoop nodes
 in the right places on your ring things could work out very nicely. If you
 need to guarantee that you have all the data in your job then you'll need
 to use QUORUM.
 
 If you don't specify a CL in your job config it will default to ONE (at
 least that's what my read of the ConfigHelper source for 1.1.6 shows)
 
 I have two questions.
 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
 it correct?

Yes and at QUORUM it's quasi local.  The job tracker finds out where a range is 
and sends a task to a replica with the data (local).  In the case of CL.QUORUM 
(see the Read Path section of 
http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an actual 
read of the data on the node closest (local).  Then it will get a digest from 
other nodes to verify that they have the same data.  So in the case of RF=3 and 
QUORUM, it will read the data on the local node where the task is running and 
will check the next closest replica for a digest to verify that it is 
consistent.  Information is sent across the wire and there is the latency of 
that, but it's not the data that's sent.

 2. With CL QUORUM cassandra reads data from all replicas. In this case
 Hadoop doesn't give me any  benefits. Application running outside the
 cluster has the same performance. Is it correct?

CL QUORUM does not read data from all replicas.  Applications running outside 
the cluster have to copy the data from the cluster, a much more copy/network 
intensive operation than using CL.QUORUM with the built-in Hadoop support.

 
 Thank you,
  Andrey



Re: Cassandra nodes loaded unequally

2012-10-18 Thread Ben Kaehne
After some time, I believe this is correct. The load seems to be
correlated to compactions, the number of files for the keyspace, IO, etc.

Thanks all!

Regards,

On Thu, Oct 18, 2012 at 9:35 PM, aaron morton aa...@thelastpickle.comwrote:

 At times of high load check the CPU % for the java service running C* to
 confirm C* is the source of load.

 If the load is generated from C* check the logs (or use OpsCentre / other
 monitoring) to see if it correlated to compaction, or Garbage Collection or
 repair or high throughput.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 17/10/2012, at 12:22 PM, Ben Kaehne ben.kae...@sirca.org.au wrote:

 Nothing unusual.

 All servers are exactly the same. Nothing unusual in the log files. Is
 there any level of logging that I should be turning on?

 Regards,

 On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh ailin...@gmail.comwrote:

 With your environment (3 nodes, RF=3) it is very difficult to get
 uneven load. Each node receives the same number of read/write
 requests. Probably something is wrong on low level, OS or VM. Do you
 see anything unusual in log files?

 Andrey

 On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne ben.kae...@sirca.org.au
 wrote:
   Not connecting to the same node every time. Using Hector to ensure an even
   distribution of connections across the cluster.
 
  Regards,
 
  On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss bto...@gmail.com
 wrote:
 
  are you connecting to the same node every time?  if so, spread out
  your connections across the ring
 
  On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov azo...@griddynamics.com
 
  wrote:
   Hi Ben,
  
    I suggest you compare the number of queries for each node. Maybe the
    problem is on the client side.
    You can do that using JMX:
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,ReadCount
   org.apache.cassandra.db:type=ColumnFamilies,keyspace=YOUR
   KEYSPACE,columnfamily=YOUR CF,WriteCount
  
   Also I suggest to check output of nodetool compactionstats.
  
   --
   Alexey
  
  
 
 
 
 
  --
  -Ben




 --
 -Ben





-- 
-Ben


Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 2:31 PM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:

 On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
 mkjell...@barracuda.com wrote:
 Not sure I understand your question (if there is one..)

 You are more than welcome to do CL ONE and assuming you have hadoop nodes
 in the right places on your ring things could work out very nicely. If you
 need to guarantee that you have all the data in your job then you'll need
 to use QUORUM.

 If you don't specify a CL in your job config it will default to ONE (at
 least that's what my read of the ConfigHelper source for 1.1.6 shows)

 I have two questions.
 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
 it correct?

 Yes and at QUORUM it's quasi local.  The job tracker finds out where a range 
 is and sends a task to a replica with the data (local).  In the case of 
 CL.QUORUM (see the Read Path section of 
 http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an actual 
 read of the data on the node closest (local).  Then it will get a digest from 
 other nodes to verify that they have the same data.  So in the case of RF=3 
 and QUORUM, it will read the data on the local node where the task is running 
 and will check the next closest replica for a digest to verify that it is 
 consistent.  Information is sent across the wire and there is the latency of 
 that, but it's not the data that's sent.

 2. With CL QUORUM cassandra reads data from all replicas. In this case
 Hadoop doesn't give me any  benefits. Application running outside the
 cluster has the same performance. Is it correct?

 CL QUORUM does not read data from all replicas.  Applications running outside 
 the cluster have to copy the data from the cluster, a much more copy/network 
 intensive operation than using CL.QUORUM with the built-in Hadoop support.


Thank you very much, guys! I have a much clearer picture now.

Andrey


Re: replaced node keeps returning in gossip

2012-10-18 Thread aaron morton
 I replaced it with a new node, IP 10.16.128.197 and again token 0 with a 
 -Dcassandra.replace_token=0 at startup
Good Good. 

How long ago did you bring the new node on ? There is a fail safe to remove 
128.210 after 3 days if it does not gossip to other nodes. 

I *thought* that remove_token would remove the old IP from the ring. Can you 
post the output from nodetool gossipinfo from the 128.197 node ?

Thanks
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 2:44 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

 Hi all
 
 I'm running Cassandra 1.0.11 on Ubuntu 11.10.
 
 I've got a ghost node which keeps showing up on my ring.
 
 A node living on IP 10.16.128.210 and token 0 died and had to be replaced.
 I replaced it with a new node, IP 10.16.128.197 and again token 0 with a 
 -Dcassandra.replace_token=0 at startup. This all went well but now I'm 
 seeing the following weirdness constantly reported in the log files around 
 the ring:
 
  INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632) 
 FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
  INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) Node 
 /10.16.128.210 is now part of the cluster
  INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804) 
 InetAddress /10.16.128.210 is now UP
  INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line 1017) 
 Nodes /10.16.128.210 and /10.16.128.197 have the same token 0.  Ignoring 
 /10.16.128.210
  INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818) 
 InetAddress /10.16.128.210 is now dead.
  INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632) 
 FatClient /10.16.128.210 has been silent for 3ms, removing from gossip



Re: UnreachableNodes

2012-10-18 Thread aaron morton
Cool. 

If you get it again grab nodetool gossipinfo from a few machines. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 3:32 AM, Rene Kochen rene.koc...@emea.schange.com wrote:

 Thanks Aaron,
 
 Telnet works (in both directions).
 
 After a normal (i.e. without discarding ring state) restart of the node 
 reporting the other one as down, the ring shows up correctly again. So a node 
 restart fixes the incorrect state.
 
 I see this error occasionally.
 
 I will further investigate and post more details when it happens again.
 
 2012/10/18 aaron morton aa...@thelastpickle.com
 You can double-check that the node reporting 9.109 as down can telnet to port 7000 
 on 9.109. 
 
 Then I would restart 9.109 with -Dcassandra.load_ring_state=false added as a 
 JVM param in cassandra-env.sh. 
 
 If it still shows as down, can you post the output of nodetool gossipinfo 
 from 9.109 and from the node that sees 9.109 as down. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/10/2012, at 8:45 PM, Rene Kochen rene.koc...@schange.com wrote:
 
 I have a four node EC2 cluster.
 
 Three machines show via nodetool ring that all machines are UP.
 One machine shows via nodetool ring that one machine is DOWN.
 
 If I take a closer look at the machine reporting the other machine as down, I see 
 the following:
 
 - StorageService.UnreachableNodes = 10.49.9.109
 - FailureDetector.SimpleStates: 10.49.9.109 = UP
 
 So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see in 
 the logging that there is communication between 10.49.9.109 and the machine 
 reporting it as down.
 
 How or when is a node removed from the UnreachableNodes list and reported as 
 UP again via nodetool ring?
 
 I use Cassandra 1.0.11
 
 Thanks!
 
 Rene
 
 
 



Re: potential data loss in Cassandra 1.1.0 .. 1.1.4

2012-10-18 Thread Jonathan Ellis
On Thu, Oct 18, 2012 at 7:30 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 Hi Jonathan.

 We are currently running the datastax AMI on amazon. Cassandra is in version
 1.1.2.

 I guess that the datastax repo (deb http://debian.datastax.com/community
 stable main) will be updated directly to 1.1.6?

Yes.

 Could you ask your team to add this specific warning in your documentation
 like here: http://www.datastax.com/docs/1.1/install/expand_ami (we usually
 update to the latest stable release before expanding) or here:
 http://www.datastax.com/docs/1.1/install/upgrading or in any other place
 where this could be useful ?

Good idea, I'll get that noted.  Thanks!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com