Re: constant CMS GC using CPU time

2012-10-21 Thread Radim Kolar

On 18.10.2012 20:06, Bryan Talbot wrote:
In a 4-node cluster running Cassandra 1.1.5 with the Sun JVM 1.6.0_29-b11 
(64-bit), the nodes often get stuck in a state where CMS 
collections of the old space are constantly running.

You need more Java heap memory.
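
For reference, a minimal sketch of where the heap is usually raised on a 1.1
install, assuming the stock conf/cassandra-env.sh (the values are illustrative
only, not a recommendation for this cluster):

  # conf/cassandra-env.sh -- override the auto-calculated sizes
  MAX_HEAP_SIZE="8G"      # total JVM heap; leave plenty of RAM for the OS page cache
  HEAP_NEWSIZE="800M"     # young generation; the file's own guidance is roughly 100 MB per core

If the old generation stays close to full even after a complete CMS cycle, the
working set (memtables, caches, bloom filters, index samples) simply does not
fit, and either the heap or the per-node load has to change.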


Re: What does ReadRepair exactly do?

2012-10-21 Thread aaron morton
There are two processes in Cassandra that trigger Read Repair-like behaviour. 

During a read, a DigestMismatchException is raised if the responses from the replicas 
do not match. In this case another read is run that involves reading all the 
data (not just digests). This is the consistency level (CL) agreement kicking in. 

The other Read Repair is the one controlled by read_repair_chance. When 
RR is active on a request, ALL up replicas are involved in the read; when RR is 
not active, only the CL replicas are involved. The test for CL agreement occurs 
synchronously with the request, while the RR check waits asynchronously 
for all nodes in the request to return. It then checks for consistency and 
repairs the differences. 
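
As an aside, read_repair_chance is set per column family. A hypothetical CQL 3
example (table name made up):

  -- probability (0.0 to 1.0) that a read also triggers the asynchronous RR path
  ALTER TABLE users WITH read_repair_chance = 0.1;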

 From looking at the source code, I do not understand how this set is built 
 and I do not understand how the reconciliation is executed.
When a DigestMismatch is detected a read is run using RepairCallback. The 
callback calls RowRepairResolver.resolve() when enough responses have 
been collected. 

resolveSuperset() picks one response as the baseline, and then calls delete() 
to apply row-level deletes from the other responses (ColumnFamily objects). It 
collects the other CFs into an iterator with a filter that returns all 
columns. The columns are then applied to the baseline CF, which may result in 
reconcile() being called. 

reconcile() is used when an AbstractColumnContainer has two versions of a column 
and needs to keep only one. 

RowRepairResolver.scheduleRepairs() works out the delta for each node by calling 
ColumnFamily.diff(). The delta is then sent to the appropriate node.


Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 6:33 AM, Markus Klems markuskl...@gmail.com wrote:

 Hi guys,
 
 I am looking through the Cassandra source code in the github trunk to better 
 understand how Cassandra's fault-tolerance mechanisms work. Most things make 
 sense. I am also aware of the wiki and DataStax documentation. However, I do 
 not understand what read repair does in detail. The method 
 RowRepairResolver.resolveSuperset(Iterable<ColumnFamily> versions) seems to 
 do the trick of merging conflicting versions of column family replicas and 
 builds the set of columns that need to be repaired. From looking at the 
 source code, I do not understand how this set is built and I do not 
 understand how the reconciliation is executed. ReadRepair does not seem to 
 trigger a Column.reconcile() to reconcile conflicting column versions on 
 different servers. Does it?
 
 If this is not what read repair does, then: What kind of inconsistencies are 
 resolved by read repair? And: How are the inconsistencies resolved?
 
 Could someone give me a hint?
 
 Thanks so much,
 
 -Markus



Re: Hinted Handoff runs every ten minutes

2012-10-21 Thread aaron morton
I *think* this may be ghost rows which have not been compacted away.

How many SSTables are on disk for the HintedHandoff CF ?
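
One quick way to check, assuming the default hints column family name on 1.0.x:

  nodetool -h localhost cfstats
  # look for the "SSTable count" line under Keyspace: system, Column Family: HintsColumnFamily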

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 7:16 AM, David Daeschler david.daesch...@gmail.com wrote:

 Hi Steve,
 
 Also confirming this. After having a node go down on Cassandra 1.0.8,
 there seems to be a hinted handoff between two of our 4 nodes every 10
 minutes. Our setup also shows 0 rows. It does not appear to have any
 effect on the operation of the ring; it just fills up the log files.
 
 - David
 
 
 
 On Thu, Oct 18, 2012 at 2:10 PM, Stephen Pierce spie...@verifyle.com wrote:
 I installed Cassandra on three nodes. I then ran a test suite against them
 to generate load. The test suite is designed to generate the same type of
 load that we plan to have in production. As one of many tests, I reset one
 of the nodes to check the failure/recovery modes.  Cassandra worked just
 fine.
 
 
 
 I stopped the load generation, and got distracted with some other
 project/problem. A few days later, I noticed something strange on one of the
 nodes. On this node hinted handoff starts every ten minutes, and while it
 seems to finish without any errors, it will be started again in ten minutes.
 None of the nodes has any traffic, and hasn’t for several days. I checked
 the logs, and this goes back to the initial failure/recovery testing:
 
 
 
 INFO [HintedHandoff:1] 2012-10-18 10:19:26,618 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136
 
 INFO [HintedHandoff:1] 2012-10-18 10:19:26,779 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136
 
 INFO [HintedHandoff:1] 2012-10-18 10:29:26,622 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136
 
 INFO [HintedHandoff:1] 2012-10-18 10:29:26,735 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136
 
 INFO [HintedHandoff:1] 2012-10-18 10:39:26,624 HintedHandOffManager.java
 (line 294) Started hinted handoff for token:
 113427455640312821154458202477256070484 with IP: /192.168.128.136
 
 INFO [HintedHandoff:1] 2012-10-18 10:39:26,751 HintedHandOffManager.java
 (line 390) Finished hinted handoff of 0 rows to endpoint /192.168.128.136
 
 
 
 The other nodes are happy and don’t show this behavior. All the test data is
 readable, and everything is fine, but I’m curious why hinted handoff is
 running on one node all the time.
 
 
 
 I searched the bug database, and I found a bug that seems to have the same
 symptoms:
 
 https://issues.apache.org/jira/browse/CASSANDRA-3733
 
 Although it’s been marked fixed in 0.6, this describes my problem exactly.
 
 
 
 I’m running Cassandra 1.1.5 from Datastax on Centos 6.0:
 
 http://rpm.datastax.com/community/noarch/apache-cassandra11-1.1.5-1.noarch.rpm
 
 
 
 Is anyone else seeing this behavior? What can I do to provide more
 information?
 
 
 
 Steve
 
 
 
 
 
 -- 
 David Daeschler



Re: Compound primary key: Insert after delete

2012-10-21 Thread aaron morton
Yes AFAIK. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/10/2012, at 12:15 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to reuse the same compound primary key after a delete? I guess it 
 works fine for non-composite keys.
 
 -Vivek
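
A minimal CQL 3 sketch of the pattern being asked about (table and values are
made up). The second INSERT recreates the row as long as its write timestamp is
newer than the tombstone left by the DELETE:

  CREATE TABLE events (
    user_id text,
    event_time timestamp,
    payload text,
    PRIMARY KEY (user_id, event_time)
  );

  INSERT INTO events (user_id, event_time, payload)
    VALUES ('u1', '2012-10-21 10:00:00', 'first');
  DELETE FROM events WHERE user_id = 'u1' AND event_time = '2012-10-21 10:00:00';
  -- reusing the exact same compound key is fine
  INSERT INTO events (user_id, event_time, payload)
    VALUES ('u1', '2012-10-21 10:00:00', 'second');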



Re: DELETE query failing in CQL 3.0

2012-10-21 Thread aaron morton
Can you paste the table definition? 

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/10/2012, at 5:53 AM, Ryabin, Thomas thomas.rya...@mckesson.com wrote:

 I have a column family called “books”, and am trying to delete all rows where 
 the “title” column is equal to “hatchet”. This is the query I am using:
 DELETE FROM books WHERE title = ‘hatchet’;
  
 This query is failing with this error:
 Bad Request: PRIMARY KEY part title found in SET part
  
 I am using Cassandra 1.1 and CQL 3.0. What could be the problem?
  
 -Thomas



find smallest counter

2012-10-21 Thread Paul Loy
I have a set of categories. Within each category I want to add groups of
users. If a user does not specify the group they want to join in a
category, I want to add them to the least-subscribed group.

So the groups are a super column family with CategoryId as the row key,
GroupId as the superColumnName, and then columns for the group members.

Then I was planning on having some counters so I could keep track of the
group sizes. I figured I'd have a counter column family for the groups, the key
being the CategoryId and the counter columns named by the GroupId. But I
can't figure out how to grab just the smallest group.
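
For illustration, the counter layout described above might look like this in
CQL 3 terms (all names are made up; CQL 3 has no super columns, so only the
counter part is shown):

  CREATE TABLE group_sizes (
    category_id text,
    group_id text,
    members counter,
    PRIMARY KEY (category_id, group_id)
  );

  -- bump the size when a user joins a group
  UPDATE group_sizes SET members = members + 1
    WHERE category_id = 'sports' AND group_id = 'group-42';

Counters cannot be indexed or ordered by value, so finding the smallest group
still means reading all the counters for a category and picking the minimum on
the client side.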

Many thanks in advance,

Paul.

-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: DELETE query failing in CQL 3.0

2012-10-21 Thread wang liang
It would be better to provide the table definition. I guess the reason is the
statement below:
"a table must define at least one column that is not part of the PRIMARY
KEY, as a row exists in Cassandra only if it contains at least one value for
one such column"
Please check the documentation here:
http://cassandra.apache.org/doc/cql3/CQL.html#createKeyspaceStmt
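
For illustration only (the real schema was never posted in this thread): in
CQL 3 a DELETE identifies rows by their PRIMARY KEY columns, so if title is
part of the key the full key has to be named, and if it is a regular column it
cannot be used in the WHERE clause at all. A hypothetical schema and a DELETE
that CQL 3 on 1.1 accepts:

  CREATE TABLE books (
    author text,
    title text,
    isbn text,
    PRIMARY KEY (author, title)
  );

  -- both the partition key and the clustering column are specified
  DELETE FROM books WHERE author = 'some author' AND title = 'hatchet';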

On Mon, Oct 22, 2012 at 7:53 AM, aaron morton aa...@thelastpickle.com wrote:

 Can you paste the table definition ?

 Thanks

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 20/10/2012, at 5:53 AM, Ryabin, Thomas thomas.rya...@mckesson.com
 wrote:

 I have a column family called “books”, and am trying to delete all rows
 where the “title” column is equal to “hatchet”. This is the query I am
 using:
 DELETE FROM books WHERE title = ‘hatchet’;
 This query is failing with this error:
 Bad Request: PRIMARY KEY part title found in SET part
 I am using Cassandra 1.1 and CQL 3.0. What could be the problem?
 -Thomas





-- 
Best wishes,
Helping others is to help myself.