Re: Transaction failed because of timeout, retry failed because the first try actually succeeded.

2016-06-30 Thread Robert Wille
I had this problem, and it was caused by my retry policy. For reasons I don't
remember (but it is documented in a C* Jira ticket), when onWriteTimeout() is
called, you cannot call RetryDecision.retry(cl), as that CL will be one that is
incompatible with LWT. After the fix (2.1.?), you can pass null, and it will
use the original CL.
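
A minimal sketch of a policy along those lines, written against the 2.1-era
Java driver RetryPolicy interface (the class name is made up, and the exact
method set varies by driver version):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.policies.RetryPolicy;

// Sketch only: on a write timeout, retry once and pass null so the driver
// reuses the statement's original consistency level rather than the
// (LWT-incompatible) CL handed to the callback.
public class LwtFriendlyRetryPolicy implements RetryPolicy {

    @Override
    public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl,
                                        WriteType writeType, int requiredAcks,
                                        int receivedAcks, int nbRetry) {
        if (nbRetry != 0)
            return RetryDecision.rethrow();
        return RetryDecision.retry(null); // null = keep the original CL
    }

    @Override
    public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl,
                                       int requiredResponses, int receivedResponses,
                                       boolean dataRetrieved, int nbRetry) {
        return RetryDecision.rethrow();
    }

    @Override
    public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl,
                                       int requiredReplica, int aliveReplica, int nbRetry) {
        return RetryDecision.rethrow();
    }
}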

On Jun 30, 2016, at 6:11 PM, Justin Lin  wrote:

> Hi everyone,
> 
> I recently encountered a problem with a lightweight transaction. My query
> inserts a row into a table if the row doesn't exist. It goes like this:
> 
> INSERT INTO mytable (key, col1, col2) VALUES ('key1', 1, 2) IF NOT EXISTS;
> 
> In my case the driver somehow timed out waiting for the coordinator to
> respond, but the transaction actually succeeded. So my code retried the
> query, and the retry failed.
> 
> This is not an idempotent write, so the retry might be a bad idea. And
> honestly this is not a Cassandra issue. But I wonder if anyone in the
> community has had this problem before, and how you would recommend solving
> it.
> 
> Thanks
> 
> -- 
> come on



Transaction failed because of timeout, retry failed because the first try actually succeeded.

2016-06-30 Thread Justin Lin
Hi everyone,

I recently encountered a problem with a lightweight transaction. My query
inserts a row into a table if the row doesn't exist. It goes like this:

INSERT INTO mytable (key, col1, col2) VALUES ('key1', 1, 2) IF NOT EXISTS;

In my case the driver somehow timed out waiting for the coordinator to
respond, but the transaction actually succeeded. So my code retried the
query, and the retry failed.
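
For illustration, this is roughly what the retry sees with the DataStax Java
driver: the IF NOT EXISTS insert comes back as not applied, together with the
row that the first (timed-out but successful) attempt wrote. The cluster
setup, keyspace name and column type below are assumptions, not my actual
code:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Sketch only: retrying the LWT and inspecting the CAS result instead of
// treating "not applied" as a plain failure.
public class LwtRetryExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("mykeyspace")) { // keyspace name is assumed
            ResultSet rs = session.execute(
                "INSERT INTO mytable (key, col1, col2) VALUES ('key1', 1, 2) IF NOT EXISTS");
            if (!rs.wasApplied()) {
                // On a non-applied CAS the result row carries the existing values,
                // so the application can check whether the "conflicting" row is in
                // fact its own earlier write that succeeded despite the timeout.
                Row existing = rs.one();
                System.out.println("not applied; existing col1 = " + existing.getInt("col1"));
            }
        }
    }
}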

This is not an idempotent write, so the retry might be a bad idea. And
honestly this is not a Cassandra issue. But I wonder if anyone in the
community has had this problem before, and how you would recommend solving
it.

Thanks

-- 
come on


Re: Motivation for a DHT ring

2016-06-30 Thread Utkarsh Sengar
Besides fault tolerance and reliability, it also gives a faster lookup
mechanism across the nodes in a cluster: any node can map a partition key to
the node that owns it without consulting a central directory. Amazon's Dynamo
paper might be a better read to understand the reasoning behind a DHT-based
system:
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
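
As a toy illustration of that lookup (a sketch only; Cassandra itself uses the
Murmur3 partitioner, vnodes and a replication strategy on top of this idea),
any node or client that knows the token ring can compute a key's owner
locally:

import java.util.Map;
import java.util.TreeMap;

// Toy DHT ring lookup: hash the partition key onto a token ring and pick the
// first node token at or after it, wrapping around. Node names and the hash
// function are placeholders, not Cassandra's actual implementation.
public class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node, long token) {
        ring.put(token, node);
    }

    String ownerOf(String partitionKey) {
        long token = hash(partitionKey);
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        return (owner != null ? owner : ring.firstEntry()).getValue(); // wrap around
    }

    private static long hash(String key) {
        return key.hashCode() * 2654435761L; // placeholder hash, not Murmur3
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("node-a", Long.MIN_VALUE / 2);
        ring.addNode("node-b", 0L);
        ring.addNode("node-c", Long.MAX_VALUE / 2);
        // Any node can answer this locally, with no central directory lookup.
        System.out.println(ring.ownerOf("key1"));
    }
}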

On Wed, Jun 29, 2016 at 11:48 PM, Jens Rantil  wrote:

> Some reasons I can come up with:
> - it would be hard to have tunable read/write consistency and replication
> when interfacing with a file system.
> - data locality support would require strong coupling to the distributed
> file system interface (if it is possible at all, given that certain sstables
> should live on the same data node).
> - the operator complexity of administering both a distributed file system
> and a Cassandra cluster. This was a personal reason why I chose Cassandra
> instead of HBase for a project.
>
> Cheers,
> Jens
>
> On Wed, 29 June 2016 at 13:01, jean paul  wrote:
>
>>
>>
>> 2016-06-28 22:29 GMT+01:00 jean paul :
>>
>>> Hi all,
>>>
>>> Please, what is the motivation for choosing a DHT ring in Cassandra? Why
>>> not use a normal parallel or distributed file system that supports
>>> replication?
>>>
>>> Thank you so much for clarification.
>>>
>>> Kind regards.
>>>
>>
>> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>



-- 
Thanks,
-Utkarsh


RE: C* files getting stuck

2016-06-30 Thread Amit Singh F
Hi Josh,

Which version are you facing this issue on? Is it the 2.0.x branch?

Regards
Amit
From: Josh Smith [mailto:josh.sm...@careerbuilder.com]
Sent: Thursday, June 30, 2016 7:39 PM
To: user@cassandra.apache.org
Subject: RE: C* files getting stuck

I have also faced this issue.  Rebooting the instance has been our fix so far.  
I am very interested if anyone else has a solution.  I was unable to get a 
definitive answer from DataStax during the last Cassandra Summit.

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Thursday, June 30, 2016 7:02 AM
To: user@cassandra.apache.org
Subject: RE: C* files getting stuck

Hi All,

Please check if anybody has faced the issue below, and if yes, what can best
be done to avoid it?
Thanks in advance.

Regards
Amit Singh

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Wednesday, June 29, 2016 3:52 PM
To: user@cassandra.apache.org
Subject: C* files getting stuck


Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating
further, we found around 4-5 files (~150 GB) that are stuck: they have been
deleted but are still held open by the Cassandra process.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db (deleted)

We are not able to see these files in any directory. This is somewhat similar
to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is marked as
fixed, but the issue is still there on a higher version. Also, no errors
related to compaction are reported in the logs.

Could anyone please suggest how to counter this? Restarting Cassandra is one
workaround, but the issue keeps on occurring, and restarting a production
machine so frequently is not recommended.

We also know that this version is not supported, but there is a high
probability that it can occur in higher versions too.
Regards
Amit Singh


RE: C* files getting stuck

2016-06-30 Thread Josh Smith
I have also faced this issue.  Rebooting the instance has been our fix so far.  
I am very interested if anyone else has a solution.  I was unable to get a 
definitive answer from DataStax during the last Cassandra Summit.

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Thursday, June 30, 2016 7:02 AM
To: user@cassandra.apache.org
Subject: RE: C* files getting stuck

Hi All,

Please check if anybody has faced the issue below, and if yes, what can best
be done to avoid it?
Thanks in advance.

Regards
Amit Singh

From: Amit Singh F [mailto:amit.f.si...@ericsson.com]
Sent: Wednesday, June 29, 2016 3:52 PM
To: user@cassandra.apache.org
Subject: C* files getting stuck


Hi All

We are running Cassandra 2.0.14 and disk usage is very high. On investigating
further, we found around 4-5 files (~150 GB) that are stuck: they have been
deleted but are still held open by the Cassandra process.

Command Fired : lsof /var/lib/cassandra | grep -i deleted

Output :

java 12158 cassandra 308r REG 8,16 34396638044 12727268 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-16481-Data.db (deleted)
java 12158 cassandra 327r REG 8,16 101982374806 12715102 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-126861-Data.db (deleted)
java 12158 cassandra 339r REG 8,16 12966304784 12714010 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-213548-Data.db (deleted)
java 12158 cassandra 379r REG 8,16 15323318036 12714957 /var/lib/cassandra/data/mykeyspace/mycolumnfamily/mykeyspace-mycolumnfamily-jb-182936-Data.db (deleted)

We are not able to see these files in any directory. This is somewhat similar
to https://issues.apache.org/jira/browse/CASSANDRA-6275, which is marked as
fixed, but the issue is still there on a higher version. Also, no errors
related to compaction are reported in the logs.

Could anyone please suggest how to counter this? Restarting Cassandra is one
workaround, but the issue keeps on occurring, and restarting a production
machine so frequently is not recommended.

We also know that this version is not supported, but there is a high
probability that it can occur in higher versions too.
Regards
Amit Singh


RE: Exception in logs using LCS .

2016-06-30 Thread Prakash Chauhan
Hi  Paulo,

Thanks for the reply. Running scrub every time the exception occurs is not a
feasible solution for us. We don't have a log-patrolling system in place that
verifies the logs on a regular basis.

I have some queries based on your reply:

1.  You mentioned a race condition. Can you please elaborate a little more on
what race condition may be happening in this scenario?

2.  What if this happens in production? What would be the impact? Can we live
with it?

3.  The test system where the problem was reported has been scrapped, so we
can't run a user-defined compaction now. Any suggestions for reproducing it on
a new system?



From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Tuesday, June 28, 2016 5:43 PM
To: user@cassandra.apache.org
Subject: Re: Exception in logs using LCS .

1. Not necessarily data corruption, but it seems compaction is trying to write
data in the wrong order, most likely due to a temporary race condition/bug a la
#9935. Since the compaction fails, your original data is probably safe (you can
try running scrub to verify/fix corruption).
2. This is pretty tricky to reproduce because it depends on which sstables were
picked for compaction at a particular instant, but you could try running a
user-defined compaction or scrub on the sstables that contain this key; see
CASSANDRA-11337 and https://gist.github.com/jeromatron/e238e5795b3e79866b83
3. Clone the Cassandra repository of your current version, git cherry-pick the
commit of CASSANDRA-9935, run ant jar, replace the Cassandra jar with the
generated SNAPSHOT jar, and restart the node.

2016-06-28 7:55 GMT-03:00 Prakash Chauhan :
Hello,

Recently we changed the compaction strategy for a table to LCS from the
default STCS. While bootstrapping a node, we are getting the following
exception in the logs:

ERROR [CompactionExecutor:81] 2016-05-11 13:48:54,694 CassandraDaemon.java (line 258) Exception in thread Thread[CompactionExecutor:81,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(-2711050270696519088, 623330333a313a35) >= current key DecoratedKey(-8631371593982690738, 623437393a313a30) writing into /cassandra/data/main/myCF/myCF-tmp-jb-326-Data.db
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
INFO [CompactionExecutor:68] 2016-05-11 14:10:28,957 ColumnFamilyStore.java (line 795) Enqueuing flush of Memtable-compactions_in_progress@541886922(0/0 serialized/live bytes, 1 ops)

Questions:

1.  Are these exceptions due to data corruption?

2.  I am unable to reproduce the problem. How can I reproduce the exception?
Is there any specific case in which such exceptions are raised?

3.  Without reproducing the exception, how can I test the patch available at
the related JIRA: https://issues.apache.org/jira/browse/CASSANDRA-9935



Thanks,
Prakash Chauhan.


Re: Ring connection timeouts with 2.2.6

2016-06-30 Thread Jens Rantil
Hi,

Could it be garbage collection occurring on nodes that are more heavily
loaded?

Cheers,
Jens

On Sun, 26 June 2016 at 05:22, Mike Heffner  wrote:

> One thing to add: if we do a rolling restart of the ring, the timeouts
> disappear entirely for several hours and performance returns to normal.
> It's as if something is leaking over time, but we haven't seen any
> noticeable change in heap.
>
> On Thu, Jun 23, 2016 at 10:38 AM, Mike Heffner  wrote:
>
>> Hi,
>>
>> We have a 12 node 2.2.6 ring running in AWS, single DC with RF=3, that is
>> sitting at <25% CPU, doing mostly writes, and not showing any particular
>> long GC times/pauses. By all observed metrics the ring is healthy and
>> performing well.
>>
>> However, we are noticing a pretty consistent number of connection
>> timeouts coming from the messaging service between various pairs of nodes
>> in the ring. The "Connection.TotalTimeouts" meter metric shows 100k's of
>> timeouts per minute, usually between two pairs of nodes. It seems to occur
>> for several hours at a time, then may stop or move to other pairs of nodes
>> in the ring. The metric "Connection.SmallMessageDroppedTasks." will also
>> grow for one of the node pairs in the TotalTimeouts metric.
>>
>> Looking at the debug log typically shows a large number of messages like
>> the following on one of the nodes:
>>
>> StorageProxy.java:1033 - Skipped writing hint for /172.26.33.177 (ttl 0)
>>
>> We have cross node timeouts enabled, but ntp is running on all nodes and
>> no node appears to have time drift.
>>
>> The network appears to be fine between nodes, with iperf tests showing
>> that we have a lot of headroom.
>>
>> Any thoughts on what to look for? Can we increase thread count/pool sizes
>> for the messaging service?
>>
>> Thanks,
>>
>> Mike
>>
>> --
>>
>>   Mike Heffner 
>>   Librato, Inc.
>>
>>
>
>
> --
>
>   Mike Heffner 
>   Librato, Inc.
>
> --

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Re: some questions

2016-06-30 Thread Jens Rantil
You forgot FROM in your CQL query.

Jens

On Sun, 26 June 2016 at 08:30, lowping  wrote:

> Hi :
>
>
> question 1:
>
> I got an error with this CQL; have you fixed it already?
> select collection_type where id in ('a', 'b')
>
> question 2:
>
> I want to use a UDF in an UPDATE, but this CQL can't execute. Any advice?
>
> update table_name set field=my_function(field) where …
>
>
> Thank you so much
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


UNSUBSCRIBE

2016-06-30 Thread Brian Fleming



Re: Multi DC setup question

2016-06-30 Thread Jens Rantil
I'm AFK, but you might be able to query the system.peers table to see which
nodes are up.
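
As a rough sketch of that idea with the DataStax Java driver (this uses the
driver's cluster metadata rather than a raw system.peers query; the DC name
and the 5-of-6 threshold below just mirror the scenario in the quoted mail and
are assumptions):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

// Sketch only: count driver-visible hosts that are up in the primary DC and
// compare against a threshold before deciding where to route the application.
public class DcHealthCheck {

    static boolean primaryDcUsable(Cluster cluster, String dcName, int minUpNodes) {
        int up = 0;
        for (Host host : cluster.getMetadata().getAllHosts()) {
            if (dcName.equals(host.getDatacenter()) && host.isUp()) {
                up++;
            }
        }
        return up >= minUpNodes;
    }

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            cluster.init(); // populates metadata without opening a session
            boolean usePrimary = primaryDcUsable(cluster, "DC1", 5);
            System.out.println(usePrimary ? "route to primary DC" : "fail over to the other DC");
        }
    }
}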

Cheers,
Jens

On Tue, 28 June 2016 at 06:44, Charulata Sharma (charshar)  wrote:

> Hi All,
>
>We are setting up another Data Center and have the following
> question:
>
> 6 nodes in each DC of the Cassandra cluster.
>
> All keyspaces have an RF of 3.
>
> Our scenario is:
>
>
>
> App nodes connect to the Cassandra cluster using LOCAL_QUORUM consistency.
>
>
>
> We want to ensure that if 5 out of the 6 nodes are available, the
> application uses the primary DC; otherwise the application URL should be
> directed to another DC.
>
>
>
> What is the best option to achieve this?
>
>
>
> Thanks,
>
> Charu
>
>
>
>
>
>
>
>
>
> --

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Re: Motivation for a DHT ring

2016-06-30 Thread Jens Rantil
Some reasons I can come up with:
- it would be hard to have tunable read/write consistency and replication
when interfacing with a file system.
- data locality support would require strong coupling to the distributed
file system interface (if it is possible at all, given that certain sstables
should live on the same data node).
- the operator complexity of administering both a distributed file system
and a Cassandra cluster. This was a personal reason why I chose Cassandra
instead of HBase for a project.

Cheers,
Jens

On Wed, 29 June 2016 at 13:01, jean paul  wrote:

>
>
> 2016-06-28 22:29 GMT+01:00 jean paul :
>
>> Hi all,
>>
>> Please, what is the motivation for choosing a DHT ring in Cassandra? Why
>> not use a normal parallel or distributed file system that supports
>> replication?
>>
>> Thank you so much for clarification.
>>
>> Kind regards.
>>
>
> --

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.