[jira] [Commented] (CASSANDRA-12905) Retry acquire MV lock on failure instead of throwing WTE on streaming

Benjamin Roth (JIRA) Mon, 28 Nov 2016 23:16:38 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704474#comment-15704474 ]


Benjamin Roth commented on CASSANDRA-12905:
-------------------------------------------

Just to show that there is still a HUGE bottleneck: I started the bootstrap at 
about 8:30 (nearly 24h ago).

System load over time:
https://cl.ly/1U1f1U0L1o3T

CS load over time:
https://cl.ly/1z2U2a3p1r1m

The node is completely stuck applying MV updates. Most of the time (judging by 
the thread traces in jconsole), the StreamReaders are busy reading from 
sstables.
I am eager to dig deeper into this, but I'd really appreciate some assistance 
from someone who is really familiar with the MV code. It takes me hours or days 
to gather all the information needed to understand the system, reach a 
conclusion, test it and so on. It would be very helpful to have someone who 
could guide me a little through the "why" and "where".
E.g. I'd like to try out some changes in MV-based streaming, but I need to know 
whether there are existing tests that would let me verify my hypothesis or 
whether I have to write new ones.


> Retry acquire MV lock on failure instead of throwing WTE on streaming
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12905
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: centos 6.7 x86_64
>            Reporter: Nir Zilka
>            Priority: Critical
>             Fix For: 3.9
>
>
> Hello,
> I performed two upgrades on the current cluster (currently 15 nodes, 1 DC, 
> private VLAN).
> First it was 2.2.5.1, and repair worked flawlessly.
> The second upgrade was to 3.0.9 (with upgradesstables), and repair also worked 
> well.
> Then, 2 weeks ago, I upgraded to 3.9, and the repair problems started.
> There are several types of errors in system.log (on different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> ----
> I use the 3.9 default configuration with cluster-specific adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (86400000).
> I'm worried about consistency problems while I'm not running repair.
> Any ideas?
> Thanks,
> Nir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
