[jira] [Updated] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-06-22 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-13162:
-
Component/s: Materialized Views

> Batchlog replay is throttled during bootstrap, creating conditions for 
> incorrect query results on materialized views
> 
>
> Key: CASSANDRA-13162
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13162
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Wei Deng
>Priority: Critical
>  Labels: bootstrap, materializedviews
>
> I've tested this in a C* 3.0 cluster with two materialized views defined on a 
> single base table. The data volume is not very high per node (about 80GB of 
> data per node in total; that particular base table holds about 25GB of 
> uncompressed data, with one MV taking 18GB compressed and the other 3GB), and 
> the cluster uses decent hardware (EC2 c4.8xlarge with 18 cores + 60GB RAM + 
> 18K IOPS RAID0 across two 3TB gp2 EBS volumes).
> This was originally a 9-node cluster. After adding 3 more nodes to the DC, the 
> system.batches table accumulated a lot of data on the 3 new nodes (each had 
> around 20GB under the system.batches directory), and over the subsequent week 
> the batchlog on the 3 new nodes was slowly replayed back to the rest of the 
> cluster. The bottleneck appears to be the throttle defined by the 
> cassandra.yaml setting batchlog_replay_throttle_in_kb, which defaults to 1024 
> (roughly 1MB/s); see the throttle sketch after this description.
> Given that the batchlog (generated by the MV writes) is still being replayed 
> almost a week after the bootstrap finished, it seems only reasonable to 
> unthrottle it (or at least allow a much higher rate) during the initial 
> bootstrap, so I'd consider this a bug in the current MV implementation.
> Also, as far as I understand, the bootstrap logic does not wait for the 
> backlogged batchlog to be fully replayed before the new node reaches "UN" 
> state, and if the MV batchlog stays stuck like this for a long time, we will 
> basically get wrong answers from the MVs for that entire duration (until the 
> batchlog is fully replayed to the cluster), which makes this bug even more 
> critical. A sketch for checking the remaining backlog also follows below.






[jira] [Updated] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-01-27 Thread Wei Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-13162:
-
Description: 
I've tested this in a C* 3.0 cluster with two materialized views defined on a 
single base table. The data volume is not very high per node (about 80GB of data 
per node in total; that particular base table holds about 25GB of uncompressed 
data, with one MV taking 18GB compressed and the other 3GB), and the cluster uses 
decent hardware (EC2 c4.8xlarge with 18 cores + 60GB RAM + 18K IOPS RAID0 across 
two 3TB gp2 EBS volumes).

This was originally a 9-node cluster. After adding 3 more nodes to the DC, the 
system.batches table accumulated a lot of data on the 3 new nodes (each had 
around 20GB under the system.batches directory), and over the subsequent week the 
batchlog on the 3 new nodes was slowly replayed back to the rest of the cluster. 
The bottleneck appears to be the throttle defined by the cassandra.yaml setting 
batchlog_replay_throttle_in_kb, which defaults to 1024 (roughly 1MB/s).

Given that the batchlog (generated by the MV writes) is still being replayed 
almost a week after the bootstrap finished, it seems only reasonable to 
unthrottle it (or at least allow a much higher rate) during the initial 
bootstrap, so I'd consider this a bug in the current MV implementation.

Also, as far as I understand, the bootstrap logic does not wait for the 
backlogged batchlog to be fully replayed before the new node reaches "UN" state, 
and if the MV batchlog stays stuck like this for a long time, we will basically 
get wrong answers from the MVs for that entire duration (until the batchlog is 
fully replayed to the cluster), which makes this bug even more critical.

  was:
I've tested this in a C* 3.0 cluster with two materialized views defined on a 
single base table. The data volume is not very high per node (about 80GB of data 
per node in total; that particular base table holds about 25GB of uncompressed 
data, with one MV taking 18GB compressed and the other 3GB), and the cluster uses 
decent hardware (EC2 c4.8xlarge with 18 cores + 60GB RAM + 18K IOPS RAID0 across 
two 3TB gp2 EBS volumes).

This was originally a 9-node cluster. After adding 3 more nodes to the DC, the 
system.batches table accumulated a lot of data on the 3 new nodes, and over the 
subsequent week the batchlog on the 3 new nodes was slowly replayed back to the 
rest of the cluster. The bottleneck appears to be the throttle defined by the 
cassandra.yaml setting batchlog_replay_throttle_in_kb, which defaults to 1024 
(roughly 1MB/s).

Given that the batchlog (generated by the MV writes) is still being replayed 
almost a week after the bootstrap finished, it seems only reasonable to 
unthrottle it (or at least allow a much higher rate) during the initial 
bootstrap, so I'd consider this a bug in the current MV implementation.

Also, as far as I understand, the bootstrap logic does not wait for the 
backlogged batchlog to be fully replayed before the new node reaches "UN" state, 
and if the MV batchlog stays stuck like this for a long time, we will basically 
get wrong answers from the MVs for that entire duration (until the batchlog is 
fully replayed to the cluster), which makes this bug even more critical.


[jira] [Updated] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-01-27 Thread Wei Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-13162:
-
Labels: bootstrap materializedviews  (was: )



[jira] [Updated] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-01-27 Thread Wei Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-13162:
-
Priority: Critical  (was: Major)



[jira] [Updated] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-01-27 Thread Wei Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-13162:
-
Summary: Batchlog replay is throttled during bootstrap, creating conditions 
for incorrect query results on materialized views  (was: Batchlog replay is 
throttled during bootstrap)
