[ https://issues.apache.org/jira/browse/CASSANALYTICS-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams moved CASSANDRA-20558 to CASSANALYTICS-20:
-----------------------------------------------------------

    Component/s:     (was: Analytics Library)
        Impacts:   (was: None)
            Key: CASSANALYTICS-20  (was: CASSANDRA-20558)
       Platform:   (was: All)
        Project: Apache Cassandra Analytics  (was: Apache Cassandra)

> CassandraDataLayer uses configuration list of IPs instead of the full ring/datacenter
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANALYTICS-20
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-20
>             Project: Apache Cassandra Analytics
>          Issue Type: Bug
>            Reporter: Serban Teodorescu
>            Assignee: Serban Teodorescu
>            Priority: Normal
>
> In [https://github.com/apache/cassandra-analytics/blob/trunk/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/CassandraDataLayer.java#L418-L435],
> instanceMap is built from clusterConfig:
> {code:java}
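> // Only the configured Sidecar contact points end up in instanceMap,
> // not every node in the ring/datacenter.
> instanceMap = clusterConfig.stream()
>                            .collect(Collectors.toMap(SidecarInstance::hostname,
>                                                      Function.identity()));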
> {code}
> If sidecarContactPoints lists every Cassandra node, there is no impact. If it
> lists only a subset, the job probably still works as long as the listed nodes
> can satisfy quorum for every token range being read. Otherwise it fails with
> the following error:
> {code}
> 25/04/10 16:58:03 WARN TaskSetManager: Lost task 36.0 in stage 0.0 (TID 36) (ip-10-218-174-202.ec2.internal executor 4): org.apache.cassandra.spark.data.partitioner.NotEnoughReplicasException: Required 2 replicas but only 0 responded
>         at org.apache.cassandra.spark.data.partitioner.MultipleReplicas.openAll(MultipleReplicas.java:101)
> {code}
> I no longer have the executor-side error and stack trace, but instanceMap is
> used in listInstance(), and MultipleReplicas.openReplicaOrRetry depends on it
> indirectly.
> Snapshot creation in createSnapshot() does not use instanceMap; it uses the
> RingResponse information instead:
> {code:java}
> ring.stream().filter(...)
> {code}
> So the snapshot is created on all nodes. clearSnapshot(), on the other hand,
> uses clusterConfig, so snapshots on nodes outside clusterConfig are never
> deleted; that needs to be fixed as well.
> I'm working on a fix and will open a PR this week.

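A minimal sketch of one direction a fix could take, assuming the data layer can obtain the full host list from the ring, as createSnapshot() already does. Instance, fromContactPoints, fromRing, the plain-string host list and the single shared Sidecar port are all hypothetical simplifications; this is neither the cassandra-analytics API nor the actual PR.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class InstanceMapSketch {
    // Hypothetical stand-in for SidecarInstance: only hostname and port matter here.
    record Instance(String hostname, int port) {}

    // Mirrors the behaviour described in the issue: only configured contact points are indexed.
    static Map<String, Instance> fromContactPoints(List<Instance> clusterConfig) {
        return clusterConfig.stream()
                .collect(Collectors.toMap(Instance::hostname, Function.identity()));
    }

    // Hypothetical alternative: index every host the ring reports, assuming one
    // Sidecar port shared by all nodes. distinct() avoids toMap's duplicate-key exception.
    static Map<String, Instance> fromRing(List<String> ringHosts, int sidecarPort) {
        return ringHosts.stream()
                .distinct()
                .collect(Collectors.toMap(Function.identity(), h -> new Instance(h, sidecarPort)));
    }

    public static void main(String[] args) {
        List<Instance> config = List.of(new Instance("10.0.0.1", 9043));
        List<String> ring = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Map<String, Instance> current = fromContactPoints(config);
        Map<String, Instance> ringBased = fromRing(ring, 9043);
        for (String host : ring) {
            // A null lookup is what a replica outside the contact points would hit.
            System.out.println(host + " current=" + (current.get(host) != null)
                    + " ring-based=" + (ringBased.get(host) != null));
        }
    }
}
```

Any replica not listed in the contact points resolves to null in the current map but is found in the ring-based one.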

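The NotEnoughReplicasException above can be reproduced with a small model, assuming RF=3 with a quorum-style consistency level (which would account for "Required 2 replicas") and treating any replica missing from instanceMap as unreachable. quorum, resolvable and checkEnough are hypothetical helpers; the real MultipleReplicas/openReplicaOrRetry logic is not reproduced here.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReplicaLookupSketch {
    // Quorum for a replication factor: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Hypothetical: keep only replicas that can be resolved to a known Sidecar instance.
    static List<String> resolvable(List<String> replicas, Set<String> knownHosts) {
        return replicas.stream().filter(knownHosts::contains).collect(Collectors.toList());
    }

    // Hypothetical check modelled on the error text, not on the real implementation.
    static void checkEnough(List<String> replicas, Set<String> knownHosts, int rf) {
        int required = quorum(rf);
        int available = resolvable(replicas, knownHosts).size();
        if (available < required) {
            throw new IllegalStateException(
                    "Required " + required + " replicas but only " + available + " responded");
        }
    }

    public static void main(String[] args) {
        Set<String> configured = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // A token range whose three replicas all lie outside the configured contact points.
        List<String> replicas = List.of("10.0.0.4", "10.0.0.5", "10.0.0.6");
        try {
            checkEnough(replicas, configured, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Required 2 replicas but only 0 responded
        }
    }
}
```

With a partial contact-point list, ranges whose replicas happen to overlap the list still pass, which matches the "probably still works" observation.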

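The leaked snapshots are a set difference: hosts that createSnapshot() snapshotted (the ring) minus hosts that clearSnapshot() visits (clusterConfig). The following is a hypothetical sketch over plain host strings; leakedSnapshots is illustrative only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SnapshotLeakSketch {
    // Hosts snapshotted (ring) minus hosts cleaned up (configured) = snapshots left behind.
    static Set<String> leakedSnapshots(List<String> snapshottedHosts, List<String> clearedHosts) {
        Set<String> leaked = new LinkedHashSet<>(snapshottedHosts);
        leaked.removeAll(Set.copyOf(clearedHosts));
        return leaked;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        List<String> config = List.of("10.0.0.1", "10.0.0.2");
        System.out.println(leakedSnapshots(ring, config)); // [10.0.0.3, 10.0.0.4]
        // Clearing on the same host set that was snapshotted leaves nothing behind.
        System.out.println(leakedSnapshots(ring, ring)); // []
    }
}
```

One option is to record the host set at snapshot time and reuse it in clearSnapshot(), which makes the difference empty by construction.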
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
