[ 
https://issues.apache.org/jira/browse/CASSANALYTICS-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Cao updated CASSANALYTICS-72:
---------------------------------
    Description: 
Currently, Cassandra Analytics is built and tested against Spark 3.2.2.

 

Upgrade to Spark 3.5 as a first step to ensure we are up to date with supported 
Spark versions [https://endoflife.date/apache-spark]

 

The upgrade to Spark 4 is tracked separately in 
https://issues.apache.org/jira/browse/CASSANALYTICS-58

 

 

One noteworthy issue is that DataSource V2 added a new partitioning scheme in 
Spark 3.3 (to support storage-partitioned joins): 
https://issues.apache.org/jira/browse/SPARK-37377 

As a result, we are now seeing warnings like

```
25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for better performance
```

This will not cause a build failure, but it will likely have performance 
implications, since Spark now treats the partitioning as UnknownPartitioning. 
Calling it out here for that reason.

  was:
Currently, Cassandra Analytics is built and tested against Spark 3.2.2.

 

Upgrade to spark 3.5 as a first step to ensure we are up-to-date with supported 
Spark versions.

 

Upgrade to spark 4 is tracked separately in 
https://issues.apache.org/jira/browse/CASSANALYTICS-58

 

 

One interesting one is that datasource v2 added a new partition scheme in spark 
3.3 (to support storage partition join) 
https://issues.apache.org/jira/browse/SPARK-37377 

And now we are seeing warnings like

```

25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the 
partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for 
better performance

```

This will not cause a build failure, but it will likely have performance 
implications, since Spark now treats the partitioning as UnknownPartitioning. 
Calling it out here for that reason.


> Spark 3.5 support for cassandra analytics
> -----------------------------------------
>
>                 Key: CASSANALYTICS-72
>                 URL: https://issues.apache.org/jira/browse/CASSANALYTICS-72
>             Project: Apache Cassandra Analytics
>          Issue Type: Improvement
>            Reporter: Liu Cao
>            Priority: Normal
>
> Currently, Cassandra Analytics is built and tested against Spark 3.2.2.
>  
> Upgrade to Spark 3.5 as a first step to ensure we are up to date with 
> supported Spark versions [https://endoflife.date/apache-spark]
>  
> The upgrade to Spark 4 is tracked separately in 
> https://issues.apache.org/jira/browse/CASSANALYTICS-58
>  
>  
> One noteworthy issue is that DataSource V2 added a new partitioning scheme in 
> Spark 3.3 (to support storage-partitioned joins): 
> https://issues.apache.org/jira/browse/SPARK-37377 
> As a result, we are now seeing warnings like
> ```
> 25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for better performance
> ```
> This will not cause a build failure, but it will likely have performance 
> implications, since Spark now treats the partitioning as UnknownPartitioning. 
> Calling it out here for that reason.
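
To illustrate why the warning in the description appears, here is a minimal, self-contained sketch of the three-way check that, as far as I understand it, Spark 3.3+ applies to the partitioning reported by a DataSource V2 scan. The types below are simplified stand-ins that only borrow Spark's names; they are not Spark's classes. `CassandraLikePartitioning`, `usableKeys`, and the `partition_key` column are hypothetical names invented for this example and say nothing about how the real CassandraPartitioning is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Simplified model of the check; NOT Spark's actual implementation. */
public class PartitioningModel {

    /** Stand-in for Spark's Partitioning interface (reduced for this sketch). */
    public interface Partitioning {
        int numPartitions();
    }

    /** Stand-in for KeyGroupedPartitioning; keys are plain column names here. */
    public static final class KeyGroupedPartitioning implements Partitioning {
        private final List<String> keys;
        private final int numPartitions;

        public KeyGroupedPartitioning(List<String> keys, int numPartitions) {
            this.keys = keys;
            this.numPartitions = numPartitions;
        }

        public List<String> keys() { return keys; }

        @Override
        public int numPartitions() { return numPartitions; }
    }

    /** Stand-in for UnknownPartitioning: the scan promises nothing about layout. */
    public static final class UnknownPartitioning implements Partitioning {
        private final int numPartitions;

        public UnknownPartitioning(int numPartitions) { this.numPartitions = numPartitions; }

        @Override
        public int numPartitions() { return numPartitions; }
    }

    /** Hypothetical connector-specific partitioning the planner does not recognise. */
    public static final class CassandraLikePartitioning implements Partitioning {
        private final int numPartitions;

        public CassandraLikePartitioning(int numPartitions) { this.numPartitions = numPartitions; }

        @Override
        public int numPartitions() { return numPartitions; }
    }

    /**
     * Returns the grouping keys the planner could exploit, or empty when the
     * reported partitioning carries no usable information for it.
     */
    public static Optional<List<String>> usableKeys(Partitioning reported) {
        if (reported instanceof KeyGroupedPartitioning) {
            return Optional.of(((KeyGroupedPartitioning) reported).keys());
        }
        if (reported instanceof UnknownPartitioning) {
            return Optional.empty();
        }
        // Any other class falls through: warn, then treat it like unknown partitioning.
        System.err.println("WARN: ignoring partitioning "
                + reported.getClass().getSimpleName());
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Partitioning> reported = Arrays.asList(
                new KeyGroupedPartitioning(Arrays.asList("partition_key"), 8),
                new UnknownPartitioning(8),
                new CassandraLikePartitioning(8));
        for (Partitioning p : reported) {
            System.out.println(p.getClass().getSimpleName()
                    + " -> usable keys: " + usableKeys(p));
        }
    }
}
```

In this model only the key-grouped case yields keys. A connector-specific class ends up with the same result as unknown partitioning, plus a warning, which matches the behaviour described in the issue.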



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
