[
https://issues.apache.org/jira/browse/CASSANALYTICS-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liu Cao updated CASSANALYTICS-72:
---------------------------------
Description:
Currently, Cassandra Analytics is built and tested against Spark 3.2.2.
Upgrade to Spark 3.5 as a first step to ensure we are up to date with supported
Spark versions [https://endoflife.date/apache-spark]
The upgrade to Spark 4 is tracked separately in
https://issues.apache.org/jira/browse/CASSANALYTICS-58
One noteworthy issue is that DataSource V2 added a new partitioning scheme in
Spark 3.3 (to support storage-partitioned join)
https://issues.apache.org/jira/browse/SPARK-37377
As a result, we are now seeing warnings like
```
25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the
partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for
better performance
```
This will not cause a build failure, but it will likely have performance
implications, since Spark now treats the reported partitioning as
UnknownPartitioning. Calling it out here as well.
was:
Currently, cassandra analytics is built and test against spark 3.2.2
Upgrade to spark 3.5 as a first step to ensure we are up-to-date with supported
Spark versions.
Upgrade to spark 4 is tracked separately in
https://issues.apache.org/jira/browse/CASSANALYTICS-58
One interesting one is that datasource v2 added a new partition scheme in spark
3.3 (to support storage partition join)
https://issues.apache.org/jira/browse/SPARK-37377
And now we are seeing warnings like
```
25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the
partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for
better performance
```
This will not be a build failure. But it will likely have performance
implication since spark now treat it as UnknownPartitioning, hence calling it
out here as well.
> Spark 3.5 support for cassandra analytics
> -----------------------------------------
>
> Key: CASSANALYTICS-72
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-72
> Project: Apache Cassandra Analytics
> Issue Type: Improvement
> Reporter: Liu Cao
> Priority: Normal
>
> Currently, Cassandra Analytics is built and tested against Spark 3.2.2.
>
> Upgrade to Spark 3.5 as a first step to ensure we are up to date with
> supported Spark versions [https://endoflife.date/apache-spark]
>
> The upgrade to Spark 4 is tracked separately in
> https://issues.apache.org/jira/browse/CASSANALYTICS-58
>
>
> One noteworthy issue is that DataSource V2 added a new partitioning scheme in
> Spark 3.3 (to support storage-partitioned join)
> https://issues.apache.org/jira/browse/SPARK-37377
> As a result, we are now seeing warnings like
> ```
> 25/06/20 16:52:19 WARN V2ScanPartitioningAndOrdering: Spark ignores the
> partitioning CassandraPartitioning. Please use KeyGroupedPartitioning for
> better performance
> ```
> This will not cause a build failure, but it will likely have performance
> implications, since Spark now treats the reported partitioning as
> UnknownPartitioning. Calling it out here as well.
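
The fallback described in the issue can be illustrated with a minimal Java
sketch. It does not use Spark's own classes: the Partitioning interface, the
two implementations, and the usableKeys helper below are hypothetical
stand-ins. They only model the behavior stated in the warning, assuming that
a reported partitioning is honored when it is key-grouped and is otherwise
ignored and treated as unknown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class PartitioningFallbackSketch {

    /** Simplified stand-in for a partitioning reported by a DataSource V2 scan. */
    public interface Partitioning {
        int numPartitions();
    }

    /** Stand-in for the key-grouped scheme that the warning recommends. */
    public static final class KeyGroupedPartitioning implements Partitioning {
        private final List<String> keys;
        private final int numPartitions;

        public KeyGroupedPartitioning(List<String> keys, int numPartitions) {
            this.keys = keys;
            this.numPartitions = numPartitions;
        }

        public List<String> keys() {
            return keys;
        }

        @Override
        public int numPartitions() {
            return numPartitions;
        }
    }

    /** Stand-in for a connector-specific class such as CassandraPartitioning. */
    public static final class CustomPartitioning implements Partitioning {
        private final int numPartitions;

        public CustomPartitioning(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        @Override
        public int numPartitions() {
            return numPartitions;
        }
    }

    /**
     * Returns the grouping keys a planner could use, or empty when the reported
     * partitioning is not key-grouped and is therefore treated as unknown.
     */
    public static Optional<List<String>> usableKeys(Partitioning reported) {
        if (reported instanceof KeyGroupedPartitioning) {
            return Optional.of(((KeyGroupedPartitioning) reported).keys());
        }
        // Mirrors the spirit of the warning: the partitioning is ignored, not rejected.
        System.err.println("WARN: ignoring partitioning "
                + reported.getClass().getSimpleName() + "; treating it as unknown");
        return Optional.empty();
    }

    public static void main(String[] args) {
        Partitioning keyGrouped = new KeyGroupedPartitioning(Arrays.asList("partition_key"), 8);
        Partitioning custom = new CustomPartitioning(8);
        System.out.println("key-grouped usable: " + usableKeys(keyGrouped).isPresent());
        System.out.println("custom usable: " + usableKeys(custom).isPresent());
    }
}
```

In this model the custom partitioning still reports a partition count, but no
grouping keys survive, so a planner would have nothing to base shuffle
avoidance on. That is the likely source of the performance impact noted above.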