wypoon opened a new pull request #2512: URL: https://github.com/apache/iceberg/pull/2512
Code changes that allow spark3 and spark3-extensions to be built against both Spark 3.0 and Spark 3.1:

- A method from `org.apache.spark.sql.catalyst.util.DateTimeUtils` that has changed its name is copied to `org.apache.iceberg.util.DateTimeUtil`, and the copied method is used instead.
- The trait `org.apache.spark.sql.catalyst.plans.logical.V2WriteCommand` has 3 additional methods that need to be implemented. They are implemented in `ReplaceData`, but without `override`.
- The trait `org.apache.spark.sql.catalyst.parser.ParserInterface` no longer has the `parseRawDataType` method, so `IcebergSparkSqlExtensionsParser` implements it without `override`, and the delegation checks whether the method is defined on the delegate (a rough sketch is included at the end of this description).
- The main constructor of `org.apache.spark.sql.catalyst.expressions.SortOrder` has changed its signature. Reflection is used to determine how to create instances of `SortOrder` (sketched at the end of this description).
- The constructor of `org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation` has changed its signature. Reflection is used to determine how to create instances of `DataSourceV2ScanRelation`.
- `org.apache.spark.sql.catalyst.SQLConfHelper` was introduced in Spark 3.1, and a number of classes and traits now extend it, including `org.apache.spark.sql.catalyst.analysis.CastSupport`, which has it as a self type. This is the trickiest part. I move the `CastSupport` mixin from `AssignmentAlignmentSupport` to the rule `AlignRowLevelOperations`, since `org.apache.spark.sql.catalyst.rules.Rule` implements `SQLConfHelper`. I define a `conf` method in the traits `AssignmentAlignmentSupport` and `RewriteRowLevelOperationHelper` so that it can be overridden in the classes that extend them, which also extend `Rule[LogicalPlan]` (and thus `SQLConfHelper` in Spark 3.1). When compiling with Spark 3.0, the `conf` in the Iceberg traits is overridden; when compiling with Spark 3.1, the `conf` in `SQLConfHelper` is overridden (sketched at the end of this description).

I have not changed the Spark 3 version here. I am open to suggestions on how we want to do this. I am unfamiliar with Gradle; with Maven, I would define profiles so that the Spark 3 support can be built with either Spark 3.0 or Spark 3.1.

I have tested this change locally by building against both Spark 3.0 and Spark 3.1 and running the unit tests in the spark3 and spark3-extensions modules in both cases. I have also tried using the Spark 3 runtime jar built against Spark 3.0 with a Spark 3.1 cluster, but only ran a couple of Spark 3 procedures, so the testing is far from comprehensive. I am not sure if we need a Spark 3.1 runtime jar for Spark 3.1 clusters for it to be safe.
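To make the `parseRawDataType` item above concrete, here is a minimal sketch of the kind of delegation check described; the class name, error handling, and exact lookup are simplified illustrations, not the code in this PR:

```scala
import org.apache.spark.sql.catalyst.parser.ParserInterface
import org.apache.spark.sql.types.DataType

class ExtensionsParserSketch(delegate: ParserInterface) {
  // Declared without `override` because ParserInterface no longer has this
  // method in Spark 3.1; the delegate is inspected at runtime, so the call
  // only succeeds when the underlying parser (Spark 3.0) still defines it.
  def parseRawDataType(sqlText: String): DataType = {
    val method = delegate.getClass.getMethods
      .find(m => m.getName == "parseRawDataType" && m.getParameterCount == 1)
      .getOrElse(throw new UnsupportedOperationException(
        "parseRawDataType is not defined by the delegate parser"))
    method.invoke(delegate, sqlText).asInstanceOf[DataType]
  }
}
```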
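Similarly, a rough illustration of the reflection-based construction for `SortOrder`; the assumption that only the type of the last (same-order-expressions) parameter changed between 3.0 and 3.1, and the choice of default direction and null ordering, are illustrative only. `DataSourceV2ScanRelation` is handled with the same general approach:

```scala
import org.apache.spark.sql.catalyst.expressions.{Ascending, Expression, NullsFirst, SortOrder}

object SortOrderUtilSketch {
  // Pick the primary SortOrder constructor once; its last parameter type is
  // assumed to be the part that differs between Spark 3.0 and Spark 3.1.
  private lazy val ctor = classOf[SortOrder].getConstructors
    .maxBy(_.getParameterCount)

  def ascendingNullsFirst(child: Expression): SortOrder = {
    val sameOrderExprs: AnyRef =
      if (classOf[Seq[_]].isAssignableFrom(ctor.getParameterTypes.last)) {
        Seq.empty[Expression] // Spark 3.1-style parameter
      } else {
        Set.empty[Expression] // Spark 3.0-style parameter
      }
    ctor.newInstance(child, Ascending, NullsFirst, sameOrderExprs)
      .asInstanceOf[SortOrder]
  }
}
```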
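Finally, a stripped-down sketch of the `conf` arrangement for the `SQLConfHelper` item; the trait and rule names here are placeholders rather than the actual Iceberg classes:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.internal.SQLConf

trait RowLevelHelperSketch {
  // Declared here so that concrete rules provide it. Against Spark 3.0 the
  // definition in the rule overrides only this declaration; against Spark 3.1
  // the same definition also overrides SQLConfHelper.conf inherited via Rule.
  def conf: SQLConf
}

object AlignRowLevelOperationsSketch extends Rule[LogicalPlan] with RowLevelHelperSketch {
  override def conf: SQLConf = SQLConf.get

  override def apply(plan: LogicalPlan): LogicalPlan = plan
}
```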
