wombatu-kun commented on PR #12772: URL: https://github.com/apache/hudi/pull/12772#issuecomment-2947856383
> And all these complexities arise with only the Spark **4.0.0-preview1** version; with the released Spark **4.0.0** the situation becomes even worse, because there are many breaking changes: frequently used classes were moved to a different package (e.g. `SparkSession`, `SQLContext`, and `Dataset`, all used in Hudi, now live in the `org.apache.spark.sql.classic` package), new arguments were added to some constructors and `unapply` methods (e.g. `LogicalRDD`, `LogicalRelation`), etc. These changed classes are the basic APIs for integrating with Spark and are used heavily even in `hudi-spark-client` (the fundamental common module shared by all Spark versions).
>
> A bit more detail about the changes we have to make in Hudi when switching the Spark dependency from 4.0.0-preview1 to 4.0.0: for `hudi-spark-client` to compile against Spark 4.0.0 we need to change ~30 files in this module (mostly fixing imports of the `SparkSession`, `SQLContext`, `Dataset`, and `DataFrame` classes from `org.apache.spark.sql` to `org.apache.spark.sql.classic`). So, if we want these classes to compile with both Spark 3.x and Spark 4 (and don't want to make `hudi-spark4.0.x` a separate, self-contained module), we have to move them (unchanged) to `hudi-spark3-common`, copy them (with changed imports) to `hudi-spark4.0.x`, and add ~30 methods to `SparkAdapter` to work with these classes depending on the Spark version.
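For readers unfamiliar with the pattern, the `SparkAdapter` approach described above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not Hudi's actual code: the class names other than `SparkAdapter` and the `sessionClassName` method are made up for demonstration.

```scala
// Hypothetical sketch of the version-shim pattern the comment describes.
// Shared code (e.g. hudi-spark-client) depends only on this trait, never on
// a version-specific class, so it compiles unchanged against Spark 3.x and 4.x.
trait SparkAdapter {
  // Stand-in for one of the ~30 version-bridging methods: each would wrap
  // an API whose package or signature differs between Spark versions.
  def sessionClassName: String
}

// hudi-spark3-common would ship something like:
class Spark3Adapter extends SparkAdapter {
  // In Spark 3.x, SparkSession lives in org.apache.spark.sql.
  def sessionClassName: String = "org.apache.spark.sql.SparkSession"
}

// hudi-spark4.0.x would ship the Spark 4 variant:
class Spark4Adapter extends SparkAdapter {
  // In Spark 4.0.0, the concrete SparkSession moved to org.apache.spark.sql.classic.
  def sessionClassName: String = "org.apache.spark.sql.classic.SparkSession"
}
```

At runtime, Hudi would load the adapter matching the Spark version on the classpath, so shared modules never import a moved class directly; the cost, as noted above, is one trait method (plus two implementations) per changed API.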
