wombatu-kun commented on PR #12772:
URL: https://github.com/apache/hudi/pull/12772#issuecomment-2947856383

   > And all these complexities arise with only the Spark **4.0.0-preview1** version; with the released Spark **4.0.0** the situation becomes even worse, because there are many breaking changes: many frequently used classes were moved to a different package (e.g. `SparkSession`, `SQLContext`, and `Dataset`, which are used in Hudi, are now located in the `org.apache.spark.sql.classic` package), new arguments were added to some constructors and `unapply` methods (e.g. `LogicalRDD`, `LogicalRelation`), etc. These changed classes are the basic APIs for integrating with Spark and are used heavily even in `hudi-spark-client` (the fundamental common module shared by all Spark versions).
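   One way a cross-version module can cope with a relocated class without a hard compile-time dependency is to probe the classpath for the new package first and fall back to the old one. A minimal sketch of that idea, assuming the package names quoted above (the class `SparkPackageProbe` is hypothetical, not Hudi code; with no Spark on the classpath it returns `null`):

   ```java
   public class SparkPackageProbe {
       /** Returns the fully-qualified SparkSession class name found on the classpath, or null. */
       static String resolveSessionClass() {
           // Per the comment above, Spark 4.0.0 moved the concrete session class
           // to org.apache.spark.sql.classic; probe for it first, then fall back.
           String[] candidates = {
               "org.apache.spark.sql.classic.SparkSession", // Spark 4.0.0
               "org.apache.spark.sql.SparkSession"          // Spark 3.x / 4.0.0-preview1
           };
           for (String name : candidates) {
               try {
                   // initialize=false: just check presence, don't run static initializers
                   Class.forName(name, false, SparkPackageProbe.class.getClassLoader());
                   return name;
               } catch (ClassNotFoundException ignored) {
                   // try the next candidate
               }
           }
           return null; // no Spark on the classpath at all
       }
   }
   ```

   This avoids importing either package directly, which is why reflection-based probes like this often show up in version-bridging code; a static `SparkAdapter`-style indirection (as discussed below in this thread) is the compile-time alternative.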
   
   A bit more detail about the changes we have to make in Hudi when switching the Spark dependencies from 4.0.0-preview1 to 4.0.0: for `hudi-spark-client` to compile against Spark 4.0.0 we need to change ~30 files in this module (mostly fixing imports of the `SparkSession`, `SQLContext`, `Dataset`, and `DataFrame` classes from `org.apache.spark.sql` to `org.apache.spark.sql.classic`).
   So, if we want these classes to compile with both Spark 3.x and Spark 4 (and don't want to make `hudi-spark4.0.x` a separate, self-contained module), we have to move them (unchanged) to `hudi-spark3-common`, copy them (with changed imports) to `hudi-spark4.0.x`, and add ~30 methods to `SparkAdapter` to work with these classes depending on the Spark version.
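   The `SparkAdapter` indirection described above boils down to a per-version implementation behind a common interface, selected once at startup. A minimal sketch of that shape (all names here are illustrative, not Hudi's actual API; the package names come from the comment above):

   ```java
   /** Common interface; a real adapter would carry ~30 such methods. */
   interface SessionAdapter {
       String sessionClassName();
   }

   /** Implementation compiled against Spark 3.x, living in a spark3-common module. */
   class Spark3Adapter implements SessionAdapter {
       public String sessionClassName() { return "org.apache.spark.sql.SparkSession"; }
   }

   /** Implementation compiled against Spark 4.0.0, living in a spark4 module. */
   class Spark4Adapter implements SessionAdapter {
       public String sessionClassName() { return "org.apache.spark.sql.classic.SparkSession"; }
   }

   public class AdapterDemo {
       /** Picks the adapter for the Spark version detected at runtime. */
       static SessionAdapter forVersion(String sparkVersion) {
           return sparkVersion.startsWith("4.") ? new Spark4Adapter() : new Spark3Adapter();
       }
   }
   ```

   The cost the comment points out is exactly this boilerplate: every call site in the common module that touched a relocated class must go through an adapter method instead, multiplied across every affected class.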


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
