yihua commented on code in PR #8885:
URL: https://github.com/apache/hudi/pull/8885#discussion_r1223651454
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestIndexSyntax.scala:
##########
@@ -56,30 +58,37 @@ class TestIndexSyntax extends HoodieSparkSqlTestBase {
var logicalPlan = sqlParser.parsePlan(s"show indexes from default.$tableName")
var resolvedLogicalPlan = analyzer.execute(logicalPlan)
-    assertResult(s"`default`.`$tableName`")(resolvedLogicalPlan.asInstanceOf[ShowIndexesCommand].table.identifier.quotedString)
Review Comment:
FR: `table.identifier.quotedString` now also includes the catalog name as a prefix.
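A hedged illustration of the change (not Hudi code; the constructor arguments and the rendered string assume Spark 3.4's `TableIdentifier`, which gained an optional catalog field, and `spark_catalog` as the default session catalog name):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumption: in Spark 3.4, TableIdentifier carries an optional catalog,
// so quotedString renders three parts instead of two.
val id = TableIdentifier("my_table", Some("default"), Some("spark_catalog"))

// Per this review comment, quotedString now carries the catalog prefix,
// e.g. `spark_catalog`.`default`.`my_table` rather than `default`.`my_table`,
// which is why the test assertion needs updating.
println(id.quotedString)
```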
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##########
@@ -733,8 +734,8 @@ object HoodieBaseRelation extends SparkAdapterSupport {
partitionedFile => {
val hadoopConf = hadoopConfBroadcast.value.get()
- val reader = new HoodieAvroHFileReader(hadoopConf, new Path(partitionedFile.filePath),
- new CacheConfig(hadoopConf))
+ val filePath = sparkAdapter.getSparkPartitionedFileUtils.getPathFromPartitionedFile(partitionedFile)
Review Comment:
For Reviewer (FR): all the changes in the common module that introduce new adapter support are due to Spark 3.4 class and API changes.
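A minimal sketch of the adapter pattern at play. Only `getPathFromPartitionedFile` comes from the diff above; the trait and class names here are illustrative, and the Spark 3.4 variant assumes the `PartitionedFile.filePath` field changed from a plain `String` to a `SparkPath` wrapper:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.execution.datasources.PartitionedFile

// Version-specific API differences are hidden behind one adapter method,
// so common-module code can stay Spark-version agnostic.
trait SparkPartitionedFileUtils {
  def getPathFromPartitionedFile(partitionedFile: PartitionedFile): Path
}

// Spark 3.3-and-earlier flavor: filePath is a plain String.
class Spark33PartitionedFileUtils extends SparkPartitionedFileUtils {
  override def getPathFromPartitionedFile(pf: PartitionedFile): Path =
    new Path(pf.filePath)
}

// Spark 3.4 flavor (assumption based on the API change this comment
// references): filePath is a SparkPath, converted via its own accessor.
// class Spark34PartitionedFileUtils extends SparkPartitionedFileUtils {
//   override def getPathFromPartitionedFile(pf: PartitionedFile): Path =
//     pf.filePath.toPath
// }
```

Callers such as `HoodieBaseRelation` then go through `sparkAdapter.getSparkPartitionedFileUtils` instead of touching `partitionedFile.filePath` directly.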
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieParquetFileFormat.scala:
##########
@@ -34,6 +34,15 @@ class HoodieParquetFileFormat extends ParquetFileFormat with SparkAdapterSupport
override def toString: String = "Hoodie-Parquet"
+ override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = {
Review Comment:
FR: Spark 3.4 now supports the vectorized reader on nested fields. However, Hudi does not support this yet because of its custom schema evolution logic, so we override `supportBatch` in `HoodieParquetFileFormat` for Spark 3.4.
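A hedged sketch of what such an override could look like. The exact condition Hudi uses may differ; this only illustrates opting out of batch (vectorized) reads when the schema contains nested types, falling back to the parent check otherwise:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
import org.apache.spark.sql.types.{ArrayType, MapType, StructType}

// Illustrative class name; not Hudi's actual implementation.
class ExampleParquetFileFormat extends ParquetFileFormat {

  override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = {
    // Spark 3.4 would otherwise vectorize nested fields, which Hudi's
    // custom schema evolution logic cannot yet handle.
    val hasNestedFields = schema.fields.exists { field =>
      field.dataType match {
        case _: StructType | _: ArrayType | _: MapType => true
        case _ => false
      }
    }
    // Disable batch reads for nested schemas; defer to Spark otherwise.
    !hasNestedFields && super.supportBatch(sparkSession, schema)
  }
}
```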
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]