Re: [PR] [HUDI-7190] Fix nested columns vectorized read for spark33+ legacy formats [hudi]

via GitHub Wed, 06 Dec 2023 23:51:58 -0800


stream2000 commented on code in PR #10265:
URL: https://github.com/apache/hudi/pull/10265#discussion_r1418516658



##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark33LegacyHoodieParquetFileFormat.scala:
##########
@@ -120,9 +120,7 @@ class Spark33LegacyHoodieParquetFileFormat(private val shouldAppendPartitionValu
     val resultSchema = StructType(partitionSchema.fields ++ requiredSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
     val enableOffHeapColumnVector = sqlConf.offHeapColumnVectorEnabled
-    val enableVectorizedReader: Boolean =

Review Comment:
   For reviewers: in Spark 3.3+, the following code is used to check whether 
we can do a vectorized read: 
   
   ```scala
     override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = {
       val conf = sparkSession.sessionState.conf
       ParquetUtils.isBatchReadSupportedForSchema(conf, schema) && conf.wholeStageEnabled &&
         !WholeStageCodegenExec.isTooManyFields(conf, schema)
     }
   ```
   
   So nested types have supported vectorized reads since Spark 3.3.
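
   To illustrate how a schema-based check of this shape can admit nested 
columns, here is a minimal, self-contained sketch. It does not use Spark: 
`Conf`, the `DataType` stand-ins, and the `nestedColumnEnabled` flag are 
hypothetical names for illustration only. The assumptions that nested types are 
gated by a separate flag and that the field limit counts leaf fields are mine, 
not taken from this PR.

   ```scala
   object SupportBatchSketch {
     // Simplified stand-ins for Spark's data types (hypothetical, not the real API).
     sealed trait DataType
     case object IntType extends DataType
     case object StringType extends DataType
     final case class ArrayType(elementType: DataType) extends DataType
     final case class StructType(fields: Seq[(String, DataType)]) extends DataType

     // Hypothetical configuration holder; field names are illustrative only.
     final case class Conf(
         vectorizedReaderEnabled: Boolean,
         nestedColumnEnabled: Boolean, // assumed gate for array/struct columns
         wholeStageEnabled: Boolean,
         maxFields: Int)

     // Atomic types are always batch-readable; nested types only when the
     // nested-column flag is on and every child type is batch-readable too.
     def isBatchReadSupported(conf: Conf, dt: DataType): Boolean = dt match {
       case IntType | StringType => true
       case ArrayType(elem) =>
         conf.nestedColumnEnabled && isBatchReadSupported(conf, elem)
       case StructType(fields) =>
         conf.nestedColumnEnabled &&
           fields.forall { case (_, t) => isBatchReadSupported(conf, t) }
     }

     // Counts leaf fields, descending into nested types.
     def numOfNestedFields(dt: DataType): Int = dt match {
       case StructType(fields) => fields.map { case (_, t) => numOfNestedFields(t) }.sum
       case ArrayType(elem)    => numOfNestedFields(elem)
       case _                  => 1
     }

     // Same three-part shape as the check quoted above: schema support,
     // whole-stage codegen enabled, and not too many fields.
     def supportBatch(conf: Conf, schema: StructType): Boolean =
       conf.vectorizedReaderEnabled &&
         schema.fields.forall { case (_, t) => isBatchReadSupported(conf, t) } &&
         conf.wholeStageEnabled &&
         numOfNestedFields(schema) <= conf.maxFields

     def main(args: Array[String]): Unit = {
       val nested = StructType(Seq("id" -> IntType, "tags" -> ArrayType(StringType)))
       val conf = Conf(
         vectorizedReaderEnabled = true,
         nestedColumnEnabled = true,
         wholeStageEnabled = true,
         maxFields = 100)
       println(supportBatch(conf, nested))
       println(supportBatch(conf.copy(nestedColumnEnabled = false), nested))
     }
   }
   ```

   In this sketch, the decision is derived entirely from the schema and the 
configuration, so a caller does not need a separate hard-coded rule that 
rejects nested columns.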



