danny0405 commented on code in PR #13127:
URL: https://github.com/apache/hudi/pull/13127#discussion_r2105670437


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -81,17 +81,31 @@ class 
HoodieFileGroupReaderBasedParquetFileFormat(tablePath: String,
   private val sanitizedTableName = 
AvroSchemaUtils.getAvroRecordQualifiedName(tableName)
 
   /**
-   * Support batch needs to remain consistent, even if one side of a bootstrap 
merge can support
-   * while the other side can't
+   * Support vectorized reading
    */
   private var supportBatchCalled = false
+
+  /**
+   * Support return result as batch
+   */
   private var supportBatchResult = false
 
+  /**
+   * Check if the file format supports vectorized reading, please refer to 
SPARK-40918
+   *
+   * NOTE: for mor read, even for file-slice with only base file, we can read 
parquet file with vectorized, but the return result of the whole 
data-source-scan phase cannot be batch,
+   * because when there are any log file in a file slice, it needs to be read 
by the file group reader.
+   * Since we are currently performing merges based on rows, the result 
returned at this time is also based on rows,
+   * we cannot assume that all file slices have only base files.
+   * So we need to set the batch result back to false
+   *
+   */
   override def supportBatch(sparkSession: SparkSession, schema: StructType): 
Boolean = {
-    if (!supportBatchCalled || supportBatchResult) {
-      supportBatchCalled = true
-      supportBatchResult = !isMOR && !isIncremental && !isBootstrap && 
super.supportBatch(sparkSession, schema)
-    }
+    val superSupportBatch = super.supportBatch(sparkSession, schema)
+    supportBatchCalled = !isIncremental && !isBootstrap && superSupportBatch

Review Comment:
   It's okay we change the variable name, my concern is does the change 
introduce some side-effect to other read paths, like causing some regression or 
overhead. Can you help to confirm that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to