nsivabalan commented on code in PR #5364:
URL: https://github.com/apache/hudi/pull/5364#discussion_r853598977


##########
hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetFileFormat.scala:
##########
@@ -289,6 +324,16 @@ class Spark32HoodieParquetFileFormat extends ParquetFileFormat {
 
 object Spark32HoodieParquetFileFormat {
 
+  def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {

Review Comment:
   Feel free to fix this in a follow-up PR if need be. Maybe we can move this to a util class and use it across adaptors? I see the same exact method in the Spark312HoodieParquetFileFormat class as well.
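
   The suggested refactor could look roughly like the sketch below: hoist the duplicated method into a shared helper object that both the Spark 3.1.2 and Spark 3.2 adaptors call. This is a hypothetical illustration only; `HoodieParquetFileFormatHelper` is an assumed name, the `StructType` stand-in replaces Spark's real `org.apache.spark.sql.types.StructType`, and the pruning body is a placeholder for Hudi's actual internal-schema utilities, which the PR does not show.

```scala
// Hypothetical sketch of the util-class extraction suggested in the review.
// In the real codebase this would live in hudi-spark-common so that both
// Spark312HoodieParquetFileFormat and Spark32HoodieParquetFileFormat share it.
object HoodieParquetFileFormatHelper {

  // Stand-in for Spark's StructType, reduced to field names for illustration.
  final case class StructType(fieldNames: Seq[String])

  // Stand-in body: the real method would deserialize internalSchemaStr and
  // delegate to Hudi's internal-schema pruning utilities. Here we just keep
  // the comma-separated entries that appear in the required schema.
  def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {
    if (internalSchemaStr == null || internalSchemaStr.isEmpty) {
      internalSchemaStr
    } else {
      internalSchemaStr.split(",").filter(requiredSchema.fieldNames.contains).mkString(",")
    }
  }
}
```

   With this in place, each adaptor's `buildReaderWithPartitionValues` would call `HoodieParquetFileFormatHelper.pruneInternalSchema(...)` instead of carrying its own copy, so a future fix lands in one place.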



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -114,16 +114,38 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
    *       rule; you can find more details in HUDI-3896)
    */
   def toHadoopFsRelation: HadoopFsRelation = {
+    // We're delegating to Spark to append partition values to every row only in cases
+    // when these corresponding partition-values are not persisted w/in the data file itself
+    val shouldAppendPartitionColumns = omitPartitionColumnsInFile

Review Comment:
   Minor: instead of "omitPartitionColumnsInFile" (phrased as an action), maybe we can name the variable "isPartitionColumnPersistedInDataFile" (phrased as a state).
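
   One thing worth noting about the suggested rename is that it flips the flag's polarity: a state-style name answers "are the columns already in the file?", so the append decision becomes its negation. A minimal sketch, assuming a hypothetical config value (the object and field names below are illustrative, not from the PR):

```scala
// Hypothetical illustration of the rename's polarity flip.
object PartitionColumnNamingSketch {

  // Stand-in for the table-config-derived value; true would mean the
  // partition columns are physically stored in the data files.
  val isPartitionColumnPersistedInDataFile: Boolean = false

  // Under the state-style name, Spark should append partition values
  // only when they are NOT persisted in the data file, hence the negation.
  val shouldAppendPartitionColumns: Boolean = !isPartitionColumnPersistedInDataFile
}
```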



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to