nsivabalan commented on code in PR #5364:
URL: https://github.com/apache/hudi/pull/5364#discussion_r853598977
##########
hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetFileFormat.scala:
##########
@@ -289,6 +324,16 @@ class Spark32HoodieParquetFileFormat extends ParquetFileFormat {
object Spark32HoodieParquetFileFormat {
+ def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {
Review Comment:
Feel free to fix this in a follow-up PR if need be. Maybe we can move this
to a util class and use it across adaptors? I see the exact same method in
the Spark312HoodieParquetFileFormat class as well.
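A hedged sketch of the refactor being suggested here; the object name `HoodieParquetSchemaUtils` and the simplified method body are illustrative assumptions, not the actual Hudi code:

```scala
import org.apache.spark.sql.types.StructType

// Illustrative only: a shared helper object so that both
// Spark312HoodieParquetFileFormat and Spark32HoodieParquetFileFormat can
// delegate to a single implementation instead of duplicating it.
object HoodieParquetSchemaUtils {

  // Simplified stand-in for the real pruning logic: when there is no
  // serialized internal schema or no required columns, return the input
  // unchanged; otherwise the real implementation would prune the internal
  // schema down to the columns present in requiredSchema.
  def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {
    if (internalSchemaStr == null || internalSchemaStr.isEmpty || requiredSchema.isEmpty) {
      internalSchemaStr
    } else {
      // ... prune internalSchemaStr against requiredSchema.fieldNames here ...
      internalSchemaStr
    }
  }
}
```

Each adapter would then call `HoodieParquetSchemaUtils.pruneInternalSchema(...)` rather than carrying its own copy of the method.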
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -114,16 +114,38 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
* rule; you can find more details in HUDI-3896)
*/
def toHadoopFsRelation: HadoopFsRelation = {
+ // We're delegating to Spark to append partition values to every row only in cases
+ // when these corresponding partition-values are not persisted w/in the data file itself
+ val shouldAppendPartitionColumns = omitPartitionColumnsInFile
Review Comment:
Minor: instead of "omitPartitionColumnsInFile" (present tense), maybe we
can name the variable "isPartitionColumnPersistedInDataFile" (past tense).
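A small illustration of the proposed rename; both variable names and the literal value here are hypothetical, not taken from the PR:

```scala
// Hypothetical: true when partition column values are written into the
// data files themselves, so Spark does not need to re-append them.
val isPartitionColumnPersistedInDataFile: Boolean = true

// The derived flag then reads naturally as the negation of persisted state.
val shouldAppendPartitionColumns: Boolean = !isPartitionColumnPersistedInDataFile
```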
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]