nsivabalan commented on code in PR #5364:
URL: https://github.com/apache/hudi/pull/5364#discussion_r853598977
##########
hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetFileFormat.scala:
##########
@@ -289,6 +324,16 @@ class Spark32HoodieParquetFileFormat extends ParquetFileFormat {
object Spark32HoodieParquetFileFormat {
+ def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {
Review Comment:
Feel free to fix this in a follow-up PR if need be. Maybe we can move this
to a util class and use it across adaptors? I see the exact same method in
the Spark312HoodieParquetFileFormat class as well.
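A hedged sketch of the refactor being suggested here; the object name `HoodieParquetSchemaUtils` and the simplified method body are illustrative assumptions, not the actual Hudi code:

```scala
import org.apache.spark.sql.types.StructType

// Illustrative only: a shared helper object so that both
// Spark312HoodieParquetFileFormat and Spark32HoodieParquetFileFormat can
// delegate to a single implementation instead of duplicating it.
object HoodieParquetSchemaUtils {

  // Simplified stand-in for the real pruning logic: when there is no
  // serialized internal schema or no required columns, return the input
  // unchanged; otherwise the real implementation would prune the internal
  // schema down to the columns present in requiredSchema.
  def pruneInternalSchema(internalSchemaStr: String, requiredSchema: StructType): String = {
    if (internalSchemaStr == null || internalSchemaStr.isEmpty || requiredSchema.isEmpty) {
      internalSchemaStr
    } else {
      // ... prune internalSchemaStr against requiredSchema.fieldNames here ...
      internalSchemaStr
    }
  }
}
```

Each adapter would then call `HoodieParquetSchemaUtils.pruneInternalSchema(...)` rather than carrying its own copy of the method.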
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyRelation.scala:
##########
@@ -114,16 +114,38 @@ class BaseFileOnlyRelation(sqlContext: SQLContext,
* rule; you can find more details in HUDI-3896)
*/
def toHadoopFsRelation: HadoopFsRelation = {
+ // We're delegating to Spark to append partition values to every row only in cases
+ // when these corresponding partition-values are not persisted w/in the data file itself
+ val shouldAppendPartitionColumns = omitPartitionColumnsInFile
Review Comment:
Minor: instead of "omitPartitionColumnsInFile" (present tense), maybe we
can name the variable "isPartitionColumnPersistedInDataFile" (past tense).
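A small illustration of the proposed rename; both variable names and the literal value here are hypothetical, not taken from the PR:

```scala
// Hypothetical: true when partition column values are written into the
// data files themselves, so Spark does not need to re-append them.
val isPartitionColumnPersistedInDataFile: Boolean = true

// The derived flag then reads naturally as the negation of persisted state.
val shouldAppendPartitionColumns: Boolean = !isPartitionColumnPersistedInDataFile
```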
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]