jonvex commented on code in PR #11770:
URL: https://github.com/apache/hudi/pull/11770#discussion_r1744205574
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -231,4 +270,32 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState,
       override def close(): Unit = closeableFileGroupRecordIterator.close()
     }
   }
+
+  private def readBaseFile(file: PartitionedFile, parquetFileReader: SparkParquetReader, requestedSchema: StructType,
+                           remainingPartitionSchema: StructType, fixedPartitionIndexes: Set[Int], requiredSchema: StructType,
+                           partitionSchema: StructType, outputSchema: StructType, filters: Seq[Filter],
+                           storageConf: StorageConfiguration[Configuration]): Iterator[InternalRow] = {
+    if (remainingPartitionSchema.fields.length == partitionSchema.fields.length) {
Review Comment:
I tested with TestSparkSqlWithCustomKeyGenerator by changing
partitionColumnsToRead in HoodieHadoopFsRelationFactory by doing:
```scala
//TODO: [HUDI-8098] filter for timestamp keygen columns when using custom keygen
tableConfig.getPartitionFields.orElse(Array.empty).filter(p => p == "ts").toSeq
```
to fake what [HUDI-8036] + [HUDI-8098] will do. This exposed a case that I
hadn't tested: for MOR with log files, where we read some, but not all, of the
partition columns, the partition values were not being appended correctly. I
have updated
HoodieFileGroupReaderBasedParquetFileFormat.appendPartitionAndProject to
handle this correctly now.
I feel like appendPartitionAndProject and readBaseFile have overlapping
logic, but I can't think of a better way to do this for now.
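To make the "appending and projecting" step concrete, here is a minimal standalone sketch of the idea, not Hudi's actual implementation: rows are plain `Seq[Any]` and schemas are just field-name lists (`appendPartitionAndProjectSketch`, `dataFields`, `partitionFields`, `outputFields` are all hypothetical names for illustration). The point is that the partition values missing from the read row get appended first, and only then is the row projected to the requested output schema:

```scala
// Hypothetical sketch only; Hudi's real code operates on InternalRow/StructType.
object AppendPartitionAndProjectSketch {
  type Row = Seq[Any]

  // Append the partition values the reader did not materialize, then
  // reorder fields to match the requested output schema.
  def appendPartitionAndProject(row: Row,
                                dataFields: Seq[String],
                                partitionFields: Seq[String],
                                partitionValues: Row,
                                outputFields: Seq[String]): Row = {
    val combinedFields = dataFields ++ partitionFields
    val combined = row ++ partitionValues
    val index = combinedFields.zipWithIndex.toMap
    // Projection: look up each output field's position in the combined row.
    outputFields.map(f => combined(index(f)))
  }

  def main(args: Array[String]): Unit = {
    // A row read without its "ts" partition column, projected to uuid, ts, value.
    val projected = appendPartitionAndProject(
      row = Seq("id-1", 42),
      dataFields = Seq("uuid", "value"),
      partitionFields = Seq("ts"),
      partitionValues = Seq("2024-01-01"),
      outputFields = Seq("uuid", "ts", "value"))
    println(projected.mkString(","))
  }
}
```

This also shows why the two paths overlap: both need the same "combine then reorder" bookkeeping, just with different inputs.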
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]