jonvex commented on code in PR #11926:
URL: https://github.com/apache/hudi/pull/11926#discussion_r1768776615
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala:
##########
@@ -94,7 +94,15 @@ abstract class HoodieBaseHadoopFsRelationFactory(val sqlContext: SQLContext,
       tableConfig.getPartitionFields.orElse(Array.empty).toSeq
     } else {
       //it's custom keygen
-      CustomAvroKeyGenerator.getTimestampFields(HoodieTableConfig.getPartitionFieldsForKeyGenerator(tableConfig).orElse(java.util.Collections.emptyList[String]())).asScala.toSeq
+      val timestampFieldsOpt = CustomAvroKeyGenerator.getTimestampFields(
+        HoodieTableConfig.getPartitionFieldsForKeyGenerator(tableConfig).orElse(java.util.Collections.emptyList[String]()))
Review Comment:
OK, so they use a custom keygen. There are three cases:
1. They have some timestamp cols:
```
partition cols are "part1:timestamp,part2:string,part3:timestamp"
```
In this scenario we want
```
partitionColumnsToRead = [part1, part3]
```
2. They have no timestamp cols:
```
partition cols are "part1:string,part2:string,part3:string"
```
In this scenario we want
```
partitionColumnsToRead = []
```
3. We don't know, because it is a 0.x table and the partition cols carry no types:
```
partition cols are "part1,part2,part3"
```
In this scenario we must have
```
partitionColumnsToRead = [part1, part2, part3]
```
because we don't know which ones are timestamp cols.
To simplify that to one sentence: "partitionColumnsToRead needs to be the list
of partition columns that could be timestamp cols."
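A minimal sketch of that rule, assuming the partition field config is a comma-separated string of `name:type` entries (or bare names for a 0.x table) as in the examples above. `PartitionColumnsSketch` and `partitionColumnsToRead` are hypothetical names used only for illustration; they are not Hudi's API, and the actual change goes through `CustomAvroKeyGenerator.getTimestampFields`. The `timestamp` type label follows the notation used in this comment and is also an assumption.

```scala
// Hypothetical helper for illustration only, not part of Hudi.
object PartitionColumnsSketch {
  // Assumed type label, following the "name:type" notation in the examples above.
  private val TimestampLabel = "timestamp"

  /** Returns the partition columns that could be timestamp cols. */
  def partitionColumnsToRead(partitionFieldsConfig: String): Seq[String] = {
    val fields = partitionFieldsConfig.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    fields.flatMap { field =>
      field.split(":", 2) match {
        // No type recorded (0.x table): the column could be a timestamp, so keep it.
        case Array(name) => Some(name)
        // Type recorded and it is a timestamp: keep it.
        case Array(name, tpe) if tpe.trim.equalsIgnoreCase(TimestampLabel) => Some(name)
        // Type recorded and it is not a timestamp: drop it.
        case _ => None
      }
    }
  }
}
```

Handling each field on its own means a config with no type information keeps every column, which matches case 3, while a fully typed config keeps only the timestamp columns, which matches cases 1 and 2.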