jonvex commented on code in PR #11926:
URL: https://github.com/apache/hudi/pull/11926#discussion_r1768776615


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala:
##########
@@ -94,7 +94,15 @@ abstract class HoodieBaseHadoopFsRelationFactory(val sqlContext: SQLContext,
     tableConfig.getPartitionFields.orElse(Array.empty).toSeq
   } else {
     //it's custom keygen
-    CustomAvroKeyGenerator.getTimestampFields(HoodieTableConfig.getPartitionFieldsForKeyGenerator(tableConfig).orElse(java.util.Collections.emptyList[String]())).asScala.toSeq
+    val timestampFieldsOpt = CustomAvroKeyGenerator.getTimestampFields(
+      HoodieTableConfig.getPartitionFieldsForKeyGenerator(tableConfig).orElse(java.util.Collections.emptyList[String]()))

Review Comment:
   OK, so they use a custom keygen. There are three cases:
   
   1. They have some timestamp cols:
   ```
   partition cols are "part1:timestamp,part2:string,part3:timestamp"
   ```
   In this scenario we want
   ```
   partitionColumnsToRead = [part1, part3]
   ```
   
   2. They have no timestamp cols:
   ```
   partition cols are "part1:string,part2:string,part3:string"
   ```
   In this scenario we want
   ```
   partitionColumnsToRead = []
   ```
   
   3. We don't know, because it is a 0.x table:
   ```
   partition cols are "part1,part2,part3"
   ```
   We must have
   ```
   partitionColumnsToRead = [part1, part2, part3]
   ```
   because we don't know which ones are timestamp cols.
   
   To simplify that to one sentence: "partitionColumnsToRead needs to be the list of partition columns that could be timestamp cols."
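   
   The three cases above can be sketched as a small standalone function. This is only an illustration, not the code in this PR: `PartitionColumnsToReadSketch` and its `partitionColumnsToRead` helper are hypothetical names, and the sketch assumes the partition field spec is a comma-separated string in which a field is either `name:type` or a bare `name`. A bare name is treated as "could be a timestamp", which is the 0.x case.
   
   ```scala
   object PartitionColumnsToReadSketch {
     // Hypothetical helper, not part of Hudi's API.
     // Returns the partition columns that could be timestamp columns.
     def partitionColumnsToRead(partitionFieldsSpec: String): Seq[String] =
       partitionFieldsSpec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).flatMap { field =>
         field.split(":", 2) match {
           // Typed field: keep it only when the declared type is timestamp.
           case Array(name, tpe) =>
             if (tpe.trim.equalsIgnoreCase("timestamp")) Some(name.trim) else None
           // Untyped field (0.x table): the type is unknown, so keep it.
           case Array(name) => Some(name)
           case _ => None
         }
       }
   }
   ```
   
   Under these assumptions, a spec with no type information returns every column, and a fully typed spec with no timestamp entries returns an empty list.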
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.