Re: [PR] [HUDI-7709] ClassCastException while reading the data using `TimestampBasedKeyGenerator` [hudi]

via GitHub Wed, 13 Nov 2024 01:08:31 -0800


geserdugarov commented on code in PR #11501:
URL: https://github.com/apache/hudi/pull/11501#discussion_r1839787010



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -360,13 +363,22 @@ protected HoodieTimeline getActiveTimeline() {
   }
 
   private Object[] parsePartitionColumnValues(String[] partitionColumns, String partitionPath) {
-    Object[] partitionColumnValues = doParsePartitionColumnValues(partitionColumns, partitionPath);
-    if (shouldListLazily && partitionColumnValues.length != partitionColumns.length) {
-      throw new HoodieException("Failed to parse partition column values from the partition-path:"
-          + " likely non-encoded slashes being used in partition column's values. You can try to"
-          + " work this around by switching listing mode to eager");
+    HoodieTableConfig tableConfig = metaClient.getTableConfig();
+    Object[] partitionColumnValues;
+    if (null != tableConfig.getKeyGeneratorClassName()
+        && tableConfig.getKeyGeneratorClassName().equals(KeyGeneratorType.TIMESTAMP.getClassName())
+        && tableConfig.propsMap().get(TimestampKeyGeneratorConfig.TIMESTAMP_TYPE_FIELD.key()).matches("SCALAR|UNIX_TIMESTAMP|EPOCHMILLISECONDS")) {
+      // For TIMESTAMP key generator when TYPE is SCALAR, UNIX_TIMESTAMP or EPOCHMILLISECONDS,
+      // we couldn't reconstruct initial partition column values from partition paths due to lost data after formatting in most cases
+      partitionColumnValues = new Object[partitionColumns.length];

Review Comment:
   @danny0405, as I remember, the problem was that there was no way to 
disable partition pruning when it is not suitable.
   I rechecked it on current master, 41816e30041b82da7505c8c18288cf4f5df4e00a, 
using PySpark:
   
https://github.com/geserdugarov/test-hudi-issues/blob/cad404d3a2ec1c48d7f5f370bcc5a8c04060b5bf/HUDI-7952/forced-partition-pruning-workaround.py#L43-L50
   
   As you can see in this test, the query
   ```SQL
   SELECT id, name, precomb, ts FROM ts_partition_pruning;
   ```
   returns
   ```Text
   Row(id=2, name='a3', precomb=1, ts='1718952603')
   Row(id=1, name='a1', precomb=1, ts='1078016523')
   ```
   
   But a query over the same data with the widest possible range filter (from 
1 to Long.MAX_VALUE), which is equivalent to selecting all data:
   ```SQL
   SELECT * FROM ts_partition_pruning WHERE ts BETWEEN 1 AND 9223372036854775807;
   ```
   returns an empty result.
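
   The code comment in the diff above notes that the original partition column
   values cannot be reconstructed from partition paths after formatting. Below
   is a minimal Python sketch of that loss, using the two `ts` values from the
   output above. The `%Y-%m-%d` output format, the UTC timezone and both helper
   names are assumptions made for illustration; they are not taken from the
   test script or from Hudi.

```python
from datetime import datetime, timezone

# Assumed output format for the partition path (hypothetical, for illustration).
OUTPUT_FORMAT = "%Y-%m-%d"


def to_partition_path(epoch_seconds: int) -> str:
    """Format an epoch-seconds value as a date-based partition path."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(OUTPUT_FORMAT)


def from_partition_path(path: str) -> int:
    """Best-effort inverse: only the start of the day can be recovered."""
    day = datetime.strptime(path, OUTPUT_FORMAT).replace(tzinfo=timezone.utc)
    return int(day.timestamp())


# ts values taken from the query output above.
for ts in (1718952603, 1078016523):
    path = to_partition_path(ts)
    recovered = from_partition_path(path)
    # The time-of-day part is gone, so the round trip does not return ts.
    print(ts, path, recovered, recovered == ts)

# A narrow range that contains the original ts but not the recovered value.
low, high = 1718952000, 1718953000
original = 1718952603
recovered = from_partition_path(to_partition_path(original))
print(low <= original <= high)   # predicate on the real column value
print(low <= recovered <= high)  # same predicate on the value parsed from the path
```

   The value recovered from the path is the start of the day, not the original
   `ts`, so a predicate on `ts` cannot be evaluated reliably against values
   parsed from partition paths. This sketch only illustrates the data loss; it
   does not reproduce the empty result itself.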
   


