[GitHub] [hudi] yihua commented on issue #5211: [SUPPORT] Glob pattern to pick specific subfolders not working while reading in Spark

GitBox Wed, 29 Jun 2022 01:40:00 -0700


yihua commented on issue #5211:
URL: https://github.com/apache/hudi/issues/5211#issuecomment-1169697222


   > Why would support for wildcards be dropped from 0.9.0 to 0.10? We partition 
data by different providers and need the ability to quickly fetch subsets of 
the data based on query patterns.
   
   The earlier wildcard pattern was not meant for selecting a subset of 
partitions in a Hudi table.  Instead of globbing selected paths, you can 
load the Hudi table from its base path and apply `.filter` on the partition 
values, so that Spark prunes partitions instead of scanning the whole table:
   ```
   import org.apache.spark.sql.functions.col

   spark.read.format("org.apache.hudi").load("s3://<hudi_table_base_path>").filter(col("cluster") === "abc")
   ```
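
   For the case raised in the question, where several providers are needed at 
once, the same approach can be extended with `isin` in place of a glob over 
multiple subfolders. The following is only a sketch: the helper names 
`clusterIn` and `loadClusters` and the second partition value `"def"` are 
hypothetical, and it assumes `cluster` is the partition column as in the 
snippet above.

   ```scala
   import org.apache.spark.sql.{Column, DataFrame, SparkSession}
   import org.apache.spark.sql.functions.col

   // Hypothetical helper: predicate that matches any of the given partition values.
   // "cluster" is assumed to be the partition column, as in the snippet above.
   def clusterIn(values: Seq[String]): Column =
     col("cluster").isin(values: _*)

   // Hypothetical helper: load the table from its base path, then keep only
   // the selected partitions by filtering on the partition column.
   def loadClusters(spark: SparkSession, basePath: String, values: Seq[String]): DataFrame =
     spark.read.format("org.apache.hudi").load(basePath).filter(clusterIn(values))
   ```

   A call such as `loadClusters(spark, "s3://<hudi_table_base_path>", Seq("abc", "def"))` 
would then stand in for a glob over the `abc` and `def` subfolders.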


