greedAuguria opened a new issue, #19650:
URL: https://github.com/apache/datafusion/issues/19650

   ### Describe the bug
   
   When using Hive-style partitioned tables where partition values contain 
URL-encoded characters (like `/` encoded as `%2F` or spaces as `%20`), 
DataFusion returns the literal encoded string instead of the decoded value.
   
   For example, given a file at:
   `s3://bucket/table/category=foo%2Fbar/file.parquet`
   
   The partition column `category` returns the literal value `foo%2Fbar` 
instead of the expected decoded value `foo/bar`.
   
   ### Related Issues
   This is a follow-up to #7877, which was partially addressed by #8012. 
   While #8012 fixed URL decoding for the **Table URL** 
(`ListingTableUrl::parse()`), it did not apply decoding to the **extracted 
partition values** from the actual file paths within 
`parse_partitions_for_path()`.
   
   
   ### To Reproduce
   
   ```rust
   use datafusion::datasource::listing::helpers::parse_partitions_for_path;
   use datafusion::datasource::listing::ListingTableUrl;
   use object_store::path::Path;
   
   #[test]
   fn test_reproduce_partition_decoding_issue() {
       let table_url = ListingTableUrl::parse("s3://bucket/table").unwrap();
       // Path contains URL encoded slash %2F
       let file_path = Path::from("bucket/table/category=foo%2Fbar/file.parquet");
   
       let partitions = parse_partitions_for_path(&table_url, &file_path, vec!["category"]);
   
       // Current behavior: Some(["foo%2Fbar"])
       // Expected behavior: Some(["foo/bar"])
       assert_eq!(partitions, Some(vec!["foo/bar".to_string()]));
   }
   ```
   
   
   ### Expected behavior
   
   Partition values should be URL-decoded, consistent with how 
`ListingTableUrl` handles URL-encoded paths. This matches the behavior of 
Apache Spark and Apache Hive.
   
   
   ### Additional context
   
   The fix involves updating `parse_partitions_for_path` in 
`datafusion/catalog-listing/src/helpers.rs` to percent-decode each extracted 
partition value using the `percent-encoding` crate. 
   
   Because decoding creates a new string, the function's return type needs to 
change from `Option<Vec<&str>>` to `Option<Vec<String>>`. 
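
A minimal, dependency-free sketch of the idea, to make the proposed change concrete. It is not the actual DataFusion code: `percent_decode` is a hand-rolled stand-in for what the `percent-encoding` crate would provide, and `parse_partitions` is a hypothetical, simplified analogue of `parse_partitions_for_path` that works on plain string slices rather than `ListingTableUrl` and `Path`.

```rust
/// Convert one ASCII hex digit to its numeric value.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode `%XX` escapes in a path segment. Malformed escapes are kept as-is.
/// Assumption: stand-in for the `percent-encoding` crate named in the issue.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Decoded bytes may not be valid UTF-8; replace invalid sequences.
    String::from_utf8_lossy(&out).into_owned()
}

/// Hypothetical simplified analogue of `parse_partitions_for_path`.
/// Returns owned `String`s because decoding may produce new data.
fn parse_partitions(
    table_prefix: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<Vec<String>> {
    let rest = file_path.strip_prefix(table_prefix)?.trim_start_matches('/');
    let mut segments = rest.split('/');
    let mut values = Vec::with_capacity(partition_cols.len());
    for col in partition_cols {
        // Each partition directory must look like `name=value`.
        let (name, value) = segments.next()?.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(percent_decode(value));
    }
    Some(values)
}

fn main() {
    let parts = parse_partitions(
        "bucket/table",
        "bucket/table/category=foo%2Fbar/file.parquet",
        &["category"],
    );
    assert_eq!(parts, Some(vec!["foo/bar".to_string()]));
    println!("{:?}", parts);
}
```

The path is split on `/` before any decoding happens, so an encoded `%2F` inside a value cannot be mistaken for a directory separator.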
   
   This affects users storing data in Hive-partitioned layouts on object stores 
(S3/GCS/Azure), where percent-encoded special characters in paths are common. 
   
   Common examples:
   - `category=Electronics%2FComputers` → `Electronics/Computers`
   - `city=San%20Francisco` → `San Francisco`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

