martin-g commented on code in PR #19651:
URL: https://github.com/apache/datafusion/pull/19651#discussion_r2664717641


##########
datafusion/catalog-listing/src/helpers.rs:
##########
@@ -417,12 +418,15 @@ pub async fn pruned_partition_list<'a>(
 }
 
 /// Extract the partition values for the given `file_path` (in the given 
`table_path`)
-/// associated to the partitions defined by `table_partition_cols`
+/// associated to the partitions defined by `table_partition_cols`.
+///
+/// Partition values are URL-decoded, since object stores like S3 encode 
special
+/// characters (e.g., `/` becomes `%2F`) in path segments.
 pub fn parse_partitions_for_path<'a, I>(
     table_path: &ListingTableUrl,
     file_path: &'a Path,
     table_partition_cols: I,
-) -> Option<Vec<&'a str>>
+) -> Option<Vec<String>>

Review Comment:
   nit: Not sure whether it is worth it to return a `Vec<Cow<'a, str>>` here ?! 
E.g. if the partition contains `%` then decode it and return 
`Cow::Owned(decoded.into_owned())`, otherwise `Cow::Borrowed(val).



##########
datafusion/catalog-listing/src/helpers.rs:
##########
@@ -417,12 +418,15 @@ pub async fn pruned_partition_list<'a>(
 }
 
 /// Extract the partition values for the given `file_path` (in the given 
`table_path`)
-/// associated to the partitions defined by `table_partition_cols`
+/// associated to the partitions defined by `table_partition_cols`.
+///
+/// Partition values are URL-decoded, since object stores like S3 encode 
special
+/// characters (e.g., `/` becomes `%2F`) in path segments.
 pub fn parse_partitions_for_path<'a, I>(
     table_path: &ListingTableUrl,
     file_path: &'a Path,
     table_partition_cols: I,
-) -> Option<Vec<&'a str>>
+) -> Option<Vec<String>>

Review Comment:
   Actually this is what 
https://docs.rs/percent-encoding/latest/percent_encoding/struct.PercentDecode.html#method.decode_utf8
 returns. It might be a good idea.



##########
datafusion/catalog-listing/src/helpers.rs:
##########
@@ -431,7 +435,10 @@ where
     let mut part_values = vec![];
     for (part, expected_partition) in subpath.zip(table_partition_cols) {
         match part.split_once('=') {
-            Some((name, val)) if name == expected_partition => 
part_values.push(val),
+            Some((name, val)) if name == expected_partition => {
+                let decoded = percent_decode_str(val).decode_utf8().ok()?;

Review Comment:
   Should invalid values (like `%FF` in one of the tests below) return None or 
`val` ?!
   I think it should behave like `%XX` - i.e. return `val`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to