greedAuguria commented on PR #19651:
URL: https://github.com/apache/datafusion/pull/19651#issuecomment-4186765500

   @martin-g hey, sorry again for the late reply! I've rebased on main and 
addressed both of your comments:
   
   **Cow<'a, str> instead of String** — you were right, 
`percent_decode_str().decode_utf8()` already hands back a `Cow<str>`, so it's a 
natural fit. Now we only allocate when the value actually had encoded 
characters in it. Most partition values won't, so this should be a nice win for 
the common case. Updated the caller to use `into_owned()` instead of 
`to_string()` too so we don't double-allocate on the owned path.
   
   **Fallback for invalid encodings** — good catch on the `%FF` case. 
Previously `.ok()?` would bail out and return `None` for the whole file, which 
silently drops it from query results — definitely not what we want. Now if 
percent-decoding produces invalid UTF-8, we just pass through the raw value 
as-is (via `unwrap_or(Cow::Borrowed(val))`). So `%FF` comes back as the literal 
string `%FF` instead of the file disappearing. Same idea as your suggestion to 
treat it like `%XX`.
   
   The taplo CI failure was just an alphabetical ordering thing in the root 
`Cargo.toml` — fixed during the rebase.
   
   Let me know if there's anything else!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to