greedAuguria commented on PR #19651: URL: https://github.com/apache/datafusion/pull/19651#issuecomment-4186765500
@martin-g hey, sorry again for the late reply! I've rebased on main and addressed both of your comments: **Cow<'a, str> instead of String** — you were right, `percent_decode_str().decode_utf8()` already hands back a `Cow<str>`, so it's a natural fit. Now we only allocate when the value actually had encoded characters in it. Most partition values won't, so this should be a nice win for the common case. Updated the caller to use `into_owned()` instead of `to_string()` too so we don't double-allocate on the owned path. **Fallback for invalid encodings** — good catch on the `%FF` case. Previously `.ok()?` would bail out and return `None` for the whole file, which silently drops it from query results — definitely not what we want. Now if percent-decoding produces invalid UTF-8, we just pass through the raw value as-is (via `unwrap_or(Cow::Borrowed(val))`). So `%FF` comes back as the literal string `%FF` instead of the file disappearing. Same idea as your suggestion to treat it like `%XX`. The taplo CI failure was just an alphabetical ordering thing in the root `Cargo.toml` — fixed during the rebase. Let me know if there's anything else! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
