Jefffrey commented on code in PR #22905:
URL: https://github.com/apache/datafusion/pull/22905#discussion_r3425458910
##########
datafusion/functions/src/string/common.rs:
##########
@@ -402,11 +405,45 @@ fn case_conversion(
let result = a.as_ref().map(|x| unicode_case(x, lower));
Ok(ColumnarValue::Scalar(ScalarValue::Utf8View(result)))
}
+ ScalarValue::Dictionary(key_type, value) => {
+ let converted = case_conversion(
+ &[ColumnarValue::Scalar((**value).clone())],
+ lower,
+ name,
+ )?;
+ match converted {
+ ColumnarValue::Scalar(value) => Ok(ColumnarValue::Scalar(
+ ScalarValue::Dictionary(key_type.clone(), Box::new(value)),
+ )),
+ ColumnarValue::Array(_) => {
+ unreachable!("scalar case conversion returned an array")
+ }
Review Comment:
We could consider pulling the scalar code into a separate function like so:
```rust
fn case_conversion_scalar(
    scalar: &ScalarValue,
    lower: bool,
    name: &str,
) -> Result<ScalarValue> {
    match scalar {
        ScalarValue::Utf8(a) => {
            let result = a.as_ref().map(|x| unicode_case(x, lower));
            Ok(ScalarValue::Utf8(result))
        }
        ScalarValue::LargeUtf8(a) => {
            let result = a.as_ref().map(|x| unicode_case(x, lower));
            Ok(ScalarValue::LargeUtf8(result))
        }
        ScalarValue::Utf8View(a) => {
            let result = a.as_ref().map(|x| unicode_case(x, lower));
            Ok(ScalarValue::Utf8View(result))
        }
        ScalarValue::Dictionary(key_type, value) => {
            // Recurse into the wrapped value, then rewrap with the same key type.
            let converted = case_conversion_scalar(value.as_ref(), lower, name)?;
            Ok(ScalarValue::Dictionary(
                key_type.clone(),
                Box::new(converted),
            ))
        }
        other => exec_err!("Unsupported data type {other:?} for function {name}"),
    }
}
```
Then the `ColumnarValue::Scalar` arm becomes something like:
```rust
ColumnarValue::Scalar(scalar) => {
    let converted = case_conversion_scalar(scalar, lower, name)?;
    Ok(ColumnarValue::Scalar(converted))
}
```
This lets us avoid the awkward logic of extracting the scalar from a
`ColumnarValue` that we know will always be a scalar.
(We can also do the same for the array path.)
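For the array path, the same recursive shape might look roughly like the sketch below. It is self-contained and uses a simplified stand-in enum (`StringArray`) rather than the real Arrow array types, and it swaps `unicode_case` for the standard library's `to_lowercase`/`to_uppercase`; the helper name `case_conversion_array` is hypothetical. The point is only that the dictionary arm converts the values and keeps the keys untouched:

```rust
// Simplified stand-in for Arrow string arrays; NOT the real arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum StringArray {
    // A plain string column with nullable entries.
    Utf8(Vec<Option<String>>),
    // A dictionary column: each key indexes into `values`.
    Dictionary {
        keys: Vec<Option<usize>>,
        values: Box<StringArray>,
    },
}

// Hypothetical array-path counterpart to `case_conversion_scalar`.
fn case_conversion_array(array: &StringArray, lower: bool) -> StringArray {
    match array {
        StringArray::Utf8(items) => StringArray::Utf8(
            items
                .iter()
                .map(|item| {
                    // Nulls stay null; non-null strings are case-converted.
                    item.as_ref().map(|s| {
                        if lower {
                            s.to_lowercase()
                        } else {
                            s.to_uppercase()
                        }
                    })
                })
                .collect(),
        ),
        StringArray::Dictionary { keys, values } => StringArray::Dictionary {
            // Keys are reused as-is; only the dictionary values are converted.
            keys: keys.clone(),
            values: Box::new(case_conversion_array(values, lower)),
        },
    }
}
```

Because the keys are copied unchanged, the output keeps the dictionary encoding of the input and each distinct value is converted only once.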
##########
datafusion/functions/src/string/lower.rs:
##########
@@ -118,6 +123,52 @@ mod tests {
Ok(())
}
+ fn invoke_lower_scalar(input: ScalarValue) -> Result<ScalarValue> {
Review Comment:
Could we have these tests in SLTs (sqllogictest files) instead? Same for `upper`.
##########
datafusion/functions/src/string/lower.rs:
##########
@@ -57,9 +57,12 @@ impl LowerFunc {
pub fn new() -> Self {
Self {
signature: Signature::coercible(
- vec![Coercion::new_exact(TypeSignatureClass::Native(
- logical_string(),
- ))],
+ vec![
+ Coercion::new_exact(TypeSignatureClass::Native(logical_string()))
+ .with_encoding_preservation(
+ EncodingPreservation::default().with_dictionary(),
Review Comment:
This is a bit awkward; I wonder if we can have a constructor on
`EncodingPreservation` to set this?
I was thinking of something like `EncodingPreservation::preserve_dictionary`,
but that is already the name of the getter 🤔
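One possible shape, as a rough sketch only: the struct below is a stand-in (the real fields of `EncodingPreservation` are not shown in this diff, so the single `bool` field is an assumption), and the constructor name `dictionary` is a hypothetical placeholder chosen just to avoid clashing with the existing getter:

```rust
// Stand-in for the PR's `EncodingPreservation`; the field layout is assumed.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct EncodingPreservation {
    preserve_dictionary: bool,
}

impl EncodingPreservation {
    // Builder-style setter, as used in the diff above.
    pub fn with_dictionary(mut self) -> Self {
        self.preserve_dictionary = true;
        self
    }

    // Existing getter, whose name rules out a same-named constructor.
    pub fn preserve_dictionary(&self) -> bool {
        self.preserve_dictionary
    }

    // Hypothetical shorthand for `Self::default().with_dictionary()`.
    pub fn dictionary() -> Self {
        Self::default().with_dictionary()
    }
}
```

With something like this, the call site would shrink to `.with_encoding_preservation(EncodingPreservation::dictionary())`.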
##########
datafusion/expr/src/type_coercion/functions.rs:
##########
@@ -1831,6 +1869,57 @@ mod tests {
Ok(())
}
+ #[test]
+ fn test_coercible_dictionary_preserves_encoding() -> Result<()> {
Review Comment:
Could we also have a test for a `TypeSignatureClass` that isn't native?
I'm curious to see how it'll work, since #19458 highlighted that the current
behaviour for dictionaries already differs between `TypeSignatureClass::Native`
and non-native classes (e.g. `TypeSignatureClass::Integer`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.