alexanderbianchi commented on PR #22041:
URL: https://github.com/apache/datafusion/pull/22041#issuecomment-4397658813
A bit more context on why this PR serializes the inner value:
There are two unrelated "dictionary" concepts that are easy to conflate here:
```text
Map/dictionary value:
  {"service": "beagle", "dc": "us1"}
  logical object/map type

Arrow dictionary encoding:
  Dictionary(Int32, Utf8)
  physically encoded string column: integer keys + string values
```
This PR is about the second one. The logical value is still just a string.
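To make "integer keys + string values" concrete, here is a minimal sketch of the encoding. `DictColumn` is a hypothetical stand-in written for this comment, not Arrow's actual `DictionaryArray`; it only shows that decoding yields plain strings again.

```rust
/// Simplified stand-in for a dictionary-encoded string column.
/// Hypothetical type for illustration, not Arrow's `DictionaryArray`.
struct DictColumn {
    keys: Vec<i32>,      // one integer key per row
    values: Vec<String>, // each distinct string stored once
}

impl DictColumn {
    /// Build the encoding by giving each distinct string the next free key.
    fn encode(rows: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match values.iter().position(|v| v.as_str() == *row) {
                Some(idx) => idx,
                None => {
                    values.push(row.to_string());
                    values.len() - 1
                }
            };
            keys.push(key as i32);
        }
        DictColumn { keys, values }
    }

    /// Recover the logical column: plain strings, one per row.
    fn decode(&self) -> Vec<&str> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].as_str())
            .collect()
    }
}
```

Encoding `["req.latency", "req.count", "req.latency"]` stores two distinct values and the keys `[0, 1, 0]`; decoding returns the original three strings. The encoding changes the physical layout only.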
The failing path we hit was:
```sql
metric_name = 'req.latency'
```
where the table schema exposes `metric_name` as:
```text
Dictionary(Int32, Utf8)
```
DataFusion's type coercion makes both sides of the equality compatible by
coercing the string literal to the column's type, so the predicate becomes
conceptually:
```text
Column(metric_name: Dictionary(Int32, Utf8))
=
Literal(Dictionary(Int32, Utf8("req.latency")))
```
That scalar is not a map/object value. It is DataFusion representing a
string scalar that has been coerced to match a dictionary-encoded string column.
The Substrait producer then failed with:
```text
Unsupported literal: Dictionary(Int32, Utf8("req.latency"))
```
In Substrait, there is no useful distinction between a string scalar and a
"dictionary-encoded string scalar" here. Dictionary encoding is meaningful for
arrays/columns, not for a single scalar literal. So the intended encoding is
just the logical literal value:
```text
Substrait string literal "req.latency"
```
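The unwrapping idea can be sketched as below. The `Scalar` and `Literal` enums are simplified, hypothetical stand-ins for DataFusion's `ScalarValue` and a Substrait literal, and `to_literal` is an illustrative name; this is not the code in the PR, just the shape of the approach.

```rust
/// Simplified stand-in for DataFusion's scalar type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(String),
    Int64(i64),
    /// Key type name plus the wrapped logical value, mirroring the shape
    /// `Dictionary(Int32, Utf8("req.latency"))`.
    Dictionary(&'static str, Box<Scalar>),
}

/// Simplified stand-in for a Substrait literal (hypothetical).
#[derive(Debug, PartialEq)]
enum Literal {
    String(String),
    I64(i64),
}

/// Convert a scalar to a literal, dropping any dictionary wrapper.
fn to_literal(scalar: &Scalar) -> Literal {
    match scalar {
        Scalar::Utf8(s) => Literal::String(s.clone()),
        Scalar::Int64(v) => Literal::I64(*v),
        // Dictionary encoding carries no meaning for a single value, so
        // recurse into the inner scalar and emit its logical literal.
        Scalar::Dictionary(_key_type, inner) => to_literal(inner),
    }
}
```

With this sketch, a dictionary-wrapped `Utf8("req.latency")` and a plain `Utf8("req.latency")` both produce the same string literal, which is the behavior described above.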
The column/scan can still be dictionary encoded when the plan is consumed
against a table schema where `metric_name` is `Dictionary(Int32, Utf8)`. At
that point DataFusion can again apply its normal coercion/execution behavior
for comparing the dictionary column to the string literal.
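As a rough illustration of why nothing is lost on the consumer side, a plain string literal is enough to evaluate the predicate against a dictionary-encoded column. The function below is a hypothetical sketch over raw keys and values; it is not how DataFusion's kernels are implemented.

```rust
/// Evaluate `column = literal` over a dictionary-encoded string column.
/// `keys` and `values` are a simplified stand-in for the column layout.
fn eq_literal(keys: &[i32], values: &[&str], literal: &str) -> Vec<bool> {
    // Compare each distinct value against the literal once...
    let matches: Vec<bool> = values.iter().map(|v| *v == literal).collect();
    // ...then look up the result for every row through its key.
    keys.iter().map(|&k| matches[k as usize]).collect()
}
```

For keys `[0, 1, 0]` over values `["req.latency", "req.count"]`, comparing against `"req.latency"` selects the first and third rows. The literal itself never needs to be dictionary encoded.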
So the key point is: this PR is not trying to encode dictionary array layout
into Substrait literals. It is preserving the logical scalar value while
avoiding a producer failure caused by DataFusion's internal
`ScalarValue::Dictionary` representation after type coercion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]