waynexia opened a new issue, #5169:
URL: https://github.com/apache/arrow-datafusion/issues/5169
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Extend substrait's literal type to unsigned variants
**Describe the solution you'd like**
The literal types in the following two systems are not aligned:
- Substrait:
https://docs.rs/substrait/latest/substrait/protobuf/expression/literal/enum.LiteralType.html
- Arrow: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html
One gap is unsigned primitives are not in Substrait's definition. In
https://github.com/substrait-io/substrait/discussions/2#discussioncomment-1297609
they were cataloged as "third party extension defined types". But it's widely
used in Arrow and DataFusion. I'd like to discuss how they would be integrated
here.
For types we can use type extension, as [I
did](https://github.com/GreptimeTeam/greptimedb/blob/8959dbcef83507ccd76aaaffd2b44cab6426e68f/src/common/substrait/src/types.rs)
in greptimedb , I occupy the "1" variations for those types (I8, I16, I32,
I64) and translate them to the unsigned version. I think we can do it this way
here.
Kind::I8(desc) => substrait_kind!(desc, int8_datatype,
uint8_datatype),
Kind::I16(desc) => substrait_kind!(desc, int16_datatype,
uint16_datatype),
Kind::I32(desc) => substrait_kind!(desc, int32_datatype,
uint32_datatype),
Kind::I64(desc) => substrait_kind!(desc, int64_datatype,
uint64_datatype),
For literals we can do it similarly, but one difference is they also ship
the data values. I prepare to
[transmute](https://doc.rust-lang.org/std/mem/fn.transmute.html) between the
signed and unsigned values. But not sure if this is the best way we can achieve
it. As I investigated, duckdb also doesn't cover [this
part](https://github.com/duckdblabs/substrait/blob/a08fe49de78926a3112daddcb64d45a12e43950a/substrait/from_substrait.cpp#L94-L102)
```
case substrait::Expression_Literal::LiteralTypeCase::kI8:
dval = Value::TINYINT(slit.i8());
break;
case substrait::Expression_Literal::LiteralTypeCase::kI32:
dval = Value::INTEGER(slit.i32());
break;
case substrait::Expression_Literal::LiteralTypeCase::kI64:
dval = Value::BIGINT(slit.i64());
break;
```
**Describe alternatives you've considered**
N/A
**Additional context**
This issue is originally posted in
https://github.com/datafusion-contrib/datafusion-substrait/issues/41
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]