waynexia opened a new issue, #5169:
URL: https://github.com/apache/arrow-datafusion/issues/5169

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Extend substrait's literal type to unsigned variants
   
   **Describe the solution you'd like**
   
   The literal types in the following two systems are not aligned:
   
    -  Substrait: 
https://docs.rs/substrait/latest/substrait/protobuf/expression/literal/enum.LiteralType.html
    -  Arrow: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html
   
   One gap is unsigned primitives are not in Substrait's definition. In 
https://github.com/substrait-io/substrait/discussions/2#discussioncomment-1297609
 they were cataloged as "third party extension defined types". But it's widely 
used in Arrow and DataFusion. I'd like to discuss how they would be integrated 
here.
   
   For types we can use type extension, as [I 
did](https://github.com/GreptimeTeam/greptimedb/blob/8959dbcef83507ccd76aaaffd2b44cab6426e68f/src/common/substrait/src/types.rs)
 in greptimedb , I occupy the "1" variations for those types (I8, I16, I32, 
I64) and translate them to the unsigned version. I think we can do it this way 
here.
   
           Kind::I8(desc) => substrait_kind!(desc, int8_datatype, 
uint8_datatype),
           Kind::I16(desc) => substrait_kind!(desc, int16_datatype, 
uint16_datatype),
           Kind::I32(desc) => substrait_kind!(desc, int32_datatype, 
uint32_datatype),
           Kind::I64(desc) => substrait_kind!(desc, int64_datatype, 
uint64_datatype),
   
   For literals we can do it similarly, but one difference is they also ship 
the data values. I prepare to 
[transmute](https://doc.rust-lang.org/std/mem/fn.transmute.html) between the 
signed and unsigned values. But not sure if this is the best way we can achieve 
it. As I investigated, duckdb also doesn't cover [this 
part](https://github.com/duckdblabs/substrait/blob/a08fe49de78926a3112daddcb64d45a12e43950a/substrait/from_substrait.cpp#L94-L102)
   
   ```
     case substrait::Expression_Literal::LiteralTypeCase::kI8:
       dval = Value::TINYINT(slit.i8());
       break;
     case substrait::Expression_Literal::LiteralTypeCase::kI32:
       dval = Value::INTEGER(slit.i32());
       break;
     case substrait::Expression_Literal::LiteralTypeCase::kI64:
       dval = Value::BIGINT(slit.i64());
       break;
   ```
   **Describe alternatives you've considered**
   N/A
   
   **Additional context**
   This issue is originally posted in 
https://github.com/datafusion-contrib/datafusion-substrait/issues/41
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to