gruuya commented on issue #813:
URL: https://github.com/apache/iceberg-rust/issues/813#issuecomment-2556514864
> For the filtering situation, we want to cast the type to the physical type
Is something like this close to what you had in mind?
```diff
/// Convert Iceberg Datum to Arrow Datum.
-pub(crate) fn get_arrow_datum(datum: &Datum) -> Result<Box<dyn ArrowDatum + Send>> {
+pub(crate) fn get_arrow_datum(
+ datum: &Datum,
+ arrow_type: &DataType,
+) -> Result<Box<dyn ArrowDatum + Send>> {
match (datum.data_type(), datum.literal()) {
(PrimitiveType::Boolean, PrimitiveLiteral::Boolean(value)) => {
Ok(Box::new(BooleanArray::new_scalar(*value)))
}
- (PrimitiveType::Int, PrimitiveLiteral::Int(value)) => {
- Ok(Box::new(Int32Array::new_scalar(*value)))
- }
+ (PrimitiveType::Int, PrimitiveLiteral::Int(value)) => match arrow_type {
+ DataType::Int8 => Ok(Box::new(Int8Array::new_scalar(*value as i8))),
+ DataType::Int16 => Ok(Box::new(Int16Array::new_scalar(*value as i16))),
+ DataType::Int32 => Ok(Box::new(Int32Array::new_scalar(*value))),
+ _ => Err(Error::new(
+ ErrorKind::DataInvalid,
+ format!("Can't convert {datum} to type {arrow_type}"),
+ )),
+ },
(PrimitiveType::Long, PrimitiveLiteral::Long(value)) => {
Ok(Box::new(Int64Array::new_scalar(*value)))
```
Then, in `PredicateConverter`, call it only after the column has been projected
(so that the target data type is known):
```diff
if let Some(idx) = self.bound_reference(reference)? {
- let literal = get_arrow_datum(literal)?;
-
Ok(Box::new(move |batch| {
let left = project_column(&batch, idx)?;
+ let literal = get_arrow_datum(literal, left.data_type())?;
lt_eq(&left, literal.as_ref())
}))
```
I found that this also resolves the reported problem. It is arguably less
general than casting the batches at the Arrow/Parquet level, but it is a less
invasive fix.
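One caveat with the narrowing arms above: in Rust, `*value as i8` wraps silently when the literal does not fit (for example, `300_i32 as i8` is `44`), so a predicate could end up comparing against the wrong value. A minimal std-only sketch of a checked alternative follows. `TargetType`, `Scalar`, `OutOfRange`, and `narrow_int` are hypothetical stand-ins for illustration only, not types from iceberg-rust or arrow-rs:

```rust
/// Hypothetical stand-in for the Arrow integer types matched in the diff.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TargetType {
    Int8,
    Int16,
    Int32,
}

/// Hypothetical stand-in for a typed scalar (not an arrow-rs type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Scalar {
    Int8(i8),
    Int16(i16),
    Int32(i32),
}

/// Error returned when the literal does not fit the physical type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OutOfRange {
    pub value: i32,
    pub target: TargetType,
}

/// Narrow an `i32` literal to the column's physical type, failing
/// instead of wrapping when the value is out of range.
pub fn narrow_int(value: i32, target: TargetType) -> Result<Scalar, OutOfRange> {
    match target {
        // `try_from` rejects out-of-range values, unlike an `as` cast.
        TargetType::Int8 => i8::try_from(value)
            .map(Scalar::Int8)
            .map_err(|_| OutOfRange { value, target }),
        TargetType::Int16 => i16::try_from(value)
            .map(Scalar::Int16)
            .map_err(|_| OutOfRange { value, target }),
        // Same width: no conversion needed.
        TargetType::Int32 => Ok(Scalar::Int32(value)),
    }
}
```

With this shape, `narrow_int(300, TargetType::Int8)` returns an error rather than a wrapped value. Whether an out-of-range literal should be an error or should instead be folded into an always-true or always-false predicate is a separate design question.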