zhangfengcdt commented on PR #254:
URL: https://github.com/apache/sedona-db/pull/254#issuecomment-3470434020

   > Thank you for looking into this!
   > 
   > I would prefer to find the place in Comet that is dropping the type and 
fix it upstream (perhaps in one of its calls to `return_field()`, 
`accumlator()`, or `invoke_batch_with_args()`?).
   > 
   > If we absolutely must do it here, I think we should do it behind a feature 
flag...for all standard usage we want to error for this case to catch and fix 
dropped metadata as soon as we know it occurs.
   
   This is actually an issue in Datafusion (not Comet) issue for aggregation 
function, which impact the four ST UADFs we added before: ST_Envelope_Aggr, 
ST_Intersection_Aggr, ST_Union_Aggr, and ST_Analyze_Aggr. 
   
   In datafusion, datafusion/physical-plan/src/aggregates/mod.rs - 
PhysicalGroupBy::group_fields()
   
   ```
     Field::new(name, expr.data_type(input_schema)?,
                group_expr_nullable || expr.nullable(input_schema)?)
     .with_metadata(expr.return_field(input_schema)?.metadata().clone())
   ```
   However, in datafusion/physical-expr-common/src/physical_expr.rs, it simply 
creates a new fileds without adding the metadata from original field. See: 
https://github.com/apache/datafusion/blob/f57da83aac35f0ea4506ccb6a4ddbd26a503c1c1/datafusion/physical-expr-common/src/physical_expr.rs#L86
   
   This could be fixed by changing the Datafusion source code to check if 
expression references an input field, if so, get that field's metadata and call 
.with_metadata() to preserve it. However, I am not sure this is acceptable for 
all senarios because the aggregated fields may NOT be legit to actually inherit 
original field's metadata. For example, this is not the case for 
ST_Analyze_Aggr, which the output is not Geometry type anymore. So, not sure 
Datafusion commit will accept such changes.
   
   Anyway, I can add a feature flag to only have changes in the PR, but I don't 
see risks of not having it under a feature flag.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to