paleolimbot commented on issue #22079:
URL: https://github.com/apache/datafusion/issues/22079#issuecomment-4431486725

   Possibly the core issue is that we don't separate types and metadata (I wish 
we did, or that this were possible in arrow-rs!), so we had to put a `FieldRef` 
everywhere. Everywhere in DataFusion that I know of, the metadata of that 
FieldRef-that-maybe-should-actually-have-been-a-data-type-of-some-kind is 
propagated *except* through the cast.
   
   How about:
   
   - `Expr::to_field()` reflects the `Field` metadata specified in the `Cast`
   - Whoever creates the `Expr::Cast` can propagate metadata when creating the 
cast if they feel it's safe to do so
   - The `CastExtension` I have prototyped in 
https://github.com/apache/datafusion/pull/21071 can handle this today (one can 
just add a cast extension that propagates the specific metadata a system knows 
about).
   
   > Would a good compromise be that arrow extension type metadata specifically 
is wiped and comes only from the target field but other arbitrary metadata is 
preserved?
   
   That would unblock my specific use of casting to a `FieldRef`. I'm also 
happy to PR this change.
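
To make the compromise concrete, here is a minimal sketch of the metadata rule using plain `HashMap`s rather than arrow-rs `Field`s. The function name `cast_output_metadata` is hypothetical (it is not a DataFusion API), and letting the target win on conflicting non-extension keys is my assumption; the two `ARROW:extension:*` keys are the ones Arrow uses for extension types.

```rust
use std::collections::HashMap;

// Field-metadata keys that Arrow uses to carry extension type information.
const EXTENSION_KEYS: [&str; 2] = ["ARROW:extension:name", "ARROW:extension:metadata"];

/// Hypothetical helper (not a DataFusion API): compute the metadata of a
/// cast's output field from the source field and the cast's target field.
fn cast_output_metadata(
    source: &HashMap<String, String>,
    target: &HashMap<String, String>,
) -> HashMap<String, String> {
    // Keep arbitrary source metadata, but drop any extension type keys:
    // the source's extension type does not survive the cast.
    let mut out: HashMap<String, String> = source
        .iter()
        .filter(|(k, _)| !EXTENSION_KEYS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();

    // Everything on the target field is applied last, so extension type
    // metadata comes only from the target. Assumption: the target also
    // wins when a non-extension key is present on both sides.
    out.extend(target.iter().map(|(k, v)| (k.clone(), v.clone())));
    out
}
```

With this rule, casting to a target field that has no extension keys yields a plain storage type that still carries the source's other metadata.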
   
   > I think part of the problem is that metadata can serve many purposes. 
Extension types are just one of them.
   
   Can you give some examples where metadata is communicating non-type 
information that would be useful to propagate and requires a cast and not a 
scalar function? In the PR that addresses this I can put them in a comment for 
future readers. The uses of arrow field metadata I know about are all basically 
trying to communicate type information or statistics, both of which can be 
fishy through a cast.
   
   Part of this is coming from quite a lot of previous work with GeoParquet, 
where we tried to communicate type information in Parquet metadata that was 
aggressively propagated via Arrow schema metadata and rather easily resulted in 
metadata that was capable of causing silently incorrect results.
   
   > But I don’t think it would be correct either for select col::text to strip 
arbitrary metadata
   
   Whether it's correct or not, it is behaviour that has existed in 53 versions 
of DataFusion 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

