notfilippo commented on PR #12853: URL: https://github.com/apache/datafusion/pull/12853#issuecomment-2424251691
> don't remember the roots, so wondering, can we investigate and use a single type system which is Arrow Types and get rid of other types. At the end of the day it is based on Arrow Types.

I can try to summarise the discussion that has happened over the last few months regarding this proposal.

Currently DataFusion:

- Doesn't support extension types, either in the logical sense (e.g. JSON) or in the physical sense (e.g. a logical string is natively Utf8, LargeUtf8, or Utf8View, but ideally a user might also want to define a special physical type like List(u8) to be logically equivalent to a string).
- Has a lot of redundant code to handle logically equivalent values during logical planning (i.e. ScalarValue, function signatures).
- Doesn't support any form of runtime adaptability: it assumes that all input record batches have exactly the same schema, even though it potentially has all the infrastructure needed to support late coercion of logically equivalent values during physical execution (which seems to be a problem that Comet is also looking to solve).

While "arrow datatype everywhere" is definitely working for DataFusion currently, my opinion is that this is a needed step towards extensibility, and it will help enterprise users looking to migrate their existing engine, custom file format, and types to DataFusion efficiently.
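
To make the logical/physical distinction above a bit more concrete, here is a minimal sketch in plain Rust. It is not the design proposed in the PR and it does not use any DataFusion or Arrow API: `PhysicalType`, `LogicalType`, `logical_type_of`, `logical_type_with_extensions`, and `logically_equivalent` are all hypothetical names invented for illustration. The assumption is only that several physical encodings can map to one logical type, and that a user could register extra mappings (such as the List(u8)-as-string case mentioned above).

```rust
// Hypothetical sketch: none of these types or functions are DataFusion or
// Arrow APIs. They only illustrate "many physical encodings, one logical type".

/// Physical encodings: how values are laid out in memory.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalType {
    Utf8,
    LargeUtf8,
    Utf8View,
    UInt8,
    Int32,
    /// A list whose items use the given physical encoding.
    List(Box<PhysicalType>),
}

/// Logical types: what the values mean to the planner.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    String,
    UInt8,
    Int32,
    List(Box<LogicalType>),
}

/// Built-in mapping: every string encoding collapses to `LogicalType::String`.
pub fn logical_type_of(physical: &PhysicalType) -> LogicalType {
    match physical {
        PhysicalType::Utf8 | PhysicalType::LargeUtf8 | PhysicalType::Utf8View => {
            LogicalType::String
        }
        PhysicalType::UInt8 => LogicalType::UInt8,
        PhysicalType::Int32 => LogicalType::Int32,
        PhysicalType::List(item) => LogicalType::List(Box::new(logical_type_of(item))),
    }
}

/// Same lookup, but user-supplied (physical, logical) pairs take priority,
/// standing in for a hypothetical extension-type registry.
pub fn logical_type_with_extensions(
    physical: &PhysicalType,
    extensions: &[(PhysicalType, LogicalType)],
) -> LogicalType {
    extensions
        .iter()
        .find(|(p, _)| p == physical)
        .map(|(_, l)| l.clone())
        .unwrap_or_else(|| logical_type_of(physical))
}

/// Two physical types are logically equivalent when they map to the same
/// logical type under the built-in mapping.
pub fn logically_equivalent(a: &PhysicalType, b: &PhysicalType) -> bool {
    logical_type_of(a) == logical_type_of(b)
}
```

With something along these lines, planning code could compare logical types once instead of enumerating Utf8, LargeUtf8, and Utf8View at every call site, and a user-registered pair such as `(List(UInt8), String)` would let a custom encoding participate as a string. How the real proposal models this may well differ.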