CircArgs opened a new issue #3464:
URL: https://github.com/apache/iceberg/issues/3464
I've been looking through the legacy Python code as well as the new, more
Pythonic version, and I have a few questions about the Python rewrite and why
it doesn't use existing projects in the ecosystem.
# Types
In
https://github.com/apache/iceberg/blob/master/python/src/iceberg/types.py there
are a few primitive types whose comparison requires using the instantiated
attributes rather than Python built-ins. It also seems this is still being
modeled after the legacy version, insofar as it uses separate classes for
"types" and "literals":
https://github.com/apache/iceberg/blob/master/python_legacy/iceberg/api/types/types.py
It seems to me it would be more natural to have a single class for each
primitive that acts as both type and literal, maybe something loosely like
https://gist.github.com/CircArgs/a4571332d72888e3773f66bd180bb0e0
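To make the idea concrete, here is a minimal sketch of "one class per primitive" (this is not the code from the gist, and every class and attribute name below is hypothetical). It assumes each primitive subclasses the matching Python built-in, so the class itself stands for the Iceberg type, an instance stands for a literal, and comparison falls through to the built-in operators:

```python
class PrimitiveType:
    """Hypothetical base: the class is the Iceberg type, an instance is a literal."""

    iceberg_name = "primitive"


class IntegerType(PrimitiveType, int):
    """32-bit signed integer: usable as a type (the class) and a literal (an instance)."""

    iceberg_name = "int"
    _MIN, _MAX = -(2**31), 2**31 - 1

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        # Range check uses plain int comparison inherited from the built-in.
        if not cls._MIN <= obj <= cls._MAX:
            raise ValueError(f"{value!r} does not fit in a 32-bit int")
        return obj


class StringType(PrimitiveType, str):
    """String primitive: comparison and hashing come from str."""

    iceberg_name = "string"


# Literals compare with Python built-ins, no attribute access needed.
small, large = IntegerType(3), IntegerType(5)
ordered = sorted([large, small])
same_as_builtin = small == 3 and StringType("a") < StringType("b")
```

One trade-off in this sketch: arithmetic such as `small + large` returns a plain `int`, so operations that should preserve the Iceberg type would need to be overridden explicitly.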
# Arrow?
Beyond the types, I was also looking around the legacy code at the filesystem
layer and was curious why everything is custom-built when pyarrow already
implements a filesystem API
(https://arrow.apache.org/docs/python/filesystems.html) for S3, HDFS, and local
storage, as well as a consistent set of data types
(https://arrow.apache.org/docs/python/api/datatypes.html) with aptly named
equivalents for each of the primitive types described in the Iceberg spec
(https://iceberg.apache.org/#spec/#primitive-types).
Is there some opposition to building on Arrow, which is another well-maintained
Apache project and is used by projects that are also likely to benefit from
Iceberg, such as Dask?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]