Re: [DISCUSS] APE 25: Improved Apache Iceberg support

Ian Maxon Thu, 11 Sep 2025 17:30:06 -0700

Agreed. +1 from me as well.


On Mon, Aug 11, 2025 at 3:17 PM Mike Carey <[email protected]> wrote:
>
> Great questions and discussion and clarifications.  Good-looking APE at
> this point, IMO!
>
> On 8/11/25 2:05 PM, Hari Kishore Chaparala wrote:
> > That makes sense. Thanks for the clarification, Hussain.
> >
> > On Mon, Aug 11, 2025 at 8:40 AM Hussain Towaileb<[email protected]>
> > wrote:
> >
> >> Hello Hari
> >>
> >> 1. Catalog persistence:
> > >> Yes, once a catalog is created, it is persisted. It is a metadata entity,
> > >> just like a collection, so it is permanent. Tables will be created on those
> > >> catalogs, so unless they are stored, a customer would need to re-create the
> > >> catalog each time, which is not practical. As for the credentials, it
> > >> depends on what type they use. Permanent credentials are fine. Temporary
> > >> credentials that expire come in 2 types:
> > >> - Passing keys + session token: yes, if they expire, they will need
> > >> updating, which we don't support, so they will have to re-create the
> > >> catalog with the new credentials.
> > >> - Passing trust account authentication: this mechanism uses temporary
> > >> credentials that refresh automatically. You can see this APE for more
> > >> details:
> >>
> >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/ASTERIXDB/APE*16*3A*Cross-Account*Trust*Authentication*for*AWS*S3*External*Collections__;KyUrKysrKysrKw!!CzAuKJ42GuquVTTmVmPViYEvSg!LNqPhf1prMmZ41ZjrV2H3GMj25fFDAmeqSSCuS5SDAm6vC1r9e1nG7oUpn6NT7sgPqBndN7pCZS4HnOy0A$
> >>
> >> 2. Table referencing:
> > >> This still has room for discussion, but the idea I had in mind is that the
> > >> namespace would be in the WITH clause. This is to avoid breaking/confusing
> > >> things, as "tables" are actually "external collections", and if you say
> > >> a.b.c, then you are talking about a "collection" in a "database".
> > >> The current planned behavior (again, open for discussion):
> > >> Say you create your catalog:
> > >> "CREATE CATALOG myCatalog .... WITH {"namespace": "my.name.space", ...}"
> > >> This would make all collections (tables) created on this catalog default to
> > >> the "my.name.space" namespace.
> >>
> >> So if I create a collection:
> >> "CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
> >> "users", ...}"
> >>
> > >> Then this table resides at "catalog-warehouse-path/my/name/space/users"
> >>
> >> If however I would like to create a collection in a namespace other than
> >> the default, we can set that property, which will override the catalog's
> >> default namespace:
> >> "CREATE EXTERNAL COLLECTION myTable ON myCatalog .... WITH {"table-name":
> >> "users", "namespace": "my.other.name.space"}"
> >>
> >> And now this collection would be at
> >> "catalog-warehouse-path/my/other/name/space/users"
> >>
> >> 3. Splitting APE into 2:
> > >> Currently, I find the two topics tightly coupled with each other, so it
> > >> would be more convenient to have them together for context. But if it gets
> > >> too large or too confusing, I don't think there is harm in splitting them.
> > >> Also note that creating a table uses the same syntax as creating an
> > >> external collection; it just takes an extra property "table-type": "iceberg"
> > >> to differentiate it from a normal external collection.
> >>
> >>
> >> On Mon, Aug 11, 2025 at 4:18 AM Hari Kishore Chaparala<[email protected]>
> >> wrote:
> >>
> >>> Thanks for the improvement. The syntax looks much more intuitive than in
> >>> Spark, where the catalog has to be configured with the "spark.sql" prefix
> >>> (even for DataFrame operations), which can be confusing—especially when
> >>> working with multiple catalogs.
> >>>
> >>> A few questions on the new CATALOG entity:
> >>>
> > >>> 1. Catalog persistence: When we run "*CREATE CATALOG myRestCatalog*" with
> > >>> configuration options, will the catalog be stored and persisted beyond the
> > >>> current session? In Spark and other engines, the catalog implementation and
> > >>> configuration usually last only for the active session. Since we are
> > >>> querying external tables, I’m not sure if storing catalog details is
> > >>> necessary. Also, AWS roles and STS credentials expire after some time,
> > >>> which would require catalog updates.
> >>>
> > >>> 2. Table referencing: How do we plan to reference tables? Will it be a
> > >>> three-part notation? That might be clearer when working across multiple
> > >>> catalogs.
> >>> For example:
> >>>
> >>>
> >>>
> >>> *SELECT *FROM glue_catalog.namespace1.iceberg_table1 AINNER JOIN
> >>> unity_catalog.namespace2.delta_lake_table1 B  ON A.id = B.id;*
> >>>
> > >>> 3. It looks like this APE proposes two features: 1. The new CATALOG entity
> > >>> 2. DQL and DDL support for Iceberg tables using various catalog
> > >>> implementations. Would it make sense to split these into separate APEs?
> >>>
> >>> Thanks
> >>> Hari Kishore
> >>>
> >>> On Fri, Aug 8, 2025 at 9:39 AM Hussain Towaileb<[email protected]>
> >>> wrote:
> >>>
> >>>> Initiating discussion for adding improved support for Apache Iceberg
> >>>> Feature: *Improved Support for Apache Iceberg*
> > >>>> Details: Apache Iceberg would provide support for reading Iceberg tables.
> > >>>> This APE discusses improving the current Apache Iceberg support by
> > >>>> introducing the Catalog entity to AsterixDB Metadata, adding support for
> > >>>> different types of Iceberg catalogs, and introducing other features like
> > >>>> time travel.
> >>>>
> >>>> APE:
> >>>>
> >>>>
> > >>>> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/ASTERIXDB/APE*25*3A*Apache*Iceberg*Support__;KyUrKys!!CzAuKJ42GuquVTTmVmPViYEvSg!J81u58s9FyWRyedF8qV0TL-QjZrvS9vCviVHuCte1wGJ-y3qgzG087UwfC-ii0LKkEI3c5Iw7CG6yxA62A$
> >>>> --
> >>>> Regards,
> >>>> Hussain Towaileb
> >>>>
> >>
> >> --
> >> Regards,
> >> Hussain Towaileb
> >>
