teabot commented on issue #170: Add support for Iceberg MR / InputFormat and OutputFormat APIs
URL: https://github.com/apache/incubator-iceberg/issues/170#issuecomment-594443081

Hey Colin,

In the implementation described by @massdosage, we assume that Iceberg is responsible for partitioning, layout of files, etc. This allows us to model Iceberg tables as unpartitioned external Hive tables in an existing metastore. By reducing scope like this, we avoid the need to align the Hive partitioning and warehouse models with those of Iceberg. Additionally, we can have Iceberg tables coexist with existing Hive tables in the same catalogue - which is important for us as any move towards Iceberg will be an incremental migration.

Thanks,

Elliot.

On Wed, 4 Mar 2020 at 09:07, Colin <[email protected]> wrote:

> hi, @rdsr <https://github.com/rdsr> @massdosage <https://github.com/massdosage> @rdblue <https://github.com/rdblue>
> I also did some investigation into integrating Hive with Iceberg. The goal was to integrate Hive through a plugin where possible, e.g. HiveStorageHandler, but I ran into some blocking issues:
>
> 1. The files/folders of data are organized differently.
>    In Hive, there is:
>    warehouse
>    |- dbname.db
>       |- table
>          |- partition
>             |- datafile
>    In Iceberg, it may be:
>    catalog_base
>    |- metadata
>    |- manifest
>    |- datafile
>    It's weird to have both folder layouts in the warehouse, so I suppose an Iceberg table could exist as a Hive external table, with the metadata folder defined as the table location to be used in the InputFormat.
> 2. Partitioning is also a problem. A partition exists as a folder in Hive, but it's a "hidden partition" in Iceberg. There is no plugin like InputFormat to solve this, and Iceberg can't work properly now. For example, "show partition", "select c1 from t where partition_col = 'p1'".
>    I think Hive has to be modified to deal with the partition things, e.g. "alter table add partition", "show partition", etc.
>
> Our scenario is writing to Iceberg with Flink and having Hive read the data ASAP. As an alternative solution, a toolkit that converts an Iceberg table to a Hive table might suit our case.
> Do we have the same target for Hive integration in this MR?
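
For readers following the partitioning point in this thread, here is a minimal Python sketch of the general idea behind "hidden" partitioning: the partition value is derived from a source column and kept in file-level metadata, so a reader prunes files from that metadata instead of from directory names. This is an illustration only, not Iceberg's or Hive's actual API. `DataFile`, `day_transform`, `plan_files`, and the file paths are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a metadata entry: a file path plus its partition value."""
    path: str
    event_day: date  # derived partition value, held in metadata rather than in the path


def day_transform(ts: datetime) -> date:
    """Derive a partition value from a timestamp column (a day-granularity transform)."""
    return ts.date()


def plan_files(files, ts_from: datetime, ts_to: datetime):
    """Return paths of files whose partition value could match ts_from <= ts <= ts_to."""
    lo, hi = day_transform(ts_from), day_transform(ts_to)
    return [f.path for f in files if lo <= f.event_day <= hi]


# Flat file layout: no partition directories, only metadata describing each file.
files = [
    DataFile("data/00000-a.parquet", date(2020, 3, 3)),
    DataFile("data/00001-b.parquet", date(2020, 3, 4)),
    DataFile("data/00002-c.parquet", date(2020, 3, 5)),
]

# The query filters on the timestamp column; the reader never names a partition.
selected = plan_files(files, datetime(2020, 3, 4, 9, 0), datetime(2020, 3, 4, 18, 0))
print(selected)
```

Under this model the query engine has no partition folders to list, which is consistent with exposing such a table to Hive as an unpartitioned external table and leaving file selection to the table format's own reader.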
