[GitHub] [incubator-iceberg] teabot commented on issue #170: Add support for Iceberg MR / InputFormat and OutputFormat APIs

GitBox Wed, 04 Mar 2020 02:32:22 -0800

teabot commented on issue #170: Add support for Iceberg MR / InputFormat and 
OutputFormat APIs
URL: 
https://github.com/apache/incubator-iceberg/issues/170#issuecomment-594443081
 
 
   Hey Colin,
   
   In the implementation described by @massdosage, we assume that Iceberg is
   responsible for partitioning, layout of files, etc. This allows us to model
   Iceberg tables as unpartitioned external Hive tables in an existing
   metastore. By reducing scope like this, we avoid the need to align the Hive
   partitioning and warehouse models with that of Iceberg. Additionally, we
   can have Iceberg tables coexist with existing Hive tables in the same
   catalogue - which is important for us as any move towards Iceberg will be
   an incremental migration.
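
To illustrate the idea above, here is a minimal sketch (not the actual implementation) that builds the kind of DDL an unpartitioned external Hive table would need. The table name, columns, location, and storage handler class name are all hypothetical; the only point is that the statement has no PARTITIONED BY clause, so partitioning and file layout stay entirely with Iceberg.

```python
def external_table_ddl(table, columns, location, storage_handler):
    """Build Hive DDL for an unpartitioned external table.

    There is deliberately no PARTITIONED BY clause: Hive sees a flat table,
    and the partitioning and file layout are left to Iceberg.
    """
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"STORED BY '{storage_handler}'\n"
        f"LOCATION '{location}'"
    )


# All values below are hypothetical and only serve the illustration.
ddl = external_table_ddl(
    table="analytics.events",
    columns=[("event_id", "BIGINT"), ("event_time", "TIMESTAMP")],
    location="hdfs://namenode/warehouse/analytics/events",
    storage_handler="com.example.HypotheticalIcebergStorageHandler",
)
print(ddl)
```

A table registered this way can sit next to ordinary Hive tables in the same metastore, and a column such as event_time stays a regular column on the Hive side even if Iceberg partitions on it.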
   
   Thanks,
   
   Elliot.
   
   On Wed, 4 Mar 2020 at 09:07, Colin <[email protected]> wrote:
   
   > hi, @rdsr <https://github.com/rdsr> @massdosage
   > <https://github.com/massdosage> @rdblue <https://github.com/rdblue>
   > I also investigated integrating Hive with Iceberg. The goal was to
   > integrate through an existing Hive plugin point, e.g.
   > HiveStorageHandler, but I ran into some blocking issues:
   >
   >    1. The files and folders of the data are organized differently.
   >    In Hive, the layout is:
   >    warehouse
   >    |- dbname.db
   >    |- table
   >    |- partition
   >    |- datafile
   >    In Iceberg, it may be:
   >    catalog_base
   >    |- metadata
   >    |- manifest
   >    |- datafile
   >    It's weird to have both layouts in the warehouse, so I suppose an
   >    Iceberg table could exist as a Hive external table, with the metadata
   >    folder defined as the table location that the InputFormat uses.
   >    2. Partitioning is also a problem. A partition exists as a folder in
   >    Hive, but Iceberg uses "hidden partitioning". There is no plugin point
   >    like InputFormat to solve this, so Iceberg can't work properly for
   >    now; for example, with "show partitions" or
   >    "select c1 from t where partition_col = 'p1'".
   >    I think Hive has to be modified to handle partition operations, e.g.
   >    "alter table add partition", "show partitions", etc.
   >    Our scenario is writing to Iceberg with Flink and having Hive read the
   >    data ASAP. As an alternative, a toolkit that converts an Iceberg table
   >    to a Hive table might suit our case.
   >    Do we have the same goal for Hive integration in this MR?
   >


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

