[ https://issues.apache.org/jira/browse/HIVE-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated HIVE-11389:
------------------------------
Attachment: HIVE-11389.patch
I basically rewrote HBaseImport.
The user can now select which objects to import, including roles, databases,
tables, functions, and Kerberos-related items (the master key and tokens).
Importing an object imports all of the objects it contains. That is, if you
import a database you get all of the tables and functions in that database. If
users wish to import only some items in a database, they can create the
database on the HBase side and then import the desired tables and functions.
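The containment rule (importing a database pulls in everything inside it) can
be illustrated with a small sketch. It is hypothetical and not taken from the
patch: the `catalog` map stands in for metadata read from the RDBMS-backed
metastore, and `ImportSelection.expand` is an invented name. It merges the
objects of each requested database with individually requested "db.object"
names and drops duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: expands an import request so that importing a database
// also imports every table and function it contains. The catalog map is a
// stand-in for metadata read from the RDBMS, not HBaseImport's real model.
class ImportSelection {

    /** Returns fully qualified "db.object" names to import, in order, without duplicates. */
    static List<String> expand(Map<String, List<String>> catalog,
                               List<String> requestedDbs,
                               List<String> requestedObjects) {
        Set<String> result = new LinkedHashSet<>();
        for (String db : requestedDbs) {
            // Importing a database pulls in all of its contained objects.
            for (String obj : catalog.getOrDefault(db, List.of())) {
                result.add(db + "." + obj);
            }
        }
        // Individually requested objects are added as-is; the set silently
        // drops any that were already covered by a requested database.
        result.addAll(requestedObjects);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = Map.of(
            "sales", List.of("orders", "customers"),
            "web", List.of("clicks"));
        // Prints [sales.orders, sales.customers, web.clicks]
        System.out.println(expand(catalog, List.of("sales"),
                                  List.of("web.clicks", "sales.orders")));
    }
}
```

Using a `LinkedHashSet` keeps the order stable, which makes the resulting work
list predictable when it is later handed to worker threads.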
There is no option to import just some partitions, as that seemed confusing.
I also completely changed the way tables and partitions are copied. In the
past these were done one by one.
Now, for tables, the importer builds a list of all tables to import based on
the databases being imported and any user-requested tables. It then spawns
threads to fetch the table definitions from the RDBMS and write them to HBase
in parallel.
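A minimal sketch of that parallel table copy, assuming a fixed-size thread pool
whose size is the user-settable parallelism (which defaults to 1). The
`readTable` and `writeTable` callbacks are hypothetical stand-ins for the RDBMS
read and the HBase write; they are not names from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: copies table definitions using a fixed pool of threads.
// readTable stands in for fetching a definition from the RDBMS, writeTable for
// storing it in HBase.
class ParallelTableCopy {

    static <T> void copyAll(List<String> tableNames,
                            Function<String, T> readTable,
                            Consumer<T> writeTable,
                            int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : tableNames) {
                // Each task reads one table definition and writes it out.
                pending.add(pool.submit(() -> writeTable.accept(readTable.apply(name))));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for the task and surface any failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while copying tables", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("table copy failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> hbase = new ConcurrentHashMap<>(); // fake HBase store
        copyAll(List.of("db.t1", "db.t2", "db.t3"),
                name -> "def:" + name,
                def -> hbase.put(def, def),
                2);
        System.out.println(hbase.size()); // 3
    }
}
```

The destination must tolerate concurrent writers (here a `ConcurrentHashMap`);
with a parallelism of 1 the pool degenerates to the old serial behavior.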
For partitions, it fetches all partition names in parallel and breaks them into
batches of at most 1000 (configurable). Separate threads then fetch each batch
of partitions from the RDBMS and write it to HBase as a batch. This solves a
couple of problems with the previous code: 1) we no longer depend on being able
to instantiate all partitions for a table in memory simultaneously; 2) we no
longer add partitions one by one; rather, each batch of up to 1000 is read with
one call and written with one call.
The parallelism can be set by the user and defaults to 1.
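The partition path can be sketched the same way. This is a hypothetical
illustration, not the patch's code: `readBatch` and `writeBatch` stand in for
one bulk RDBMS read and one bulk HBase write, and the constants mirror the
defaults described above (batches of at most 1000, parallelism 1). The parallel
fetching of partition names across tables is left out; the sketch starts from
one table's list of partition names, so at most one batch of partitions per
worker thread is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of batched partition copying: one read call and one
// write call per batch, with batches processed by a fixed thread pool.
class PartitionBatchCopy {
    static final int DEFAULT_BATCH_SIZE = 1000; // upper bound per batch, configurable
    static final int DEFAULT_PARALLELISM = 1;

    /** Splits names into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < names.size(); start += batchSize) {
            int end = Math.min(start + batchSize, names.size());
            batches.add(new ArrayList<>(names.subList(start, end)));
        }
        return batches;
    }

    /** Copies partitions batch by batch using `parallelism` worker threads. */
    static <P> void copy(List<String> partitionNames,
                         Function<List<String>, List<P>> readBatch,
                         Consumer<List<P>> writeBatch,
                         int batchSize,
                         int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : toBatches(partitionNames, batchSize)) {
                // One bulk read and one bulk write per batch.
                pending.add(pool.submit(() -> writeBatch.accept(readBatch.apply(batch))));
            }
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while copying partitions", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("partition batch failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With 2500 partition names and the default batch size, this issues three reads
and three writes (1000, 1000, and 500 partitions) instead of 2500 of each.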
> hbase import should allow partial imports and should work in parallel
> ---------------------------------------------------------------------
>
> Key: HIVE-11389
> URL: https://issues.apache.org/jira/browse/HIVE-11389
> Project: Hive
> Issue Type: Improvement
> Components: HBase Metastore
> Affects Versions: hbase-metastore-branch
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: HIVE-11389.patch
>
>
> Currently the hbaseimport tool always imports a whole metastore serially.
> This has a couple of issues. One, users may wish to import only certain
> parts of their metastore. Two, when there are tables with many partitions it
> can take a long time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)