[
https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566686#comment-13566686
]
Jonathan Hsieh commented on HBASE-7697:
---------------------------------------
So are you proposing to add a new framework and more code for importing so
that this can be "world-class"? Help me understand what is in scope and what
is out, and maybe provide some examples or a comparison. Is this basically
just a new command-line program that lets you pick arbitrary input/output
format combinations?
> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
> Key: HBASE-7697
> URL: https://issues.apache.org/jira/browse/HBASE-7697
> Project: HBase
> Issue Type: Improvement
> Components: Client, mapreduce
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and getting a dump out of
> HBase is pretty poor. The existing tools as I understand them include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
> Each one provides specific features that do not necessarily overlap with the
> others. For instance, Import and ImportTsv could have most of their logic
> combined, sharing common driver code and leaving the details of the
> file-format up to the user to provide via a pluggable mapper. Export and
> CopyTable both map over a target table; it's only the detail of what they do
> with the data that is different. Bulk operations via HFiles could be a more
> common use-case as well, not just a special case of ImportTsv.
> The list of [open
> issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC]
> against ImportTsv alone indicates users are using the tool, and I certainly
> advise it for people getting started with a new HBase deployment.
> I propose a single interface for getting data into and out of HBase. It would
> be pluggable, allowing users to override details of their file formats and
> schemas. We can provide implementations that replicate existing tool
> behaviors as example modules. These tools are also a reasonable place, IMHO,
> to include support for creation and loading of snapshots.
> I started down the path of a specific tool intended to overcome some of the
> limitations of ImportTsv, and it has since been refactored into a more
> general-purpose application. Initial patches forthcoming. Comments strongly
> encouraged.
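
For readers trying to picture the "shared driver plus pluggable mapper" idea
in the description above, here is a minimal sketch in plain Java. It is not
taken from any patch on this issue and does not use the HBase API: the names
TransferSketch, Cell, LineParser, tsv, and run are all hypothetical, and the
"table" is just an in-memory list. It only illustrates how one driver could
stay fixed while the file-format details are supplied by the user.

```java
import java.util.ArrayList;
import java.util.List;

public class TransferSketch {

    /** Hypothetical stand-in for one cell write; NOT an HBase type. */
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " " + column + "=" + value;
        }
    }

    /** Hypothetical plug-in point: turns one input line into zero or more cells. */
    interface LineParser {
        List<Cell> parse(String line);
    }

    /**
     * One possible plug-in, loosely modelled on what a TSV importer does:
     * the first field is the row key, and each following field is stored
     * under the column name at the same position.
     */
    static LineParser tsv(String... columns) {
        return line -> {
            String[] fields = line.split("\t", -1);
            List<Cell> cells = new ArrayList<>();
            for (int i = 0; i < columns.length && i + 1 < fields.length; i++) {
                cells.add(new Cell(fields[0], columns[i], fields[i + 1]));
            }
            return cells;
        };
    }

    /** Shared driver: knows nothing about the file format, only the parser contract. */
    static List<Cell> run(List<String> lines, LineParser parser) {
        List<Cell> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parser.parse(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("r1\ta\tb", "r2\tc\td");
        for (Cell cell : run(input, tsv("cf:x", "cf:y"))) {
            System.out.println(cell);
        }
    }
}
```

Supporting a different input format in this sketch means supplying another
LineParser; run() stays unchanged. Whether the proposed tool draws the line
in the same place is an open question for the forthcoming patches.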