[ https://issues.apache.org/jira/browse/HBASE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk resolved HBASE-7697.
---------------------------------
Resolution: Invalid
Closing as invalid because this is pretty vague. If you're interested, see
related mapreduce improvements in HBASE-8084.
> Consolidate tools for getting data into, out of HBase
> -----------------------------------------------------
>
> Key: HBASE-7697
> URL: https://issues.apache.org/jira/browse/HBASE-7697
> Project: HBase
> Issue Type: Improvement
> Components: Client, mapreduce
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
>
> The user experience for importing data into HBase and getting a dump out of
> HBase is pretty poor. The existing tools as I understand them include:
> - org.apache.hadoop.hbase.mapreduce.Export,
> - org.apache.hadoop.hbase.mapreduce.Import,
> - org.apache.hadoop.hbase.mapreduce.ImportTsv,
> - org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles, and
> - org.apache.hadoop.hbase.mapreduce.CopyTable
> Each one provides specific features that do not necessarily overlap with the
> others. For instance, Import and ImportTsv could have most of their logic
> combined, sharing common driver code and leaving the details of the
> file-format up to the user to provide via a pluggable mapper. Export and
> CopyTable both map over a target table; it's only the detail of what they do
> with the data that is different. Bulk operations via HFiles could be a more
> common use-case as well, not just a special case of ImportTsv.
> The list of [open
> issues|https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HBASE%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20text%20~%20%22ImportTsv%22%20ORDER%20BY%20updatedDate%20DESC]
> against ImportTsv alone indicates users are using the tool, and I certainly
> advise it for people getting started with a new HBase deployment.
> I propose a single interface for getting data into and out of HBase. It would
> be pluggable, allowing users to override details of their file formats and
> schemas. We can provide implementations that replicate existing tool
> behaviors as example modules. These tools are also a reasonable place, IMHO,
> to include support for creation and loading of snapshots.
> I started down the path of a specific tool intended to overcome some of the
> limitations of ImportTsv, and it has since been refactored into a more
> general-purpose application. Initial patches forthcoming. Comments strongly
> encouraged.
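
For readers trying to picture the "shared driver plus pluggable mapper" idea in the description above, here is a minimal sketch. It is plain Java with no HBase or MapReduce dependencies, and it is not taken from any patch on this issue. Every name in it (PluggableImportSketch, Cell, LineParser, TsvParser, run) is hypothetical and chosen only for illustration. The assumption is that the file-format detail lives behind one small interface while the driver loop stays the same for every format.

```java
import java.util.ArrayList;
import java.util.List;

public class PluggableImportSketch {

    /** Hypothetical value type: one cell destined for a table. */
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + column + "=" + value;
        }
    }

    /** Hypothetical plug-in point: turns one input line into zero or more cells. */
    interface LineParser {
        List<Cell> parse(String line);
    }

    /** Example plug-in for tab-separated input; the first field is the row key. */
    static final class TsvParser implements LineParser {
        private final String[] columns;

        TsvParser(String... columns) {
            this.columns = columns;
        }

        @Override
        public List<Cell> parse(String line) {
            String[] fields = line.split("\t", -1);
            List<Cell> cells = new ArrayList<>();
            // Field 0 is the row key; field i maps to columns[i - 1].
            for (int i = 1; i < fields.length && i <= columns.length; i++) {
                cells.add(new Cell(fields[0], columns[i - 1], fields[i]));
            }
            return cells;
        }
    }

    /** Shared driver: the same loop regardless of which parser is plugged in. */
    static List<Cell> run(Iterable<String> lines, LineParser parser) {
        List<Cell> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(parser.parse(line));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = run(List.of("r1\ta\tb"), new TsvParser("d:c1", "d:c2"));
        for (Cell c : cells) {
            System.out.println(c);
        }
    }
}
```

In a real tool the driver would presumably be a MapReduce job and the cells would become Puts or HFile entries; the sketch only shows where the format-specific code would be swapped in.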