[jira] Updated: (HBASE-1923) Bulk incremental load into an existing table

Todd Lipcon (JIRA) Fri, 21 May 2010 01:07:48 -0700

     [ https://issues.apache.org/jira/browse/HBASE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Todd Lipcon updated HBASE-1923:
-------------------------------

    Attachment: hbase-1923-prelim.txt

Attaching a work-in-progress patch in case anyone wants to start looking at 
this. A number of people sounded interested, so I figured early review would be best.

Quick tutorial:
{code}
# Generate some data
perl -e 'for (1..10000) { print "$_\t$_\n"; }' | hadoop fs -put - mytsv.txt
perl -e 'for (1..10000) { print "$_\t" . ($_*3) . "\n"; }' | hadoop fs -put - mytsv2.txt

# Create table to hold it
./bin/hbase shell
create 'myfile', 'f1'

# Do a normal MR load
HADOOP_CLASSPATH=$(cat /tmp/hbase-core-test-classpath.txt) hadoop jar \
  target/hbase-0.21.0-SNAPSHOT.jar importtsv -Dcolumns=f1:blah myfile mytsv.txt

# scan in the shell if you like
# Potentially split table if you like
# Generate incremental from the other file
HADOOP_CLASSPATH=$(cat /tmp/hbase-core-test-classpath.txt) hadoop jar \
  target/hbase-0.21.0-SNAPSHOT.jar importtsv -Dcolumns=f1:blah -Duse.hfile=true \
  myfile mytsv2.txt

# Load incremental
HBASE_CLASSPATH=$HADOOP_CONF_DIR ./bin/hbase \
  org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hfof myfile

# scan in the shell and see that the data has changed
{code}

A fair amount of work remains: I still have to handle regions that split, and 
the case where there are fewer reducers than regions.

> Bulk incremental load into an existing table
> --------------------------------------------
>
>                 Key: HBASE-1923
>                 URL: https://issues.apache.org/jira/browse/HBASE-1923
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client, mapred, regionserver, scripts
>    Affects Versions: 0.21.0
>            Reporter: anty.rao
>            Assignee: Todd Lipcon
>         Attachments: hbase-1923-prelim.txt
>
>
> HBASE-48 is about bulk load of a new table; maybe it's more practical to 
> bulk load against an existing table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
