[jira] Commented: (HBASE-1923) Bulk incremental load into an existing table

HBase Review Board (JIRA) Mon, 31 May 2010 09:15:24 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873695#action_12873695
 ]

HBase Review Board commented on HBASE-1923:
-------------------------------------------

Message from: [email protected]

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 234
bq.  > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
bq.  >
bq.  >     Why two types?  Is this legacy?  KV has advantage of being able to carry a Delete
bq.  
bq.  Todd Lipcon wrote:
bq.      yea, it's for compatibility with TableOutputFormat which can take either. Kind of a pain, maybe we should just get rid of it and only accept KV?

Doing both is a pain yes.  If Delete and Put had a common ancestor/Interface, 
we could use that but it ain't there yet.  File an issue to undo Put option?

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
bq.  > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
bq.  >
bq.  >     In KV, there is a parseColumns function that might help here
bq.  
bq.  Todd Lipcon wrote:
bq.      doesn't seem much more convenient (since I already have String here)
bq.      One question, though: if I want to insert into a family that has no qualifiers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or passing null?

It looks like either works.  It looks like nulls are handled properly.

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
bq.  > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
bq.  >
bq.  >     Is this safe?  Is there escaping you should be handling here?
bq.  
bq.  Todd Lipcon wrote:
bq.      Plan to address this in docs for importtsv - it's not a good TSV parser that supports quoting, etc.

ok
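
To make the limitation concrete, here is a minimal sketch of a tab splitter with no quoting or escaping support. It is not the ImportTsv code; the class and method names are hypothetical and only illustrate what happens when a field itself contains a tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the actual ImportTsv parser.
class NaiveTsvSplit {
    // Splits a line on every tab character; quotes get no special treatment.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // trailing field after the last tab
        return fields;
    }

    public static void main(String[] args) {
        // The quoted value "a<TAB>b" is cut in two, so three logical
        // fields come back as four.
        System.out.println(split("row1\t\"a\tb\"\tc"));
    }
}
```

Input that may contain embedded tabs would need to be cleaned up before being handed to a splitter like this.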

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 229
bq.  > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
bq.  >
bq.  >     I suppose in rare circumstances, it could split in here and mess you up.  You need a no-split flag per region out in zk or something?  We'd need such a thing for snapshotting methinks.  But, that's for later...
bq.  
bq.  Todd Lipcon wrote:
bq.      if there's been a split, the new daughter region will detect that the HFile doesn't "fit" and throw WrongRegionException. This then triggers the retry on ServerCallable, which fetches the new region info from meta, and realizes it has to split the hfile before trying again.

Nice.
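
The retry flow described above can be modeled roughly as follows. This is a self-contained sketch under stated assumptions, not the LoadIncrementalHFiles code: an HFile is reduced to a sorted list of row keys, "meta" is a set of region start keys, and IllegalStateException stands in for WrongRegionException. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the retry flow; none of these names are HBase classes.
class BulkLoadRetrySketch {
    // Region start keys as listed in "meta"; "" starts the first region.
    static final TreeSet<String> regionStarts = new TreeSet<>(Arrays.asList(""));

    // Start key of the region that currently hosts the given row.
    static String regionFor(String row) {
        return regionStarts.floor(row);
    }

    // Stand-in for the server-side check; IllegalStateException plays the
    // role of WrongRegionException when the file straddles a boundary.
    static void serverLoad(List<String> rows) {
        String first = rows.get(0);
        String last = rows.get(rows.size() - 1);
        if (!regionFor(first).equals(regionFor(last))) {
            throw new IllegalStateException("file does not fit in one region");
        }
    }

    // Client side: try to load; on failure re-read the boundaries, split the
    // file at the next region start, and retry both halves.
    static List<List<String>> load(List<String> sortedRows) {
        List<List<String>> loaded = new ArrayList<>();
        Deque<List<String>> pending = new ArrayDeque<>();
        pending.add(sortedRows);
        while (!pending.isEmpty()) {
            List<String> file = pending.poll();
            try {
                serverLoad(file);
                loaded.add(file);
            } catch (IllegalStateException wrongRegion) {
                String boundary = regionStarts.higher(file.get(0));
                int cut = 0;
                while (file.get(cut).compareTo(boundary) < 0) {
                    cut++;
                }
                pending.add(file.subList(0, cut));
                pending.add(file.subList(cut, file.size()));
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        regionStarts.add("m"); // the region split at "m" after the file was written
        System.out.println(load(Arrays.asList("a", "k", "p", "z")));
    }
}
```

In this model a file with rows a, k, p, z written before a split at "m" ends up loaded as two pieces, one per daughter region.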

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java, line 57
bq.  > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
bq.  >
bq.  >     We need this even though you wrote out partition edges earlier when you read region boundaries?
bq.  
bq.  Todd Lipcon wrote:
bq.      You're right that it's unused, but I figured I'd put it in for convenience - when doing a bulk load into a new table, for example, you may want to use this in order to figure out where to set region splits.

Ok.  Sounds good.
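
As a rough illustration of how a key sample could suggest region splits for a new table: sort the sampled row keys and take evenly spaced ones as split points. This is a sketch of the general idea only, not the InputSampler implementation; the class and method names are hypothetical and the sample is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the InputSampler code.
class SplitPointSketch {
    // Picks up to numRegions - 1 split keys at evenly spaced positions
    // in the sorted sample. Assumes the sample is non-empty.
    static List<String> splitPoints(List<String> sample, int numRegions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            String key = sorted.get(i * sorted.size() / numRegions);
            // Skip repeats so the split keys stay strictly increasing.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(key)) {
                splits.add(key);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("d", "a", "c", "b", "f", "e", "h", "g");
        System.out.println(splitPoints(sample, 4));
    }
}
```

With a heavily skewed sample the duplicate check can return fewer split keys than requested, which is usually preferable to creating empty regions.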

bq.  On 2010-05-28 17:19:22, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
bq.  > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
bq.  >
bq.  >     This is old cruddy stuff you are asking about.  It looks like I added splitsAndClosesLock to stop either happening during critical periods (HBASE-588) and the splits lock looks now like a half measure from way back in 2007 -- HBASE-217.  Let's make an issue to clean it up.
bq.  
bq.  Todd Lipcon wrote:
bq.      Which one should I be taking? Do I need both? Which order?

Seems like it depends on what you need.  I like your issue on rethinking these 
locks.  Sounds like we might need to do stuff like break apart 
splitsAndClosesLock into separate splitting and closing locks.

- stack

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------

> Bulk incremental load into an existing table
> --------------------------------------------
>
>                 Key: HBASE-1923
>                 URL: https://issues.apache.org/jira/browse/HBASE-1923
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, mapred, regionserver, scripts
>    Affects Versions: 0.21.0
>            Reporter: anty.rao
>            Assignee: Todd Lipcon
>         Attachments: hbase-1923-prelim.txt, hbase-1923.txt
>
>
> HBASE-48 is about bulk load of a new table; maybe it's more practicable to 
> bulk load against an existing table.
