Re: Review Request: HBASE-1923. Bulk load into existing tables

Todd Lipcon Sun, 30 May 2010 14:13:57 -0700


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java, line 264
> > <http://review.hbase.org/r/87/diff/1/?file=610#file610line264>
> >
> >     Is this patch for trunk or branch?  If branch, this addition might 
> > break RPC.   It might be ok on the end here but we should test.


I will be backporting to branch for our customer, but as a major new feature I 
didn't expect it would be committed to Apache branch. If people want it, 
though, I'll put the patch up.

In general, adding a new RPC doesn't cause incompatibility - if a client calls 
it and it's not there on the server side, the client will just get an exception 
that the method doesn't exist. (eg see my backport of HDFS-630 to hadoop 20)


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, 
> > line 181
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line181>
> >
> >     Man, we never enable asserts.... throw an exception!
> >

Switched to using google Preconditions


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, 
> > line 187
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line187>
> >
> >     Just asking, the comparator for sure is going to do the right thing 
> > here?

yea, ImmutableBytesWritable implements Comparable


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, 
> > line 213
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line213>
> >
> >     Won't javadoc mangle your nice formatting here?  Use ul/li or wrap w/ 
> > <pre>?

good call, fixed to ul/li


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, 
> > line 234
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
> >
> >     Why two types?  Is this legacy?  KV has advantage of being able to 
> > carry a Delete

yea, it's for compatibility with TableOutputFormat which can take either. Kind 
of a pain, maybe we should just get rid of it and only accept KV?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
> >
> >     In KV, there is a parseColumns function that might help here

doesn't seem much more convenient (since I already have String here)
One question, though: if I want to insert into a family that has no qualifiers, 
should I be using EMPTY_BYTE_ARRAY when I construct the KV, or passing null?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
> >
> >     Is this safe?  Is there escaping you should be handling here?

Plan to address this in docs for importtsv - it's not a good TSV parser that 
supports quoting, etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, 
> > line 229
> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
> >
> >     I suppose in rare circumstances, it could split in here and mess you 
> > up.  You need a no-split flag per region out in zk or something?  We'd need 
> > such a thing for snapshotting methinks.  But, thats for a later...

if there's been a split, the new daughter region will detect that the HFile 
doesn't "fit" and throw WrongRegionException. This then triggers the retry on 
ServerCallable, which fetches the new region info from meta, and realizes it 
has to split the hfile before trying again.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java, line 29
> > <http://review.hbase.org/r/87/diff/1/?file=615#file615line29>
> >
> >     Class comment.  Whats this thing do?

fixed - copied from KeyValueSortReducer


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
> >  line 57
> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
> >
> >     We need this even though you wrote out partition edges earlier when you 
> > read region boundaries?

You're right that it's unused, but I figured I'd put it in for convenience - 
when doing a bulk load into a new table, for example, you may want to use this 
in order to figure out where to set region splits.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
> >
> >     This is old cruddy stuff you are asking about.  It looks like I added 
> > splitsAndClosesLock to stop either happening during critical periods 
> > (HBASE-588) and the splits lock looks now like a half measure from way back 
> > in 2007 -- HBASE-217.  Lets make an issue to clean it up.

Which one should I be taking? Do I need both? Which order?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 511
> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line511>
> >
> >     There is FSUtils in hbase util

On second thought actually this should be somewhat store-specific, so took out 
the TODO


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/util/Bytes.java, line 1105
> > <http://review.hbase.org/r/87/diff/1/?file=622#file622line1105>
> >
> >     You were going to remove this

fixed


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 535
> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line535>
> >
> >     What you going to do here?  Get the last key from the map of storefiles 
> > in current region and make your file be one less than it?  Your timestamp 
> > is current though so edits in here could overshadow those in another 
> > storefile.  Thats ok?

going to make storefiles be a list instead of map... new review coming in a bit


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 646
> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line646>
> >
> >     Whats this about?

added comment - idea is to restore back to local-mode job runner when cluster 
shuts down


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 784
> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line784>
> >
> >     Its not removing it itself?  Its registered to cleanup on exit.

was seeing cases where in between different tests in the same test case, it 
wouldn't get cleaned up, etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java, line 
> > 40
> > <http://review.hbase.org/r/87/diff/1/?file=624#file624line40>
> >
> >     How?  If nullwritables, don't I just get nulls?

Changed to:

/**
 * Input format that creates as many map tasks as configured in
 * mapred.map.tasks, each provided with a single row of
 * NullWritables. This can be useful when trying to write mappers
 * which don't have any real input (eg when the mapper is simply
 * producing random data as output)
 */


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------


On 2010-05-25 14:21:16, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/
> -----------------------------------------------------------
> 
> (Updated 2010-05-25 14:21:16)
> 
> 
> Review request for hbase, stack and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> Here's a first patch that implements bulk import into existing tables. This 
> applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of the 
> three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> 
> I have some TODOs left that I want to take care of before this gets 
> committed, but since it's a pretty large patch, I figured I'd get it out for 
> review ASAP.
> 
> The stuff in the hadoopbackport package is essentially copypaste from Hadoop 
> trunk, so you can ignore that in the review.
> 
> 
> This addresses bug HBASE-1923.
>     http://issues.apache.org/jira/browse/HBASE-1923
> 
> 
> Diffs
> -----
> 
>   pom.xml 0a009cf 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 
>   src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 
> 0a9ec4b 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java cf4768f 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 
> 9c8e53e 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
> PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java
>  af3d588 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
>  PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
> fb65ed1 
>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d 
>   src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java 
> PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java 
> d04ced2 
>   
> src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
>  PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 
> fcb22fb 
> 
> Diff: http://review.hbase.org/r/87/diff
> 
> 
> Testing
> -------
> 
> Primary unit/functional testing, a bit of pseudo-distributed testing. Plan on 
> doing full system tests before commit as well.
> 
> 
> Thanks,
> 
> Todd
> 
>

Re: Review Request: HBASE-1923. Bulk load into existing tables

Reply via email to