Re: Review Request: HBASE-1923. Bulk load into existing tables

stack Mon, 31 May 2010 09:13:02 -0700


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, 
> > line 234
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
> >
> >     Why two types?  Is this legacy?  KV has advantage of being able to 
> > carry a Delete
> 
> Todd Lipcon wrote:
>     yea, it's for compatibility with TableOutputFormat which can take either. 
> Kind of a pain, maybe we should just get rid of it and only accept KV?


Doing both is a pain yes.  If Delete and Put had a common ancestor/Interface, 
we could use that but it ain't there yet.  File an issue to undo Put option?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
> >
> >     In KV, there is a parseColumns function that might help here
> 
> Todd Lipcon wrote:
>     doesn't seem much more convenient (since I already have String here)
>     One question, though: if I want to insert into a family that has no 
> qualifiers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or 
> passing null?

It looks like either works.  It looks like nulls are handled properly.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
> >
> >     Is this safe?  Is there escaping you should be handling here?
> 
> Todd Lipcon wrote:
>     Plan to address this in docs for importtsv - it's not a good TSV parser 
> that supports quoting, etc.

ok


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, 
> > line 229
> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
> >
> >     I suppose in rare circumstances, it could split in here and mess you 
> > up.  You need a no-split flag per region out in zk or something?  We'd need 
> > such a thing for snapshotting methinks.  But, thats for a later...
> 
> Todd Lipcon wrote:
>     if there's been a split, the new daughter region will detect that the 
> HFile doesn't "fit" and throw WrongRegionException. This then triggers the 
> retry on ServerCallable, which fetches the new region info from meta, and 
> realizes it has to split the hfile before trying again.

Nice.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
> >  line 57
> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
> >
> >     We need this even though you wrote out partition edges earlier when you 
> > read region boundaries?
> 
> Todd Lipcon wrote:
>     You're right that it's unused, but I figured I'd put it in for 
> convenience - when doing a bulk load into a new table, for example, you may 
> want to use this in order to figure out where to set region splits.

Ok.  Sounds good.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
> >
> >     This is old cruddy stuff you are asking about.  It looks like I added 
> > splitsAndClosesLock to stop either happening during critical periods 
> > (HBASE-588) and the splits lock looks now like a half measure from way back 
> > in 2007 -- HBASE-217.  Lets make an issue to clean it up.
> 
> Todd Lipcon wrote:
>     Which one should I be taking? Do I need both? Which order?

Seems like it depends on what you need.  I like your issue on rethinking these 
locks.  Sounds like we might need to do stuff like break apart the 
splittingAndClose lock into a splitting and closing.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------


On 2010-05-31 00:22:24, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/
> -----------------------------------------------------------
> 
> (Updated 2010-05-31 00:22:24)
> 
> 
> Review request for hbase, stack and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> Here's a first patch that implements bulk import into existing tables. This 
> applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of the 
> three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> 
> I have some TODOs left that I want to take care of before this gets 
> committed, but since it's a pretty large patch, I figured I'd get it out for 
> review ASAP.
> 
> The stuff in the hadoopbackport package is essentially copypaste from Hadoop 
> trunk, so you can ignore that in the review.
> 
> 
> This addresses bug HBASE-1923.
>     http://issues.apache.org/jira/browse/HBASE-1923
> 
> 
> Diffs
> -----
> 
>   pom.xml 0a009cf 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 
>   src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 
> 0a9ec4b 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b912a85 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 
> 9c8e53e 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
> PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java
>  af3d588 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java 
> b332280 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
>  PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
> 78f3223 
>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 80bf09a 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
> c8941f1 
>   src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f 
>   src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java d76c75e 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java 
> PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java 
> d04ced2 
>   
> src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
>  PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 
> f1566d3 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java 
> fcb22fb 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 4595e6e 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
> 2e4c7df 
>   src/test/java/org/apache/hadoop/hbase/util/TestBytes.java c7361cb 
> 
> Diff: http://review.hbase.org/r/87/diff
> 
> 
> Testing
> -------
> 
> Primary unit/functional testing, a bit of pseudo-distributed testing. Plan on 
> doing full system tests before commit as well.
> 
> 
> Thanks,
> 
> Todd
> 
>

Re: Review Request: HBASE-1923. Bulk load into existing tables

Reply via email to