[ 
https://issues.apache.org/jira/browse/HBASE-22887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936429#comment-16936429
 ] 

Guanghao Zhang commented on HBASE-22887:
----------------------------------------

Got it. Family F2's writer set rollRequested to true but never rolled, while 
F1's writer always rolled.
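
The effect can be reproduced with a small self-contained model. This is only a sketch: SharedRollFlagSimulation, its WriterLength stand-in, the String rowkeys and the cell sizes are assumptions made for illustration, not the real HFileOutputFormat2 code. It keeps the two pieces of state the quoted snippet relies on, a single rollRequested flag and a single previousRow shared by all families.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the roll logic quoted in the issue; not the real HBase class. */
class SharedRollFlagSimulation {
  /** Hypothetical stand-in for the per-family writer state. */
  static final class WriterLength {
    long written = 0;     // bytes in the currently open file
    int files = 0;        // number of files opened so far
    boolean open = false; // whether a file is currently open
  }

  private final Map<String, WriterLength> writers = new HashMap<>();
  private final long maxsize;
  private boolean rollRequested = false; // one flag shared by all families
  private String previousRow = "";       // one previous row shared by all families

  SharedRollFlagSimulation(long maxsize) {
    this.maxsize = maxsize;
  }

  void write(String rowKey, String family, long length) {
    WriterLength wl = writers.get(family);
    if (wl != null && wl.written + length >= maxsize) {
      rollRequested = true;
    }
    // A roll can only happen once a row is finished.
    if (rollRequested && !previousRow.equals(rowKey)) {
      if (wl != null) {
        close(wl); // only the writer of the family being written is rolled
      } else {
        // Assumption of this model: with no writer for the family, roll them all.
        for (WriterLength w : writers.values()) {
          close(w);
        }
      }
      rollRequested = false;
    }
    if (wl == null) {
      wl = new WriterLength();
      writers.put(family, wl);
    }
    if (!wl.open) {
      wl.open = true;
      wl.files++; // a new file is created lazily on the next write
    }
    wl.written += length;
    previousRow = rowKey;
  }

  private static void close(WriterLength wl) {
    wl.open = false;
    wl.written = 0;
  }

  int files(String family) {
    WriterLength wl = writers.get(family);
    return wl == null ? 0 : wl.files;
  }

  /** Feeds rows that each carry a 1-byte F1 cell followed by a 30-byte F2 cell. */
  static int[] run(int rows) {
    SharedRollFlagSimulation sim = new SharedRollFlagSimulation(100);
    for (int i = 0; i < rows; i++) {
      String rowKey = String.format("row%04d", i);
      sim.write(rowKey, "F1", 1);
      sim.write(rowKey, "F2", 30);
    }
    return new int[] { sim.files("F1"), sim.files("F2") };
  }

  public static void main(String[] args) {
    int[] files = run(10);
    System.out.println("F1 files=" + files[0] + ", F2 files=" + files[1]);
  }
}
```

In this model, once F2 passes maxsize, every following row opens a new F1 file while F2 stays in its first file, which matches the pattern in the reporter's log.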

> HFileOutputFormat2 split a lot of HFile by roll once per rowkey
> ---------------------------------------------------------------
>
>                 Key: HBASE-22887
>                 URL: https://issues.apache.org/jira/browse/HBASE-22887
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 2.0.0
>         Environment: HBase 2.0.0
>            Reporter: langdamao
>            Priority: Major
>
> When I use HFileOutputFormat2 in an MR job to build HFiles, the reducer 
> creates a lot of files.
> Here is the log:
> {code:java}
> 2019-08-16 14:42:51,988 INFO [main] 
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2: 
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/06f3b0e9f0644ee782b7cf4469f44a70,
>  wrote=893827310 
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/1454ea148f1547499209a266ad25387f,
>  wrote=61 
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/9d35446634154b4ca4be56f361b57c8b,
>  wrote=55 
> ...  {code}
> It keeps writing a new file every time a new rowkey arrives.
> I then added more logging and found the problem in this code on GitHub: 
> https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L289
> {code:java}
> if (wl != null && wl.written + length >= maxsize) {
>   this.rollRequested = true;
> }
> // This can only happen once a row is finished though
> if (rollRequested && Bytes.compareTo(this.previousRow, rowKey) != 0) {
>   rollWriters(wl);
> }{code}
> In my case, I have two families, F1 and F2. The writer of F2 reaches maxsize, 
> so rollRequested becomes true, but its rowkey is the same as previousRow, so 
> the writer is not rolled. When the next rowkey arrives with family F1, both 
> rollRequested and Bytes.compareTo(this.previousRow, rowKey) != 0 are true, so 
> the writer of F1 is rolled and a new HFile is created. Then the same rowkey 
> arrives with family F2 and sets rollRequested to true again, and the next 
> rowkey with family F1 rolls the writer of F1 once more. 
> So it creates a new HFile for every rowkey of family F1, and the writer of F2 
> is never rolled until the job ends.
>  
> Here are my questions and a partial solution:
> Q1. Does HBase 2.0.0 support different families of the same HBase table 
> having different rowkey cuts? That is, rowkeyA may be written to the first 
> HFile of F1 but to the second HFile of F2. HBase 1.x.x does not support this, 
> so it rolls all the writers and does not hit this problem. I guess the answer 
> is "Yes, supported", so we go to Q2.
> Q2. Do we allow the same rowkey with the same family to reach 
> HFileOutputFormat2.write more than once?
> If not, can we fix it this way, because this rowKey will never be the same as 
> previousRow?
> {code:java}
>  if (wl != null && wl.written + length >= maxsize) { 
>       rollWriters(wl);
>  }{code}
> If yes, do we need a Map to record previousRow per family?
> {code:java}
> private final Map<byte[], byte[]> previousRows =
>         new TreeMap<>(Bytes.BYTES_COMPARATOR);
> if (wl != null && wl.written + length >= maxsize && 
> Bytes.compareTo(this.previousRows.get(family), rowKey) != 0) { 
>      rollWriters(wl); 
> }{code}
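
The second proposal quoted above (one previous row per family) can be sketched in a self-contained model. PerFamilyRollSimulation, its WriterLength stand-in, the String rowkeys and the cell sizes are hypothetical and chosen for illustration only; this is not the HBase implementation and not necessarily the fix that was committed.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of a per-family roll decision; not the real HBase class. */
class PerFamilyRollSimulation {
  /** Hypothetical stand-in for the per-family writer state. */
  static final class WriterLength {
    long written = 0;     // bytes in the currently open file
    int files = 0;        // number of files opened so far
    boolean open = false; // whether a file is currently open
  }

  private final Map<String, WriterLength> writers = new HashMap<>();
  private final Map<String, String> previousRows = new HashMap<>(); // last row per family
  private final long maxsize;

  PerFamilyRollSimulation(long maxsize) {
    this.maxsize = maxsize;
  }

  void write(String rowKey, String family, long length) {
    WriterLength wl = writers.get(family);
    // Compare against the last row written to this family, not to any family.
    boolean rowChanged = !rowKey.equals(previousRows.get(family));
    if (wl != null && wl.written + length >= maxsize && rowChanged) {
      wl.open = false; // roll only this family's writer, and only between rows
      wl.written = 0;
    }
    if (wl == null) {
      wl = new WriterLength();
      writers.put(family, wl);
    }
    if (!wl.open) {
      wl.open = true;
      wl.files++;
    }
    wl.written += length;
    previousRows.put(family, rowKey);
  }

  int files(String family) {
    WriterLength wl = writers.get(family);
    return wl == null ? 0 : wl.files;
  }

  /** Feeds rows that each carry a 1-byte F1 cell followed by a 30-byte F2 cell. */
  static int[] run(int rows) {
    PerFamilyRollSimulation sim = new PerFamilyRollSimulation(100);
    for (int i = 0; i < rows; i++) {
      String rowKey = String.format("row%04d", i);
      sim.write(rowKey, "F1", 1);
      sim.write(rowKey, "F2", 30);
    }
    return new int[] { sim.files("F1"), sim.files("F2") };
  }

  public static void main(String[] args) {
    int[] files = run(10);
    System.out.println("F1 files=" + files[0] + ", F2 files=" + files[1]);
  }
}
```

With this input, F2 rolls each time its own file would pass maxsize and F1 stays in a single file. A family whose single row exceeds maxsize would still produce an oversized file in this model, because the roll is deferred until that family sees a new row.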



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
