[ https://issues.apache.org/jira/browse/HBASE-22887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936429#comment-16936429 ]
Guanghao Zhang commented on HBASE-22887:
----------------------------------------

Got it. Family F2's writer sets rollRequested to true but never rolls, and F1's writer always rolls.

> HFileOutputFormat2 split a lot of HFile by roll once per rowkey
> ---------------------------------------------------------------
>
>                 Key: HBASE-22887
>                 URL: https://issues.apache.org/jira/browse/HBASE-22887
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 2.0.0
>         Environment: HBase 2.0.0
>            Reporter: langdamao
>            Priority: Major
>
> When I use HFileOutputFormat2 in an MR job to build HFiles, the reducer
> creates a lot of files.
> Here is the log:
> {code:java}
> 2019-08-16 14:42:51,988 INFO [main] org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2: Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/06f3b0e9f0644ee782b7cf4469f44a70, wrote=893827310
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/1454ea148f1547499209a266ad25387f, wrote=61
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/9d35446634154b4ca4be56f361b57c8b, wrote=55
> ...
> {code}
> It keeps writing a new file every time a new rowkey arrives.
> I then added more logging and found the problem. The code is here:
> [GitHub|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L289]
> {code:java}
> if (wl != null && wl.written + length >= maxsize) {
>   this.rollRequested = true;
> }
> // This can only happen once a row is finished though
> if (rollRequested && Bytes.compareTo(this.previousRow, rowKey) != 0) {
>   rollWriters(wl);
> }{code}
> In my case I have two families, F1 and F2. The writer of F2 reaches maxsize,
> so rollRequested becomes true, but its rowkey is the same as previousRow,
> so the writer is not rolled.
> When the next rowkey arrives with family F1, both rollRequested and
> Bytes.compareTo(this.previousRow, rowKey) != 0 are true, so the writer of F1
> is rolled and a new HFile is created. Then the same rowkey arrives with
> family F2 and sets rollRequested to true again, and when the next rowkey
> arrives with family F1 the writer of F1 is rolled once more.
> So a new HFile is created for every rowkey with family F1, and F2 is never
> rolled until the job ends.
>
> Here are my questions and a partial solution:
> Q1. Does HBase 2.0.0 support different families of the same HBase table
> having different rowkey cut points? That is, rowkeyA may be written to the
> first HFile of F1 but to the second HFile of F2. HBase 1.x.x does not support
> this, so it rolls all the writers and does not hit this problem. I guess the
> answer is "yes, supported", so we go to Q2.
> Q2. Do we allow the same rowkey with the same family to arrive at
> HFileOutputFormat2.write more than once?
> If not, can we fix it this way, since the rowKey will then never be the same
> as previousRow:
> {code:java}
> if (wl != null && wl.written + length >= maxsize) {
>   rollWriters(wl);
> }{code}
> If yes, do we need a Map to record previousRow per family:
> {code:java}
> private final Map<byte[], byte[]> previousRows =
>     new TreeMap<>(Bytes.BYTES_COMPARATOR);
> if (wl != null && wl.written + length >= maxsize &&
>     Bytes.compareTo(this.previousRows.get(family), rowKey) != 0) {
>   rollWriters(wl);
> }{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
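
The interaction described in the issue can be reproduced with a small stand-alone simulation. This is only a sketch and not HBase code: the class RollSimulation, its Cell type, the method names, and the sample sizes are all hypothetical. The shared-state method mirrors the condition quoted from HFileOutputFormat2 above; the per-family method follows the reporter's Q2 Map idea, with row keys modelled as Strings so the comparison is null-safe. It is not necessarily the fix that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of the roll logic; no HBase classes are used.
class RollSimulation {

  /** One cell as the record writer sees it: row key, family, serialized length. */
  static final class Cell {
    final String row;
    final String family;
    final long length;

    Cell(String row, String family, long length) {
      this.row = row;
      this.family = family;
      this.length = length;
    }
  }

  /** Each row carries a 1-byte F1 cell followed by a 30-byte F2 cell (made-up sizes). */
  static List<Cell> sampleInput(int rows) {
    List<Cell> cells = new ArrayList<>();
    for (int i = 0; i < rows; i++) {
      cells.add(new Cell("r" + i, "F1", 1));
      cells.add(new Cell("r" + i, "F2", 30));
    }
    return cells;
  }

  /** Mirrors the quoted logic: one rollRequested flag and one previousRow for all families. */
  static Map<String, Integer> rollsWithSharedState(List<Cell> cells, long maxsize) {
    Map<String, Long> written = new HashMap<>();
    Map<String, Integer> rolls = new HashMap<>();
    boolean rollRequested = false;
    String previousRow = null;
    for (Cell c : cells) {
      Long w = written.get(c.family);
      long soFar = (w == null) ? 0L : w;
      if (w != null && soFar + c.length >= maxsize) {
        rollRequested = true;
      }
      if (rollRequested && !c.row.equals(previousRow)) {
        // Only the writer of the *current* family is rolled, whoever set the flag.
        rolls.merge(c.family, 1, Integer::sum);
        soFar = 0;
        rollRequested = false;
      }
      written.put(c.family, soFar + c.length);
      previousRow = c.row;
    }
    return rolls;
  }

  /** Variant along the lines of Q2: the previous row is tracked per family. */
  static Map<String, Integer> rollsWithPerFamilyRow(List<Cell> cells, long maxsize) {
    Map<String, Long> written = new HashMap<>();
    Map<String, String> previousRows = new HashMap<>();
    Map<String, Integer> rolls = new HashMap<>();
    for (Cell c : cells) {
      Long w = written.get(c.family);
      long soFar = (w == null) ? 0L : w;
      if (w != null && soFar + c.length >= maxsize
          && !c.row.equals(previousRows.get(c.family))) {
        rolls.merge(c.family, 1, Integer::sum);
        soFar = 0;
      }
      written.put(c.family, soFar + c.length);
      previousRows.put(c.family, c.row);
    }
    return rolls;
  }

  public static void main(String[] args) {
    List<Cell> cells = sampleInput(10);
    System.out.println("shared state: " + rollsWithSharedState(cells, 100));
    System.out.println("per family:   " + rollsWithPerFamilyRow(cells, 100));
  }
}
```

With 10 rows and a maxsize of 100 in this model, the shared-state version rolls the F1 writer six times and never rolls F2, which matches the pattern reported in the issue. The per-family version rolls F2 three times and never rolls F1.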