[
https://issues.apache.org/jira/browse/HBASE-22887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913056#comment-16913056
]
langdamao edited comment on HBASE-22887 at 8/22/19 7:11 AM:
------------------------------------------------------------
[~anoop.hbase] Thank you very much for the reply :)
Q1. Great!
Q2. Yes, we don't need rollRequested anymore. We can record the previousRow
for each family's writer in a map, as detailed in Code A1. Or we can even make
previousRow a field of the writer, as detailed in Code A2.
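Code A1.
{code:java}
// One previous row per column family; byte[] keys need BYTES_COMPARATOR.
private final Map<byte[], byte[]> previousRows =
    new TreeMap<>(Bytes.BYTES_COMPARATOR);
...
// Roll only this family's writer, and only once its row has changed.
if (wl != null && wl.written + length >= maxsize
    && Bytes.compareTo(this.previousRows.get(family), rowKey) != 0) {
  rollWriters(wl);
}
...
previousRows.put(family, rowKey);
{code}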
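A side note on why Code A1 uses a TreeMap with a comparator rather than a HashMap: Java arrays inherit identity-based equals/hashCode, so a byte[] key is only found in a HashMap when the very same array instance is used. The sketch below shows this with plain JDK classes. It is only an illustration: the class and method names are hypothetical, and Arrays.compareUnsigned (Java 9+) is used as a stand-in for Bytes.BYTES_COMPARATOR, on the assumption that both order byte arrays lexicographically by unsigned byte value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of HBase.
class ByteArrayKeyDemo {
  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // HashMap lookup with an equal-content but different byte[] instance.
  static boolean foundInHashMap() {
    Map<byte[], byte[]> previousRows = new HashMap<>();
    previousRows.put(bytes("F1"), bytes("row1"));
    // Arrays use identity hashCode/equals, so this lookup misses.
    return previousRows.containsKey(bytes("F1"));
  }

  // TreeMap lookup using a content-based comparator (stand-in for
  // Bytes.BYTES_COMPARATOR; requires Java 9+ for Arrays.compareUnsigned).
  static boolean foundInTreeMap() {
    Comparator<byte[]> byContent = Arrays::compareUnsigned;
    Map<byte[], byte[]> previousRows = new TreeMap<>(byContent);
    previousRows.put(bytes("F1"), bytes("row1"));
    // The comparator looks at the bytes, so an equal-content key is found.
    return previousRows.containsKey(bytes("F1"));
  }

  public static void main(String[] args) {
    System.out.println("HashMap finds key: " + foundInHashMap());
    System.out.println("TreeMap finds key: " + foundInTreeMap());
  }
}
```

Note also that Map.get returns null for a family that has not been seen yet, so the first comparison for each family in Code A1 is made against null rather than against an empty row.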
Code A2.
{code:java}
static class WriterLength {
  long written = 0;
  // Last row appended by this writer, so each family tracks its own row boundary.
  byte[] previousRow = HConstants.EMPTY_BYTE_ARRAY;
  StoreFileWriter writer = null;
}
...
// Roll only this writer, and only once its row has changed.
if (wl != null && wl.written + length >= maxsize
    && Bytes.compareTo(wl.previousRow, rowKey) != 0) {
  rollWriters(wl);
}
...
wl.previousRow = rowKey;
{code}
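To make the difference concrete, here is a small, self-contained simulation. It is only a sketch under stated assumptions, not the real HFileOutputFormat2: rows and families are plain strings, a "writer" is just a byte counter, all class and method names are hypothetical, and the sample data (6 rows, a 10-byte cell in F1 and a 100-byte cell in F2 per row, maxsize 250) is made up. It compares the current shared rollRequested/previousRow logic with the per-writer previousRow of Code A2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simplified model of the roll logic; not HBase code.
class RollSimulation {
  static final class Cell {
    final String row; final String family; final long length;
    Cell(String row, String family, long length) {
      this.row = row; this.family = family; this.length = length;
    }
  }

  // Stand-in for WriterLength: tracks bytes written and files created.
  static final class WriterState {
    long written = 0;
    boolean open = false;
    String previousRow = "";
    int filesCreated = 0;
  }

  private static void close(WriterState wl) { wl.open = false; wl.written = 0; }

  private static WriterState append(Map<String, WriterState> writers, WriterState wl, Cell c) {
    if (wl == null) { wl = new WriterState(); writers.put(c.family, wl); }
    if (!wl.open) { wl.open = true; wl.filesCreated++; } // a new file is created
    wl.written += c.length;
    return wl;
  }

  private static Map<String, Integer> fileCounts(Map<String, WriterState> writers) {
    Map<String, Integer> counts = new TreeMap<>();
    writers.forEach((family, wl) -> counts.put(family, wl.filesCreated));
    return counts;
  }

  // Current behaviour: one rollRequested flag and one previousRow for all families.
  static Map<String, Integer> simulateSharedFlag(List<Cell> cells, long maxsize) {
    Map<String, WriterState> writers = new HashMap<>();
    boolean rollRequested = false;
    String previousRow = "";
    for (Cell c : cells) {
      WriterState wl = writers.get(c.family);
      if (wl != null && wl.written + c.length >= maxsize) rollRequested = true;
      if (rollRequested && !previousRow.equals(c.row)) {
        if (wl != null) close(wl); else writers.values().forEach(RollSimulation::close);
        rollRequested = false;
      }
      append(writers, wl, c);
      previousRow = c.row;
    }
    return fileCounts(writers);
  }

  // Proposed behaviour (Code A2): each writer keeps its own previousRow.
  static Map<String, Integer> simulatePerWriter(List<Cell> cells, long maxsize) {
    Map<String, WriterState> writers = new HashMap<>();
    for (Cell c : cells) {
      WriterState wl = writers.get(c.family);
      if (wl != null && wl.written + c.length >= maxsize && !wl.previousRow.equals(c.row)) {
        close(wl);
      }
      wl = append(writers, wl, c);
      wl.previousRow = c.row;
    }
    return fileCounts(writers);
  }

  // Made-up input: each row has a small F1 cell followed by a large F2 cell.
  static List<Cell> sampleCells(int rows) {
    List<Cell> cells = new ArrayList<>();
    for (int i = 1; i <= rows; i++) {
      cells.add(new Cell("row" + i, "F1", 10));
      cells.add(new Cell("row" + i, "F2", 100));
    }
    return cells;
  }

  public static void main(String[] args) {
    List<Cell> cells = sampleCells(6);
    System.out.println("shared flag: " + simulateSharedFlag(cells, 250)); // {F1=4, F2=1}
    System.out.println("per writer:  " + simulatePerWriter(cells, 250));  // {F1=1, F2=3}
  }
}
```

In this model the shared flag rolls the small F1 writer on every new row once F2 has crossed maxsize, while F2 itself is never rolled; with a per-writer previousRow only F2 rolls, and only when it is actually full.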
> HFileOutputFormat2 split a lot of HFile by roll once per rowkey
> ---------------------------------------------------------------
>
> Key: HBASE-22887
> URL: https://issues.apache.org/jira/browse/HBASE-22887
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 2.0.0
> Environment: HBase 2.0.0
> Reporter: langdamao
> Priority: Major
>
> When I use HFileOutputFormat2 in an MR job to build HFiles, the reducer
> creates a lot of files.
> Here is the log:
> {code:java}
> 2019-08-16 14:42:51,988 INFO [main]
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2:
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/06f3b0e9f0644ee782b7cf4469f44a70,
> wrote=893827310
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/1454ea148f1547499209a266ad25387f,
> wrote=61
> Writer=hdfs://hfile/_temporary/1/_temporary/attempt_1558444096078_519332_r_000016_0/F1/9d35446634154b4ca4be56f361b57c8b,
> wrote=55
> ... {code}
> It keeps writing a new file each time a new rowkey arrives.
> I then added more logging and found the problem. The code is here:
> [GitHub|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L289]
> {code:java}
> if (wl != null && wl.written + length >= maxsize) {
> this.rollRequested = true;
> }
> // This can only happen once a row is finished though
> if (rollRequested && Bytes.compareTo(this.previousRow, rowKey) != 0) {
> rollWriters(wl);
> }{code}
> In my case, I have two families, F1 and F2. The writer for F2 reaches
> maxsize, so rollRequested becomes true, but its rowkey is the same as
> previousRow, so the writer is not rolled. When the next rowkey arrives with
> family F1, both rollRequested and
> Bytes.compareTo(this.previousRow, rowKey) != 0 are true, so the writer for F1
> is rolled and a new HFile is created. Then the same rowkey arrives with
> family F2 and sets rollRequested to true again, and when the next rowkey
> arrives with family F1, the writer for F1 is rolled once more.
> So a new HFile is created for every rowkey in family F1, and the writer for
> F2 is never rolled until the job ends.
>
> Here are my questions and a partial solution:
> Q1. Does HBase 2.0.0 support different families of the same HBase table
> being cut at different rowkeys? That is, rowkeyA may be written to the first
> HFile of F1 but to the second HFile of F2. HBase 1.x.x does not support this,
> so it rolls all the writers together and does not have this problem. I guess
> the answer is "Yes, supported", which leads to Q2.
> Q2. Can the same rowkey with the same family arrive at
> HFileOutputFormat2.write more than once?
> If not, can we fix it this way, since the rowKey will then never be the same
> as previousRow?
> {code:java}
> if (wl != null && wl.written + length >= maxsize) {
> rollWriters(wl);
> }{code}
> If yes, should we use a Map to record previousRow per family?
> {code:java}
> private final Map<byte[], byte[]> previousRows =
> new TreeMap<>(Bytes.BYTES_COMPARATOR);
> if (wl != null && wl.written + length >= maxsize &&
> Bytes.compareTo(this.previousRows.get(family), rowKey) != 0) {
> rollWriters(wl);
> }{code}
--
This message was sent by Atlassian Jira
(v8.3.2#803003)