Jean-Adrien wrote:
Hello,
Andrew and St.Ack, thanks for your answers, and excuse me for the confusion
between compression and compaction...
I reviewed the concept of major/minor compaction in the wiki and I looked
at both JIRA issues, HBASE-938 and HBASE-1062.
Since I'm running HBase version 0.18.0, I certainly have the HBASE-938
problem. If I understand it correctly, at startup every opened region that
needs compaction runs a major compaction, because the timestamp of the
latest major compaction is not stored anywhere: the (in-memory) counter is
reset to the startup time, and the next major compaction then takes place
(with the default config) one day later.
In HBASE-938 I say that a major compaction runs on every restart, but I
was incorrect. Later in the issue, having studied the code, I recant
(the 'last' major compaction timestamp is that of the oldest file in the
filesystem).
Later in HBASE-938 we home in on the fact that even where the last
compaction was a major compaction, if the major compaction interval has
elapsed we'd run a new major compaction. Essentially we'd rewrite the data
in HBase periodically (as you 'prove' later in this message with your
replication check).
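A minimal sketch of that decision as I read it (the method and field names
below are my own, for illustration; they are not the exact 0.18 code):

    // Illustrative sketch only -- names are assumptions, not the real HStore code.
    // lowTimestamp is the modification time of the oldest store file on the filesystem.
    long lowTimestamp = getLowestTimestamp(storefiles);
    long elapsed = System.currentTimeMillis() - lowTimestamp;
    if (elapsed > majorCompactionInterval) {  // default interval is one day
      // Even if the last compaction was itself a major one, the oldest file
      // is now older than the interval, so another major compaction runs
      // and the whole store is rewritten.
      triggerMajorCompaction();
    }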
Can you tell what is running on restart? Is it a major compaction? Or
attach the startup logs to an issue and I'll take a look. In 0.18.x, the
following is logged if it's a 'major':
LOG.debug("Major compaction triggered on store: " +
this.storeNameStr +
". Time since last major compaction: " +
((System.currentTimeMillis() - lowTimestamp)/1000) + "
seconds");
The thing I'm not clear on is why all the compacting on restart. Why is
a 'major' compaction triggered if we're looking at the timestamp of the
oldest file in the filesystem? Perhaps you can add some debug emissions to
figure it out, Jean-Adrien?
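For example, something along these lines beside the existing debug line
might show what the interval math is seeing (just a suggestion; the
variable names follow the sketch above and are assumptions, not existing
code):

    // Hypothetical extra debug output for diagnosing the restart behaviour.
    LOG.debug("Store " + this.storeNameStr +
        ": oldest file timestamp=" + lowTimestamp +
        ", elapsed=" + ((System.currentTimeMillis() - lowTimestamp) / 1000) + "s" +
        ", major compaction interval=" + (majorCompactionInterval / 1000) + "s");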
...
Here is where my problem during major compaction could lie:
I think (I'm not sure; I need a better tool to monitor my network) that
with my light configuration (see above for details), the problem is this:
even if the compaction process itself is quick, a single modification in a
cell leads to a major compaction that rewrites the whole file. Since my
regionservers run on the same machines as the datanodes, they communicate
directly (fast) when an RS asks a DN to store a mapfile.
The datanode will then place replicas of the blocks on the two other
datanodes over the slow 100 Mbit/s network. At HBase startup time, if
Hadoop asks the network to transfer about 200Gb, the bandwidth might be
saturated. The lease expires and the regionservers shut themselves down.
That could also explain the problem of max Xcievers being reached sometimes
in the datanodes, which we discussed in a previous post.
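As a rough back-of-the-envelope check (assuming '200Gb' means roughly 200
gigabytes, that both extra replicas cross the same 100 Mbit/s segment, and
that nothing else shares the link -- all assumptions on my part):

    // Rough arithmetic only; the figures come from the mail above and are assumptions.
    long bytesToMove = 200L * 1024 * 1024 * 1024;    // ~200 GB of store files
    int extraReplicas = 2;                           // replicas 2 and 3 travel over the wire
    double linkBytesPerSec = 100000000 / 8.0;        // 100 Mbit/s ~= 12.5 MB/s
    double seconds = (bytesToMove * (double) extraReplicas) / linkBytesPerSec;
    System.out.printf("~%.1f hours of saturated link%n", seconds / 3600);  // ~9.5 hours

Hours of a saturated link like that would make expired leases and
regionservers shutting themselves down quite believable.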
Above sounds plausible.
Should we cut a 0.18.2 with HBASE-938 backported? (It includes other good
fixes too -- HBASE-998, etc.)
St.Ack