[
https://issues.apache.org/jira/browse/HBASE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated HBASE-16288:
----------------------------------
Attachment: hbase-16288_v2.patch
I'm glad I wrote a unit test for this. It seems there is one more case: it is
not just the first key in a block. We have to make sure that every block in an
intermediate-level index contains at least 2 keys; otherwise, the recursion
still never stops. The UT creates 100 such blocks, where each key is larger
than the max chunk size.
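To illustrate why a minimum of 2 keys per block matters, here is a minimal toy model of the level-building loop. It is not the HBase code and not the attached patch: the class, the method names, the `minEntries` parameter, and the block-closing rule are all assumptions made for the sketch, and the "no progress" check exists only so the example terminates instead of looping forever.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of multi-level index building. Hypothetical; not HBase code. */
class IndexLevelSketch {

    /** Sum of key sizes, standing in for the serialized size of the root chunk. */
    static long size(List<Integer> keySizes) {
        long total = 0;
        for (int s : keySizes) {
            total += s;
        }
        return total;
    }

    /**
     * Packs the entries of the current root into intermediate blocks and returns
     * the new root, which holds the first key of each block. Assumed rule: a block
     * is closed once it holds at least minEntries keys AND has reached maxChunkSize.
     */
    static List<Integer> writeIntermediateLevel(List<Integer> root, int maxChunkSize,
                                                int minEntries) {
        List<Integer> newRoot = new ArrayList<>();
        long blockSize = 0;
        int blockEntries = 0;
        for (int keySize : root) {
            if (blockEntries == 0) {
                newRoot.add(keySize); // first key of each block moves up one level
            }
            blockSize += keySize;
            blockEntries++;
            if (blockSize >= maxChunkSize && blockEntries >= minEntries) {
                blockSize = 0;
                blockEntries = 0;
            }
        }
        return newRoot;
    }

    /**
     * Returns the number of intermediate levels added, or -1 if a level made no
     * progress (the situation in which a real writer would keep writing forever).
     */
    static int levelsAdded(List<Integer> root, int maxChunkSize, int minEntries) {
        int levels = 0;
        // Stop when the root fits, or when it cannot be split into smaller blocks.
        while (size(root) > maxChunkSize && root.size() > minEntries) {
            List<Integer> next = writeIntermediateLevel(root, maxChunkSize, minEntries);
            if (next.size() >= root.size()) {
                return -1; // new root is no smaller: the loop would never end
            }
            root = next;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        // 100 keys of 200 bytes each, all larger than a 128-byte max chunk size.
        List<Integer> hugeKeys = Collections.nCopies(100, 200);
        System.out.println(levelsAdded(hugeKeys, 128, 1)); // one key per block: -1
        System.out.println(levelsAdded(hugeKeys, 128, 2)); // at least 2 keys per block: 6
    }
}
```

With one key allowed per block, every level reproduces the same 100 root entries, so the root never shrinks. Requiring at least 2 keys per block halves the root at each level (100, 50, 25, 13, 7, 4, 2), so the loop ends after 6 levels in this model.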
> HFile intermediate block level indexes might recurse forever creating multi
> TB files
> ------------------------------------------------------------------------------------
>
> Key: HBASE-16288
> URL: https://issues.apache.org/jira/browse/HBASE-16288
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Critical
> Attachments: hbase-16288_v1.patch, hbase-16288_v2.patch
>
>
> Mighty [~elserj] was debugging an opentsdb cluster where some region
> directory ended up having 5TB+ files under <regiondir>/.tmp/
> After further debugging and analysis, we were able to reproduce the problem
> locally: we never stop recursing in this code path for writing
> intermediate-level indices:
> {code:title=HFileBlockIndex.java}
> if (curInlineChunk != null) {
>   while (rootChunk.getRootSize() > maxChunkSize) {
>     rootChunk = writeIntermediateLevel(out, rootChunk);
>     numLevels += 1;
>   }
> }
> {code}
> The problem happens if a very large rowKey (larger than
> "hfile.index.block.max.size") ends up being the first key in a block and then
> moves all the way up to root-level index building. We will keep writing and
> building the next level of intermediate-level indices with a single
> very large key. This can happen in flush / compaction / region recovery,
> causing cluster inoperability due to ever-growing files.
> The issue also seems to have been reported earlier, with a temporary workaround:
> https://github.com/OpenTSDB/opentsdb/issues/490