[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163177#comment-17163177
 ] 

Michael Stack commented on HBASE-23634:
---------------------------------------

 
{quote}1. Before compaction, a large number of small hfiles affects the read and 
write performance of the region.

2. Each hfile needs 3 NN RPCs to bulkload during 
openRegion (validate, rename, createReader).

If bulkLoadService ThreadNum is 3, the number of hfiles is 20 (because the WAL 
count is 20), there are 100 RSs and 2K*100 regions, and there are 75 openRegion threads,

then hbase needs 3*3*75*100 concurrent NN RPCs and 3*20*2K*100 total NN RPCs.
{quote}
From [~Bo Cui]
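
For reference, the arithmetic in the quote can be spelled out as a rough sketch. The constant and function names below are illustrative only, not HBase identifiers, and reading "2K*100" as 2K regions on each of 100 RegionServers is an assumption about what the quote means:

```python
# Back-of-the-envelope NameNode (NN) RPC estimate using the quoted figures.
# All names are hypothetical; none of them are HBase configuration keys.
RPCS_PER_HFILE = 3         # validate, rename, createReader
BULKLOAD_THREADS = 3       # bulkLoadService ThreadNum
HFILES_PER_REGION = 20     # quoted as 20 because the WAL count is 20
REGION_SERVERS = 100
REGIONS_PER_RS = 2_000     # assumed reading of "region is 2K*100"
OPEN_REGION_THREADS = 75   # assumed to be per RegionServer


def concurrent_nn_rpcs() -> int:
    """Upper bound on NN RPCs in flight at once: 3*3*75*100."""
    return (RPCS_PER_HFILE * BULKLOAD_THREADS
            * OPEN_REGION_THREADS * REGION_SERVERS)


def total_nn_rpcs() -> int:
    """NN RPCs needed to open every region: 3*20*2K*100."""
    return (RPCS_PER_HFILE * HFILES_PER_REGION
            * REGIONS_PER_RS * REGION_SERVERS)


if __name__ == "__main__":
    print("concurrent:", concurrent_nn_rpcs())
    print("total:", total_nn_rpcs())
```

With these inputs the concurrent figure works out to 67,500 and the total to 12,000,000. Both are worst-case products of the quoted numbers rather than measurements.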

We can quibble with some of the assessment made above but it does suggest a 
better accounting is needed before we enable this as the default:

 * Compare recovered.edits write amplification against that of writing small 
hfiles and then immediately rewriting them via compaction (I like [~zghao]'s 
interpretation of Bo Cui's list: open the recovered.hfiles as part of the 
Region, with the compaction bringing them into the Store directory from the .tmp dir).

 * Replay of recovered.edits inline w/ open as opposed to just opening the file 
(MTTR benefits).

 * A compare of NN RPCs as noted above by Bo Cui.

 * The hfile validation copied from bulkload is broken, for recovered hfiles 
and for bulk load alike, when the hfiles are for the meta table (see sub-issue), 
and the problem is deep-seated, needing lots of work to fix. We could remove the 
validation, since the 'system' wrote the files, as [~zghao] suggests, or move the 
validation to file open as part of Region open (which could end up failing the 
Region open more often).

One question: if we only partially write an HFile and don't complete it 
(because of a crash while splitting the WAL, say), does it get sidelined or 
cleaned up? Just wondering. Thanks.

Unscheduling from 2.4 for now; leaving against hbase3.

> Enable "Split WAL to HFile" by default
> --------------------------------------
>
>                 Key: HBASE-23634
>                 URL: https://issues.apache.org/jira/browse/HBASE-23634
>             Project: HBase
>          Issue Type: Task
>    Affects Versions: 3.0.0-alpha-1, 2.3.0
>            Reporter: Guanghao Zhang
>            Priority: Blocker
>             Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
