[ https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163177#comment-17163177 ]
Michael Stack commented on HBASE-23634:
---------------------------------------

{quote}
1、before compaction, large number of small hfiles affect read and write performance of region
2、a hfile needs 3 NN RPCs to bulkload during openRegion(validate、rename、createReader)
if bulkLoadService ThreadNum is 3, and hfiles is 20(because wal number is 20), and RS is 100, region is 2K*100, and openRegion thread is 75
so hbase needs 3*3*75*100 concurrent NN RPCs and needs 3*20*2K*100 total NN RPCs
{quote}
From [~Bo Cui]

We can quibble with some of the assessment made above, but it does suggest a better accounting is needed before we enable this as the default:

 * Compare recovered.edits write amplification vs. that of writing small hfiles and then immediately rewriting them via compaction (I like the [~zghao] interpretation of Bo Cui's list: open the recovered.hfiles as part of the Region, with the compaction bringing them into the Store directory from the .tmp dir).
 * Compare replay of recovered.edits inline with open against just opening the file (MTTR benefits).
 * Compare NN RPCs, as noted above by Bo Cui.
 * The hfile validation copied from bulkload is broken (for recovered hfiles and for bulk load) when recovery is for hfiles of the meta table (see sub-issue), but the problem is deep-seated and needs lots of work to fix. We could remove the validation, since the 'system' wrote the files, as [~zghao] suggests, or move the validation to file open as part of Region open (which could end up failing the Region open more often).

One question: if we only partially write an HFile and don't complete it (because of a crash while splitting the WAL, say), does it get sidelined or cleaned up? Just wondering.

Thanks.

Unscheduling from 2.4 for now; leaving it against hbase3.
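
For reference, the NN RPC arithmetic in Bo Cui's quoted estimate can be checked with a small sketch. The figures come from the quote; the variable names are illustrative only, and the sketch assumes that "2K*100" means 2K regions on each of 100 RegionServers and that every region has the full 20 recovered hfiles (a worst case).

```python
# Figures taken from Bo Cui's quoted estimate; names are illustrative only.
NN_RPCS_PER_HFILE = 3      # validate, rename, createReader
BULKLOAD_THREADS = 3       # bulkLoadService ThreadNum
HFILES_PER_REGION = 20     # assumed: one recovered hfile per WAL, 20 WALs
REGION_SERVERS = 100
REGIONS_PER_RS = 2_000     # assumed reading of "region is 2K*100"
OPEN_REGION_THREADS = 75   # openRegion threads per RegionServer


def concurrent_nn_rpcs() -> int:
    """Peak in-flight NameNode RPCs: 3 * 3 * 75 * 100."""
    return (NN_RPCS_PER_HFILE * BULKLOAD_THREADS
            * OPEN_REGION_THREADS * REGION_SERVERS)


def total_nn_rpcs() -> int:
    """NameNode RPCs to open every region once: 3 * 20 * 2K * 100."""
    return (NN_RPCS_PER_HFILE * HFILES_PER_REGION
            * REGIONS_PER_RS * REGION_SERVERS)


if __name__ == "__main__":
    print(f"concurrent NN RPCs: {concurrent_nn_rpcs():,}")  # 67,500
    print(f"total NN RPCs:      {total_nn_rpcs():,}")       # 12,000,000
```

Under those assumptions the estimate works out to 67,500 concurrent and 12,000,000 total NN RPCs. Both are upper bounds for this scenario, not measurements.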

> Enable "Split WAL to HFile" by default
> --------------------------------------
>
>                 Key: HBASE-23634
>                 URL: https://issues.apache.org/jira/browse/HBASE-23634
>             Project: HBase
>          Issue Type: Task
>    Affects Versions: 3.0.0-alpha-1, 2.3.0
>            Reporter: Guanghao Zhang
>            Priority: Blocker
>             Fix For: 3.0.0-alpha-1, 2.4.0
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)