[
https://issues.apache.org/jira/browse/HBASE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605199#comment-14605199
]
Victor Xu commented on HBASE-13985:
-----------------------------------
Thanks, Ted. I'll add a new patch.
I bulkloaded nearly 2 million HFiles into one HTable last Saturday. After
more than 30 minutes, the load was still blocked on HFile format validation,
so I added this configuration to skip that step. With it, the whole bulkload
process completed in 15 minutes.
A small test shows that HFile format validation runs at about 350 files/sec
in a single thread, so checking 3.5 million HFiles would take several hours.
Even though multiple threads could speed this up, I prefer to add a
configuration that skips the check entirely.
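A rough back-of-the-envelope check of the figures above, as a sketch only. It
assumes the measured rate of 350 files/sec stays constant and that extra
threads scale perfectly linearly, which real HDFS and NameNode load would
likely not allow. The function name and the thread count of 8 are
illustrative, not taken from the patch.

```python
def validation_seconds(num_hfiles: int, files_per_sec: float, threads: int = 1) -> float:
    """Estimate wall-clock seconds needed to validate num_hfiles.

    Assumes every thread sustains files_per_sec (ideal linear scaling),
    so the result is an optimistic lower bound.
    """
    if num_hfiles < 0 or files_per_sec <= 0 or threads < 1:
        raise ValueError("need num_hfiles >= 0, files_per_sec > 0, threads >= 1")
    return num_hfiles / (files_per_sec * threads)


if __name__ == "__main__":
    # 3.5 million HFiles at the measured single-thread rate of 350 files/sec;
    # the 8-thread case is a hypothetical illustration.
    for threads in (1, 8):
        secs = validation_seconds(3_500_000, 350, threads)
        print(f"{threads} thread(s): {secs:.0f} s = {secs / 3600:.2f} h")
```

Under these assumptions a single thread needs 10,000 seconds (about 2.8
hours), and even 8 ideally scaling threads still need 1,250 seconds (about 21
minutes), which is why skipping the check is attractive at this scale.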
> Add configuration to skip validating HFile format when bulk loading millions
> of HFiles
> --------------------------------------------------------------------------------------
>
> Key: HBASE-13985
> URL: https://issues.apache.org/jira/browse/HBASE-13985
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.98.13
> Reporter: Victor Xu
> Assignee: Victor Xu
> Priority: Minor
> Labels: regionserver
> Fix For: 0.98.14
>
> Attachments: HBASE-13985.patch
>
>
> When bulk loading millions of HFiles into one HTable, checking HFile format is
> the most time-consuming phase. Maybe we could use a parallel mechanism to
> increase the speed, but when it comes to millions of HFiles, it may still
> take dozens of minutes. So I think it's necessary to add an option for
> advanced users to bulkload without checking HFile format at all.
> Of course, the default value of this option should be true.