[
https://issues.apache.org/jira/browse/HBASE-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969098#action_12969098
]
HBase Review Board commented on HBASE-3308:
-------------------------------------------
Message from: "Jean-Daniel Cryans" <[email protected]>
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1273/
-----------------------------------------------------------
Review request for hbase.
Summary
-------
Patch that parallelizes the splitting of the files using ThreadPoolExecutor and
Futures. The code is a bit ugly, but does the job really well as shown during
cluster testing (which also uncovered HBASE-3318).
One new behavior this patch adds is that a split can now be rolled back
because splitting the files took too long. I did some testing with a timeout
of 5 seconds on my cluster; even though each machine did a few rollbacks, the
import went fine. The default is 30 seconds and isn't in hbase-default.xml, as
I don't think anyone would really want to change it.
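For illustration only, here is a minimal sketch of the approach described
above; it is not the code in the patch. Each reference-file creation is
submitted to a thread pool as a Callable, and the caller waits for all of them
up to a timeout. If the timeout expires, an IOException is thrown so that the
caller can roll back the split. The class name, method name, and the use of
strings in place of real store files are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names do not come from SplitTransaction.java.
final class ParallelSplitSketch {

  // Runs every task in its own pool thread and waits at most timeoutMs
  // for all of them. Throws IOException on timeout or task failure so the
  // caller can treat it as a signal to roll back.
  static List<String> runAll(List<Callable<String>> tasks, long timeoutMs)
      throws IOException {
    List<String> results = new ArrayList<>();
    if (tasks.isEmpty()) {
      return results;
    }
    // Executors.newFixedThreadPool is backed by a ThreadPoolExecutor.
    ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
    List<Future<String>> futures = new ArrayList<>();
    for (Callable<String> task : tasks) {
      futures.add(pool.submit(task));
    }
    // No new tasks are accepted; already submitted ones keep running.
    pool.shutdown();
    try {
      if (!pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
        pool.shutdownNow(); // interrupt whatever is still running
        throw new IOException("Took too long to split the files");
      }
      // All tasks are done, so get() returns (or throws) immediately.
      for (Future<String> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while splitting the files", e);
    } catch (ExecutionException e) {
      throw new IOException("Creating a reference file failed", e.getCause());
    }
  }

  public static void main(String[] args) throws IOException {
    List<Callable<String>> tasks = new ArrayList<>();
    // Stand-ins for "create top/bottom reference for one store file".
    tasks.add(() -> "storefile-1.top");
    tasks.add(() -> "storefile-1.bottom");
    System.out.println(runAll(tasks, 30_000L));
  }
}
```

With this shape, the elapsed time is bounded by the slowest single file
rather than the sum over all files, and a timeout surfaces as an ordinary
exception on the calling thread.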
This addresses bug HBASE-3308.
http://issues.apache.org/jira/browse/HBASE-3308
Diffs
-----
/branches/0.90/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
1043188
Diff: http://review.cloudera.org/r/1273/diff
Testing
-------
Thanks,
Jean-Daniel
> SplitTransaction.splitStoreFiles slows splits a lot
> ---------------------------------------------------
>
> Key: HBASE-3308
> URL: https://issues.apache.org/jira/browse/HBASE-3308
> Project: HBase
> Issue Type: Improvement
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.92.0
>
>
> Recently I've been seeing some slow splits in our production environment
> triggering timeouts, so I decided to take a closer look into the issue.
> According to my debugging, we spend almost all the time it takes to split on
> creating the reference files. Each file in my testing takes at least 300ms to
> create, and averages around 600ms. Since we create two references per store
> file, it means that a region with 4 store files can easily take up to 5
> seconds to split just to create those references.
> An intuitive improvement would be to create those files in parallel, so at
> least it wouldn't be much slower when we're splitting a higher number of
> files. Stack left the following comment in the code:
> {noformat}
> // TODO: If the below were multithreaded would we complete steps in less
> // elapsed time? St.Ack 20100920
> {noformat}