[
https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848229#action_12848229
]
Kannan Muthukkaruppan commented on HBASE-2353:
----------------------------------------------
The important thing is that memstore edits must happen after append+sync.
Currently, batch put is simply a loop that does append/sync/memstore-edit per
put. If we moved to a model where we first do the append for each row, then a
common sync, and then all the memstore changes, we would end up having to hold
the locks on all the rows for the entire duration, rather than locking one row
at a time as the current model does.
Also, I think the code structure would get uglier: right now batch put is
essentially a thin wrapper around single Puts.
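To make the trade-off concrete, here is a deliberately simplified sketch of the
two models. The WriteAheadLog/MemStore/RowLocks types and all method names are
illustrative stand-ins, not the actual HRegion internals:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real WAL, memstore, and row-lock machinery.
interface WriteAheadLog { void append(byte[] row) throws IOException; void sync() throws IOException; }
interface MemStore { void apply(byte[] row); }
interface RowLocks { Integer lock(byte[] row); void unlock(Integer id); }

class BatchPutModels {
  WriteAheadLog wal;
  MemStore memstore;
  RowLocks locks;

  // Current model: lock, append, sync, memstore-edit -- one row at a time.
  void perRowSync(byte[][] rows) throws IOException {
    for (byte[] row : rows) {
      Integer id = locks.lock(row);
      try {
        wal.append(row);
        wal.sync();          // one sync per row: safe, but the slow part
        memstore.apply(row); // edit becomes visible only after it is durable
      } finally {
        locks.unlock(id);
      }
    }
  }

  // Batched-sync model: one sync amortized over the whole batch, but every
  // row lock must be held across all of the appends, the sync, and all of
  // the memstore edits.
  void batchedSync(byte[][] rows) throws IOException {
    List<Integer> held = new ArrayList<>();
    try {
      for (byte[] row : rows) {
        held.add(locks.lock(row));
        wal.append(row);
      }
      wal.sync();            // single sync for the whole batch
      for (byte[] row : rows) {
        memstore.apply(row);
      }
    } finally {
      for (Integer id : held) locks.unlock(id);
    }
  }
}
{code}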
This was a conscious change in HBASE-2283 to restore correct semantics, but I
should perhaps have called it out explicitly.
@Ryan: Is the bulk upload case now noticeably slower?
@Andrew: You are right that this affects 0.20 as well, but you might be mixing
up the "group commit" and "multi-row put" terminology. Group commit should not
have been clobbered by HBASE-2283. But yes, for correctness, HBASE-2283 does
remove the "batch sync" optimization in multi-row puts.
> HBASE-2283 removed bulk commit optimization
> -------------------------------------------
>
> Key: HBASE-2353
> URL: https://issues.apache.org/jira/browse/HBASE-2353
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: ryan rawson
> Fix For: 0.21.0
>
>
> Prior to HBASE-2283, we used to call flush/sync once per put(Put[]) call
> (i.e., once per batch of commits). Now we do it for every row.
> This makes bulk uploads slower if you are using the WAL. Is there an
> acceptable solution that achieves both safety and performance by
> bulk-sync'ing puts? Or would that not work in the face of atomicity
> guarantees?
> discuss!