[ https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849084#action_12849084 ]
Todd Lipcon commented on HBASE-2353:
------------------------------------
I like the idea of flags to decide on when to give up correctness.
Here's a pseudocode idea that may be able to maintain correctness and speed...
just brainstorming (there may be flaws!)
{code}
def bulk_put(rows_to_write):
  while not rows_to_write.empty():
    minibatch = []
    # all minibatches must get at least one row
    row = rows_to_write.take()
    row.lock()
    minibatch.append(row)
    # try to grab as many more locks as we can
    # without blocking (prevents deadlock)
    while not rows_to_write.empty():
      row = rows_to_write.peek()
      if row.trylock():
        rows_to_write.take()
        minibatch.append(row)
      else:
        break
    # we now have locks on a number of rows
    write_to_hlog(minibatch)
    sync_hlog()
    write_to_memstore(minibatch)
    unlock_all_rows(minibatch)
{code}
Essentially the thought here is that we try to lock as many rows together as we
can without risking deadlock. This algorithm is deadlock-free because we never
block on a lock while holding another. In the uncontended case it reduces to
"lock all rows, write all to hlog, sync, write all to memstore", while in the
pathological contended case it degrades to one sync per row.
A couple of variations are certainly available (e.g. looping all the way through
rows_to_write and making a minibatch of anything lockable might be better), but
this general idea might be worth exploring?
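
To make the brainstorm a bit more concrete, here is a minimal runnable Python sketch of the same loop. Everything in it is an assumption for illustration, not HBase code: `Row` and `FakeRegion` are hypothetical stand-ins for a row with its row lock and for the hlog/memstore, and `sync_count` only counts how often a sync would happen. The `scan_all` flag is a hypothetical switch for the variation mentioned above (keep scanning past a contended row instead of stopping at it).

```python
import threading
from collections import deque


class Row:
    """Hypothetical stand-in for a row and its row lock."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.lock = threading.Lock()


class FakeRegion:
    """Hypothetical in-memory region; counts syncs instead of doing I/O."""

    def __init__(self):
        self.hlog = []
        self.memstore = {}
        self.sync_count = 0

    def write_to_hlog(self, rows):
        self.hlog.extend((r.key, r.value) for r in rows)

    def sync_hlog(self):
        self.sync_count += 1

    def write_to_memstore(self, rows):
        for r in rows:
            self.memstore[r.key] = r.value


def bulk_put(region, rows, scan_all=False):
    pending = deque(rows)
    while pending:
        # Every minibatch gets at least one row. This is the only blocking
        # acquire, and it happens while no other row lock is held.
        first = pending.popleft()
        first.lock.acquire()
        minibatch = [first]
        try:
            skipped = deque()
            # Grab as many more locks as possible without blocking.
            while pending:
                row = pending[0]
                if row.lock.acquire(blocking=False):
                    pending.popleft()
                    minibatch.append(row)
                elif scan_all:
                    # Variation: set the contended row aside, keep scanning.
                    skipped.append(pending.popleft())
                else:
                    break
            # Contended rows go back to the front, in their original order.
            pending.extendleft(reversed(skipped))
            region.write_to_hlog(minibatch)
            region.sync_hlog()  # one sync per minibatch
            region.write_to_memstore(minibatch)
        finally:
            for r in minibatch:
                r.lock.release()
```

With no contention, `bulk_put(region, rows)` puts every row into a single minibatch and `region.sync_count` ends at 1. Each row whose lock is held elsewhere at trylock time ends the current minibatch (or, with `scan_all=True`, is deferred to a later one), so the number of syncs grows with contention. The `try`/`finally` is an addition to the pseudocode so that locks are released even if a write fails.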
> HBASE-2283 removed bulk sync optimization for multi-row puts
> ------------------------------------------------------------
>
> Key: HBASE-2353
> URL: https://issues.apache.org/jira/browse/HBASE-2353
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: ryan rawson
> Fix For: 0.21.0
>
> Attachments: HBASE-2353-deferred.txt
>
>
> Prior to HBASE-2283 we called flush/sync once per put(Put[]) call
> (i.e. once per batch of commits). Now we do it for every row.
> This makes bulk uploads slower if you are using the WAL. Is there an acceptable
> solution to achieve both safety and performance by bulk-sync'ing puts? Or
> would this not work in the face of atomicity guarantees?
> discuss!