[ 
https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849081#action_12849081
 ] 

stack commented on HBASE-2353:
------------------------------

/me sorry, late to the game

Correctness must be an option, even if it is not what we ship by default.  We 
can choose not to ship it as the default, but hbase has to be able to be correct 
(I'm thinking that by default we might ship with deferred logging if it improves 
our speed some, but we must be clear to users about the cost to them of not 
syncing each row mutation).

Bulk put is a relatively new feature.  Its addition made for some nice upload 
numbers but our write speed before bulk put had been fine, at least compared to 
the competition (See Y! paper).

What if you added flags to the bulk put, Ryan, that allowed you to ask for a 
"sloppy" bulk put behavior, for those of us who are fine redoing the bulk put if 
it doesn't all go in?  The sloppy bulk put would write everything to the WAL 
first (you might look at making yourself a special version of WALEdit and 
HLogKey for this case... it would need special handling at split time, etc.), 
sync, and then do the memstore update (the latter could be done by calling 
single-row put with WAL set to false) w/o locking (or write the WAL afterward... 
though I think writing it first is better).  By default the bulk put would run 
row-by-row: WAL, sync, memstore-update.
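
To make the two orderings concrete, here is a rough sketch of the idea, not 
actual HBase code.  ToyWal, ToyMemstore, strictBulkPut and sloppyBulkPut are 
all made-up names for illustration; the real thing would go through HLog, 
WALEdit and HLogKey, and would have to deal with locking and with log 
splitting, which this toy ignores.  It only shows where the syncs fall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the two bulk put modes. None of these names are HBase APIs. */
public class BulkPutSketch {

    /** Hypothetical stand-in for the write-ahead log: tracks appends and syncs. */
    static final class ToyWal {
        private final List<String> pending = new ArrayList<>();
        private final List<String> durable = new ArrayList<>();
        private int syncCount = 0;

        void append(String row, String value) {
            pending.add(row + "=" + value);
        }

        /** Makes every appended-but-unsynced edit durable in one call. */
        void sync() {
            durable.addAll(pending);
            pending.clear();
            syncCount++;
        }

        int syncCount() { return syncCount; }
        int durableEdits() { return durable.size(); }
    }

    /** Hypothetical stand-in for the memstore. */
    static final class ToyMemstore {
        private final Map<String, String> cells = new TreeMap<>();

        void put(String row, String value) { cells.put(row, value); }
        int size() { return cells.size(); }
    }

    /** Default mode: each row is logged, synced, then applied to the memstore. */
    static void strictBulkPut(ToyWal wal, ToyMemstore mem, Map<String, String> rows) {
        for (Map.Entry<String, String> e : rows.entrySet()) {
            wal.append(e.getKey(), e.getValue());
            wal.sync();
            mem.put(e.getKey(), e.getValue());
        }
    }

    /** "Sloppy" mode: log every row, sync once, then apply all rows to the memstore. */
    static void sloppyBulkPut(ToyWal wal, ToyMemstore mem, Map<String, String> rows) {
        for (Map.Entry<String, String> e : rows.entrySet()) {
            wal.append(e.getKey(), e.getValue());
        }
        wal.sync();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            mem.put(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> rows = new TreeMap<>();
        rows.put("row1", "a");
        rows.put("row2", "b");
        rows.put("row3", "c");

        ToyWal strictWal = new ToyWal();
        strictBulkPut(strictWal, new ToyMemstore(), rows);

        ToyWal sloppyWal = new ToyWal();
        sloppyBulkPut(sloppyWal, new ToyMemstore(), rows);

        System.out.println("strict syncs: " + strictWal.syncCount()); // 3
        System.out.println("sloppy syncs: " + sloppyWal.syncCount()); // 1
    }
}
```

In both modes nothing reaches the memstore before its edit has been synced; 
the difference is that the sloppy mode pays for one sync per batch instead of 
one per row, at the price of the batch not being applied row-by-row.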

> HBASE-2283 removed bulk sync optimization for multi-row puts
> ------------------------------------------------------------
>
>                 Key: HBASE-2353
>                 URL: https://issues.apache.org/jira/browse/HBASE-2353
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2353-deferred.txt
>
>
> Prior to HBASE-2283 we used to call flush/sync once per put(Put[]) call 
> (i.e. per batch of commits).  Now we do so for every row.  
> This makes bulk uploads slower if you are using WAL.  Is there an acceptable 
> solution to achieve both safety and performance by bulk-sync'ing puts?  Or 
> would this not work in face of atomic guarantees?
> discuss!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
