[ https://issues.apache.org/jira/browse/HBASE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845736#action_12845736 ]

Todd Lipcon commented on HBASE-2294:
------------------------------------

bq. IMHO having the scanner stay 'up to date' as much as possible is a 
nice-to-have, definitely not important enough to hurt performance.

I think I agree with you. I don't want to sidetrack this particular JIRA 
towards implementation details, so I'll leave it at that. Without regard to the 
specifics of the other JIRA, it seems likely to me that "as up to date as 
possible" semantics can often be implemented _more_ efficiently than a 
"snapshot iterator". The current implementation may not be up to snuff, but my 
point is this: the scanner semantics should be as loose as possible to achieve 
maximum speed, and I view "up to date" as _looser_ than snapshot.
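To make the two scanner semantics under discussion concrete, here is a minimal Python sketch over a toy in-memory sorted store. `ToyStore`, `snapshot_scanner`, and `live_scanner` are hypothetical names, not HBase APIs, and concurrency is only simulated by interleaving calls in a single thread. The snapshot scanner fixes a read point (a sequence number) when it is opened and returns only versions written at or before it; the "as up to date as possible" scanner re-seeks past its last returned row on each step and returns whatever is newest at that moment, never moving backward.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyStore:
    """Hypothetical single-process sorted store (not HBase's MemStore)."""

    def __init__(self) -> None:
        self._keys: List[str] = []  # row keys, kept sorted
        # row key -> [(sequence number, value)], oldest first
        self._versions: Dict[str, List[Tuple[int, str]]] = {}
        self._seq = 0

    def put(self, key: str, value: str) -> None:
        self._seq += 1
        if key not in self._versions:
            bisect.insort(self._keys, key)
            self._versions[key] = []
        # Old versions are kept so a snapshot scanner can still read them.
        self._versions[key].append((self._seq, value))

    def _visible(self, key: str, read_point: Optional[int]) -> Optional[str]:
        # Newest version at or below read_point; newest overall if read_point is None.
        for seq, value in reversed(self._versions[key]):
            if read_point is None or seq <= read_point:
                return value
        return None

    def _scan(self, read_point: Optional[int]) -> Iterator[Tuple[str, str]]:
        last: Optional[str] = None
        while True:
            # Re-seek past the last returned key, since rows may have been added.
            i = 0 if last is None else bisect.bisect_right(self._keys, last)
            while i < len(self._keys):
                value = self._visible(self._keys[i], read_point)
                if value is not None:
                    break
                i += 1
            else:
                return  # no further visible rows
            last = self._keys[i]
            yield last, value

    def snapshot_scanner(self) -> Iterator[Tuple[str, str]]:
        # The read point is fixed here, when the scanner is opened.
        return self._scan(self._seq)

    def live_scanner(self) -> Iterator[Tuple[str, str]]:
        # No read point: each step returns the newest value present at that moment.
        return self._scan(None)


store = ToyStore()
store.put("a", "1")
store.put("c", "1")
snap, live = store.snapshot_scanner(), store.live_scanner()
print(next(snap), next(live))  # ('a', '1') ('a', '1')

# Writes that land while both scans are in progress, ahead of their position:
store.put("b", "2")
store.put("c", "2")
print(list(snap))  # [('c', '1')]
print(list(live))  # [('b', '2'), ('c', '2')]
```

In this toy every version is retained regardless. A real store, though, could only discard an old version once no open snapshot scanner's read point still needs it, which is one plausible source of the extra cost a snapshot guarantee carries compared with the live scanner.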

bq. I would think that clients which do 'lengthy scans' don't particularly care 
about performance 

I disagree: MapReduce (MR) jobs are a typical "lengthy scan" application, and 
throughput certainly matters for them. Especially important is the ability to 
run the bulk (MR) jobs alongside high concurrent live load on the table.

> Enumerate ACID properties of HBase in a well defined spec
> ---------------------------------------------------------
>
>                 Key: HBASE-2294
>                 URL: https://issues.apache.org/jira/browse/HBASE-2294
>             Project: Hadoop HBase
>          Issue Type: Task
>          Components: documentation
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
>
>
> It's not written down anywhere what the guarantees are for each operation in 
> HBase with regard to the various ACID properties. I think the developers know 
> the answers to these questions, but we need a clear spec for people building 
> systems on top of HBase. Here are a few sample questions we should endeavor 
> to answer:
> - For a multicell put within a CF, is the update made durable atomically?
> - For a put across CFs, is the update made durable atomically?
> - Can a read see a row that hasn't been sync()ed to the HLog?
> - What isolation do scanners have? Somewhere between snapshot isolation and 
> no isolation?
> - After a client receives a "success" for a write operation, is that 
> operation guaranteed to be visible to all other clients?
> etc.
> I see this JIRA as having several points of discussion:
> - Evaluate the current state of affairs
> - Evaluate whether we currently provide any guarantees that aren't useful to 
> users of the system (perhaps we can drop them in exchange for performance)
> - Evaluate whether we are missing any guarantees that would be useful to 
> users of the system

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
