[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355297#comment-14355297 ]

stack commented on HBASE-11425:
-------------------------------

Thanks for the writeup. Makes it easier discussing this new dev.

"Typical used value for max heap size is 32-48 GB."

This ain't right, is it? Usually we have folks hover just below 32G so the JVM 
can use compressed pointers (compressed oops).
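A back-of-the-envelope sketch of where the ~32G ceiling comes from. It assumes the HotSpot default 8-byte object alignment: a 32-bit compressed reference is scaled by 8, so it can address 2^32 * 8 bytes. The class and method names are made up for illustration.

```java
// Illustrative arithmetic only; the class name is hypothetical.
final class CompressedOopsMath {
    // Bytes addressable by a 32-bit reference scaled by the object alignment.
    static long addressableHeapBytes(int objectAlignmentBytes) {
        return (1L << 32) * objectAlignmentBytes;
    }

    // Upper bound only: in practice the JVM turns compressed oops off a little
    // below this figure, which is why people size heaps at ~31G, not 32G.
    static boolean fitsCompressedOops(long maxHeapBytes, int objectAlignmentBytes) {
        return maxHeapBytes < addressableHeapBytes(objectAlignmentBytes);
    }

    public static void main(String[] args) {
        long limit = addressableHeapBytes(8);
        System.out.println(limit / (1L << 30) + " GiB"); // prints "32 GiB"
        long thisJvm = Runtime.getRuntime().maxMemory();
        System.out.println("max heap under the bound? " + fitsCompressedOops(thisJvm, 8));
    }
}
```

With -XX:ObjectAlignmentInBytes=16 the same arithmetic gives 64G, at the cost of more padding per object, so a 32-48G range really means giving up compressed pointers under default settings.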

"Each bucket’s size is fixed to 4KB."

Should bucket size be same as the hfile block size?
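To make the mismatch concrete: if a block is laid out contiguously across equal-sized buffers (an assumption for illustration, not necessarily how the writeup allocates), the number of buffers a block touches is easy to compute. With 4KB buckets and the default 64KB HFile block, every block read crosses many buffer boundaries; with matching, aligned sizes it touches one.

```java
// Hypothetical helper: how many fixed-size buffers does [offset, offset + length) touch?
final class BucketSpan {
    static int buffersSpanned(long offset, int length, int bufferSize) {
        if (length <= 0) return 0;
        long first = offset / bufferSize;                // buffer holding the first byte
        long last = (offset + length - 1) / bufferSize;  // buffer holding the last byte
        return (int) (last - first + 1);
    }

    public static void main(String[] args) {
        int fourK = 4 * 1024;
        int block = 64 * 1024;
        System.out.println(buffersSpanned(0, block, fourK));   // 16 (aligned start)
        System.out.println(buffersSpanned(100, block, fourK)); // 17 (unaligned start)
        System.out.println(buffersSpanned(0, block, block));   // 1 (bucket == block size)
    }
}
```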

Can MBB be developed in isolation with tests and refcounting tests apart from 
main code base? Is that being done?
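The core of an MBB is small enough to unit test on its own. A toy sketch, assuming equal-sized items and big-endian ints; ToyMultiBuffer is a hypothetical name, not HBase's MultiByteBuffer API. The interesting test case is a read that straddles two items, plus a check that item positions are never touched.

```java
import java.nio.ByteBuffer;

// Toy multi-buffer: one logical byte range over several equal-sized ByteBuffers.
final class ToyMultiBuffer {
    private final ByteBuffer[] items;
    private final int itemSize;

    ToyMultiBuffer(ByteBuffer[] items, int itemSize) {
        this.items = items;
        this.itemSize = itemSize;
    }

    // Absolute read: picks the item, then the index inside it; never moves position/limit.
    byte get(int index) {
        return items[index / itemSize].get(index % itemSize);
    }

    // Big-endian int (ByteBuffer's default order) that may straddle two items.
    int getInt(int index) {
        int v = 0;
        for (int i = 0; i < Integer.BYTES; i++) {
            v = (v << 8) | (get(index + i) & 0xFF);
        }
        return v;
    }
}
```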

High-level, general question: eviction was easy before. Under memory pressure 
we just evicted until the needed memory was available. Is eviction now more 
complicated because we have to check for a non-zero refcount? And what if we 
can't find the necessary memory? What happens then?
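A minimal, single-threaded sketch of what the question implies: eviction has to skip blocks pinned by in-flight reads, so it can come back having freed less than was asked for. All names are hypothetical, and a real cache would need the check-and-evict step to be atomic (for example, CAS the count from 0 to a sentinel) so a reader cannot retain a block mid-eviction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical refcount-aware LRU cache; illustrates the policy, not thread safety.
final class RefCountingCache {
    static final class Block {
        final int size;
        final AtomicInteger refCount = new AtomicInteger();
        Block(int size) { this.size = size; }
    }

    // Access-ordered: iteration starts at the least recently used block.
    private final LinkedHashMap<String, Block> blocks = new LinkedHashMap<>(16, 0.75f, true);

    void put(String key, Block b) { blocks.put(key, b); }

    Block getAndRetain(String key) {
        Block b = blocks.get(key);
        if (b != null) b.refCount.incrementAndGet(); // a reader now pins this block
        return b;
    }

    static void release(Block b) { b.refCount.decrementAndGet(); }

    // Frees up to `needed` bytes; may return less if too many blocks are pinned.
    long evict(long needed) {
        long freed = 0;
        List<String> victims = new ArrayList<>();
        for (Map.Entry<String, Block> e : blocks.entrySet()) {
            if (freed >= needed) break;
            if (e.getValue().refCount.get() == 0) { // pinned blocks are skipped
                victims.add(e.getKey());
                freed += e.getValue().size;
            }
        }
        victims.forEach(blocks::remove);
        return freed;
    }

    int size() { return blocks.size(); }
}
```

The caller then has to decide what "not enough" means: wait and retry, fail the cache insert, or fall back to not caching the new block.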

"Note that the LRU Cache does not have this block reference counting happening 
as that does not deal with BBs and deals with the HFileblock objects directly."

Why not? Do we copy from the LRU blocks into Cell arrays? Couldn't Cells be 
backed by the LRU blocks directly too? Or do I have it wrong?
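A sketch of the two options for an on-heap block, with hypothetical names: copying each value out allocates a new byte[] per cell, while a view just records (array, offset, length) into the block. One plausible reason the LRU path can skip refcounting: a view keeps the block's byte[] reachable, so even if the LRU cache evicts the block, the GC does not reclaim it while a cell still points at it. That is an inference, not something the writeup states.

```java
import java.util.Arrays;

// Hypothetical: exposing a value that lives inside an on-heap block's byte[].
final class OnHeapBlockCells {
    // Copy: one new byte[] per cell read.
    static byte[] copyValue(byte[] block, int offset, int length) {
        return Arrays.copyOfRange(block, offset, offset + length);
    }

    // View: no copy; the cell only remembers where its bytes sit in the block.
    static final class View {
        final byte[] array;
        final int offset;
        final int length;

        View(byte[] array, int offset, int length) {
            this.array = array;
            this.offset = offset;
            this.length = length;
        }

        byte byteAt(int i) {
            if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
            return array[offset + i];
        }
    }
}
```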

I don't see it listed as a downside that we'll be doubling the number of 
objects created when reading offheap. Is that right?

"Please note that the Cells in the memstore are still KV based (byte [] 
backed)" ... this is because you are only doing the read path in this JIRA, 
right? Then again, when reading we have to read from the MemStore too, so does 
this mean the read path can return a mix of onheap and offheap results?
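If the read path does mix results, merging MemStore cells (byte[]-backed) with cells from offheap blocks (ByteBuffer-backed) means the comparator has to work across both representations. A hedged sketch of an unsigned lexicographic compare using only absolute gets; the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical comparator for a byte[] range vs a ByteBuffer range.
final class MixedCompare {
    // Unsigned lexicographic order; a shorter prefix sorts first.
    static int compare(byte[] a, int aOff, int aLen, ByteBuffer b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xFF;   // mask to compare bytes as unsigned
            int y = b.get(bOff + i) & 0xFF; // absolute get: position untouched
            if (x != y) return x - y;
        }
        return aLen - bLen;
    }
}
```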

On adding new methods to Cell, are there 'holes'? We talked about this in the 
past and it seemed like there could be strange areas in the Cell API if you made 
certain calls. If you don't know what I am on about, I'll dig up the old 
discussion (I think it was on the mailing list... Ram, you asked for input).

... or maybe the holes have been plugged by 'Using getXXXArray() would throw 
UnSupportedOperationException. '?  And....
"This will make so many short living objects creation also. That is why we 
decided to go with usage of getXXXOffset() and getXXXLength() API usage also 
along with buffer based APIs"

So, you might want to underline this point: it's a BB, but WE are managing the 
position and length, to save on object creation and to bypass BB range 
checking, etc.
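A sketch of the shape being described, under assumed names: the interface and class below are hypothetical, loosely modeled on the writeup's getXXXByteBuffer()/getXXXPosition()/getXXXLength() idea, and are not HBase's actual Cell API. The shared buffer's own position/limit mean nothing; consumers combine buffer, position and length, and getRowArray() throws as the writeup says.

```java
import java.nio.ByteBuffer;

// Hypothetical buffer-backed row accessors.
interface BufferBackedRow {
    ByteBuffer getRowByteBuffer(); // shared by many cells; position/limit are NOT meaningful
    int getRowPosition();          // where this cell's row starts in that buffer
    int getRowLength();
    byte[] getRowArray();          // unsupported when not array-backed
}

final class OffheapRowCell implements BufferBackedRow {
    private final ByteBuffer block;
    private final int rowPosition;
    private final int rowLength;

    OffheapRowCell(ByteBuffer block, int rowPosition, int rowLength) {
        this.block = block;
        this.rowPosition = rowPosition;
        this.rowLength = rowLength;
    }

    public ByteBuffer getRowByteBuffer() { return block; }
    public int getRowPosition() { return rowPosition; }
    public int getRowLength() { return rowLength; }

    public byte[] getRowArray() {
        throw new UnsupportedOperationException("row is not array-backed; use getRowByteBuffer()");
    }

    // Intended consumer pattern: buffer + position + length, absolute gets only.
    static byte[] copyRow(BufferBackedRow c) {
        byte[] out = new byte[c.getRowLength()];
        ByteBuffer bb = c.getRowByteBuffer();
        for (int i = 0; i < out.length; i++) out[i] = bb.get(c.getRowPosition() + i);
        return out;
    }
}
```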

What does that mean for the 'client'?  When you give out a BB, its position, 
etc., is not to be relied upon.  That will be disorientating.  Pity you 
couldn't throw UnsupportedOperationException if they tried to use position, 
etc. So you need the BB AND the Cell to get at the content: the BB for the 
bytes, and then the Cell for the offset and length...
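The trade-off in code, as a hedged sketch with hypothetical names: reading through the shared buffer with (offset, length) allocates nothing, but the buffer's position is a dud; handing out a per-cell slice gives the caller a well-behaved ByteBuffer (position 0, remaining() == length) at the cost of one new object per call, which is the object churn the writeup is trying to avoid.

```java
import java.nio.ByteBuffer;

// Hypothetical accessors contrasting the two approaches.
final class ValueAccess {
    // Zero-allocation: caller must use (offset, length) with absolute gets.
    static long sumShared(ByteBuffer shared, int offset, int length) {
        long sum = 0;
        for (int i = 0; i < length; i++) sum += shared.get(offset + i) & 0xFF;
        return sum;
    }

    // One new ByteBuffer per call, but its position/limit can be trusted.
    static ByteBuffer slice(ByteBuffer shared, int offset, int length) {
        ByteBuffer dup = shared.duplicate(); // same content, independent position/limit
        dup.clear();                         // position 0, limit = capacity
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();                  // position 0, limit == length
    }
}
```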

So, is this API for users on the client side? It is going to confuse them when 
they have a BB but the position and limit are duds. When would a client be 
using BBs? Never? The client won't be going offheap? If so, could the BB APIs 
be mixed into Cell on the server side only?

So, why have the switch at all, the hasArray switch? Why not use BBs all the 
time? It would simplify the read path.  The disadvantage would be extra 
objects?
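What the hasArray switch looks like at a call site, as a sketch with a hypothetical hashRange helper: heap buffers take the array path (index math must add arrayOffset()), everything else falls back to absolute gets. Always using BBs would mean keeping only the second branch; both branches produce the same result here, which a test can check against Arrays.hashCode.

```java
import java.nio.ByteBuffer;

// Hypothetical read-path helper showing the two branches of the switch.
final class ReadPath {
    static int hashRange(ByteBuffer bb, int offset, int length) {
        int h = 1;
        if (bb.hasArray()) {
            // Array path: direct indexing, but must include arrayOffset() for slices.
            byte[] arr = bb.array();
            int base = bb.arrayOffset() + offset;
            for (int i = 0; i < length; i++) h = 31 * h + arr[base + i];
        } else {
            // Buffer path: works for direct (offheap) and read-only buffers.
            for (int i = 0; i < length; i++) h = 31 * h + bb.get(offset + i);
        }
        return h;
    }
}
```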

When you say this: "Note that even if the HFileBlock is on heap BB we do not 
support getXXXArray() APIs. " This is only if hasArray returns false, right?

Yeah, looks like 2.0.

Can you tell us more about the unsafe manipulation of BBs? How does that work?

Nice writeup.

> Cell/DBB end-to-end on the read-path
> ------------------------------------
>
>                 Key: HBASE-11425
>                 URL: https://issues.apache.org/jira/browse/HBASE-11425
>             Project: HBase
>          Issue Type: Umbrella
>          Components: regionserver, Scanners
>    Affects Versions: 0.99.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>         Attachments: Offheap reads in HBase using BBs_final.pdf
>
>
> Umbrella JIRA to make sure we can have blocks cached in an offheap-backed 
> cache. Throughout the read path, we can refer to this offheap buffer and avoid 
> copying onheap.
> The high-level items I can identify as of now are:
> 1. Avoid the array() call on BB in the read path. (This is done in many 
> classes; we can handle them class by class.)
> 2. Support Buffer-based getter APIs in Cell. In the read path we will create a 
> new Cell backed by a BB. This will be needed in CellComparator, Filter (like 
> SCVF), CPs, etc.
> 3. Avoid KeyValue.ensureKeyValue() calls in the read path; these make a byte copy.
> 4. Remove all CP hooks (which are already deprecated) that deal with KVs 
> (in the read path).
> Will add subtasks under this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
