[ https://issues.apache.org/jira/browse/HBASE-17185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706992#comment-15706992 ]

stack commented on HBASE-17185:
-------------------------------

Did some study. The read of the next block's header is used only in the rare
case where we are loading metadata on file open. Metadata includes the hfile
indices themselves, which are stored as blocks. In this file-open case, we do
not have an hfile index to get block lengths from (we do not have an index for
the indices -- TODO). There are three or so metadata blocks in the normal case.
Without the read-ahead, file open would do two seeks per metablock, one for the
header to get the length and then one for the body, doubling the seeks for this
case. I think that would be fine, given these are not real 'seeks' but just
read-forwards in an already-loaded hdfs stream, but I have not done the work to
prove this assertion. It needs a bit of work comparing before and after.
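To make the file-open trade-off concrete, here is a minimal Java sketch under loudly stated assumptions: the block layout (a 4-byte length header followed by the body), the in-memory `CountingFile`, and every method name are hypothetical and are not HBase's actual HFileBlock format or API. It counts positional reads for three metadata blocks read with the next-header read-ahead versus header-then-body.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class MetaBlockReadSketch {
    // Hypothetical toy layout: each block is a 4-byte big-endian body length
    // followed by the body. This is NOT the real HFileBlock header.
    static final int HEADER_SIZE = 4;

    // In-memory stand-in for an open file that counts positional reads.
    static final class CountingFile {
        final byte[] data;
        int reads = 0;

        CountingFile(byte[] data) { this.data = data; }

        // Returns up to len bytes starting at offset; fewer if the read hits EOF.
        byte[] pread(int offset, int len) {
            reads++;
            int end = Math.min(data.length, offset + len);
            byte[] out = new byte[end - offset];
            System.arraycopy(data, offset, out, 0, out.length);
            return out;
        }
    }

    // Lays out blocks with the given body sizes back to back.
    static byte[] buildFile(int... bodySizes) {
        int total = 0;
        for (int size : bodySizes) total += HEADER_SIZE + size;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (int size : bodySizes) {
            buf.putInt(size);
            buf.put(new byte[size]);
        }
        return buf.array();
    }

    // Read-ahead: each read fetches a block plus the next block's header,
    // so the next length is known without a separate header read.
    static List<Integer> readWithLookahead(CountingFile f, int blockCount) {
        List<Integer> sizes = new ArrayList<>();
        int offset = 0;
        // Bootstrap: nothing precedes the first block, so read its header alone.
        int bodyLen = ByteBuffer.wrap(f.pread(0, HEADER_SIZE)).getInt();
        for (int i = 0; i < blockCount; i++) {
            int onDisk = HEADER_SIZE + bodyLen;
            byte[] buf = f.pread(offset, onDisk + HEADER_SIZE);
            sizes.add(bodyLen);
            offset += onDisk;
            if (buf.length == onDisk + HEADER_SIZE) { // not at EOF: peek next header
                bodyLen = ByteBuffer.wrap(buf, onDisk, HEADER_SIZE).getInt();
            }
        }
        return sizes;
    }

    // No read-ahead: read the header for the length, then read the body.
    static List<Integer> readHeaderThenBody(CountingFile f, int blockCount) {
        List<Integer> sizes = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < blockCount; i++) {
            int bodyLen = ByteBuffer.wrap(f.pread(offset, HEADER_SIZE)).getInt();
            f.pread(offset + HEADER_SIZE, bodyLen); // body read
            sizes.add(bodyLen);
            offset += HEADER_SIZE + bodyLen;
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] file = buildFile(100, 250, 80); // three metadata blocks
        CountingFile a = new CountingFile(file);
        CountingFile b = new CountingFile(file);
        System.out.println(readWithLookahead(a, 3) + " reads=" + a.reads);
        System.out.println(readHeaderThenBody(b, 3) + " reads=" + b.reads);
    }
}
```

Under this toy model the read-ahead variant costs one bootstrap read plus one read per block (`reads=4` for three blocks), while header-then-body costs two per block (`reads=6`), i.e. the roughly doubled seek count on file open. Whether that matters on a real, already-open HDFS stream is what the before/after comparison would have to measure.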

I looked at undoing the read-ahead into the next block for all but the startup
case, and it would involve code duplication and would undo much of the
simplification/benefit the attached patch brings.

Putting this aside for now until there is time to do the perf/resource
comparison (though in a subtask, I have updated the HFileBlock doc, without
changing functionality, to incorporate the findings of my study).

> Purge the seek of the next block reading HFileBlocks
> ----------------------------------------------------
>
>                 Key: HBASE-17185
>                 URL: https://issues.apache.org/jira/browse/HBASE-17185
>             Project: HBase
>          Issue Type: Improvement
>          Components: HFile
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>              Labels: beginner
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17185.master.001.patch, HBASE-17185.patch
>
>
> When we read HFileBlocks, we read the asked-for block AND the next block's
> header, which we add to a cache (see HBASE-17072). We do this extra read,
> purportedly, to get the next block's length. This seek of the next block's
> header complicates the HFileBlock construction (not to mention other
> consequences -- again see HBASE-17072).
> Study done in HBASE-17072 shows that we normally do not need this extra read 
> of the next block's header. In the usual case, the length of the block is 
> gotten from the hfile index.
> Purging this extra header read would simplify block reading.
> We can also save some space in cache.
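The usual, index-driven path described in the issue can be sketched the same way. Again this is a hypothetical illustration, not HBase code: `IndexEntry`, `nextHeaderCache`, and the 4-byte header layout are invented for the example. When an index entry already supplies a block's on-disk size, one read of exactly that many bytes suffices; the read-ahead variant reads `HEADER_SIZE` extra bytes and spends a cache entry on the next block's length.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class IndexedBlockReadSketch {
    // Hypothetical toy layout: 4-byte big-endian body length, then the body.
    static final int HEADER_SIZE = 4;

    // Hypothetical index entry: block start offset and on-disk size (header + body).
    static final class IndexEntry {
        final int offset;
        final int onDiskSize;

        IndexEntry(int offset, int onDiskSize) {
            this.offset = offset;
            this.onDiskSize = onDiskSize;
        }
    }

    // Stand-in for the cache of next-block lengths, keyed by the next block's offset.
    static final Map<Integer, Integer> nextHeaderCache = new HashMap<>();

    // Lays out blocks with the given body sizes back to back.
    static byte[] buildFile(int... bodySizes) {
        int total = 0;
        for (int size : bodySizes) total += HEADER_SIZE + size;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (int size : bodySizes) {
            buf.putInt(size);
            buf.put(new byte[size]);
        }
        return buf.array();
    }

    // Index-driven read: the length comes from the index, so one read of
    // exactly onDiskSize bytes is enough and nothing is cached.
    static byte[] readBlock(byte[] file, IndexEntry e) {
        byte[] buf = Arrays.copyOfRange(file, e.offset, e.offset + e.onDiskSize);
        int bodyLen = ByteBuffer.wrap(buf).getInt();
        if (HEADER_SIZE + bodyLen != e.onDiskSize) {
            throw new IllegalStateException("index and header disagree at offset " + e.offset);
        }
        return Arrays.copyOfRange(buf, HEADER_SIZE, e.onDiskSize);
    }

    // Read-ahead variant: read HEADER_SIZE extra bytes and cache the next
    // block's length, even though the index could have supplied it.
    static byte[] readBlockAndPeekNext(byte[] file, IndexEntry e) {
        int end = Math.min(file.length, e.offset + e.onDiskSize + HEADER_SIZE);
        byte[] buf = Arrays.copyOfRange(file, e.offset, end);
        if (buf.length == e.onDiskSize + HEADER_SIZE) { // not at EOF
            int nextBodyLen = ByteBuffer.wrap(buf, e.onDiskSize, HEADER_SIZE).getInt();
            nextHeaderCache.put(e.offset + e.onDiskSize, nextBodyLen);
        }
        return Arrays.copyOfRange(buf, HEADER_SIZE, e.onDiskSize);
    }

    public static void main(String[] args) {
        byte[] file = buildFile(100, 250);
        IndexEntry first = new IndexEntry(0, HEADER_SIZE + 100);
        IndexEntry second = new IndexEntry(HEADER_SIZE + 100, HEADER_SIZE + 250);
        System.out.println(readBlock(file, first).length + " " + readBlock(file, second).length
            + " cached=" + nextHeaderCache.size());
        readBlockAndPeekNext(file, first);
        System.out.println("after read-ahead cached=" + nextHeaderCache);
    }
}
```

Running `main` prints `100 250 cached=0` and then `after read-ahead cached={104=250}`: only the read-ahead path reads past the block and populates the cache.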



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)