[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032656#comment-13032656 ]

Mikhail Bautin commented on HBASE-3857:
---------------------------------------

I will try to answer the rest of the questions:

> + You say "Block type, a sequence of bytes equivalent to version 1's 'magic
> records'". Is this the case? The magic was supposed to be a sequence you could
> search to pick up the parse again after hitting a bad patch of corrupted data.
> You seem to instead start blocks with a type?

    In our design, the magic record is a serialized representation of the block
    type. I did not see any logic in version 1 that searches for a magic record
    after hitting a block of bad data, so I did not implement it in version 2. I
    am not sure what specific data corruption cases this would help with.
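To make "the magic record is the block type" concrete, here is a minimal Java sketch, assuming a fixed 8-byte magic at the start of every block. The class BlockTypeMagicSketch and the magic strings are hypothetical placeholders (the real values live in HBase's BlockType enum and may differ). It only identifies a block's type; like the version 2 design described above, it does not scan forward for the next magic after corrupted data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: a block's leading magic bytes double as its block type.
final class BlockTypeMagicSketch {
    static final int MAGIC_LENGTH = 8;

    // Placeholder 8-byte magic values (assumptions; HBase's BlockType enum is authoritative).
    enum BlockKind {
        DATA("DATABLK*"),
        LEAF_INDEX("IDXLEAF2"),
        BLOOM_CHUNK("BLMFBLK2");

        final byte[] magic;

        BlockKind(String magic) {
            this.magic = magic.getBytes(StandardCharsets.US_ASCII);
        }
    }

    // Identifies a block by comparing its first MAGIC_LENGTH bytes with each known magic.
    static BlockKind parse(byte[] block) {
        if (block.length < MAGIC_LENGTH) {
            throw new IllegalArgumentException("block is shorter than its magic");
        }
        byte[] head = Arrays.copyOf(block, MAGIC_LENGTH);
        for (BlockKind kind : BlockKind.values()) {
            if (Arrays.equals(kind.magic, head)) {
                return kind;
            }
        }
        // No resynchronization is attempted: an unknown magic is simply an error.
        throw new IllegalArgumentException("unknown block magic");
    }
}
```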


> + How are blocks sized now? Are we still cutting blocks off at the first KV
> boundary after we go past the configured hfile block size (e.g. 64k), or is
> the block cutoff instead determined by the fill of the bloom filter array or
> the index?

    The blocks are sized the same way as before. Block cutoff happens
    independently for regular data blocks and for inline blocks (Bloom blocks and
    leaf data index blocks). When a normal data block fills up, we give all
    registered "inline block writers" a chance to insert their next block into
    the stream. The Bloom filter writer can queue filled-up blocks until its next
    chance to write them, and the block index writer's chunks can only fill up on
    a data block boundary.
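The cutoff rules above can be simulated with a small Java sketch. Everything in it (InlineBlockSketch, InlineWriter, QueueingBloomWriter, the block labels) is hypothetical and much simpler than the real writer: block size is counted in raw KV bytes, only a Bloom writer is registered, the index writer is left out, and a partially filled Bloom chunk at close is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of data block cutoff with inline blocks; names are illustrative only.
final class InlineBlockSketch {
    // Hypothetical "inline block writer": may have finished blocks waiting to be written.
    interface InlineWriter {
        boolean hasReadyBlock();
        String takeBlock();
    }

    // Bloom chunks fill up on their own schedule and are queued until the next data block boundary.
    static final class QueueingBloomWriter implements InlineWriter {
        private final int keysPerChunk;
        private final Deque<String> ready = new ArrayDeque<>();
        private int keysInChunk = 0;
        private int chunksFilled = 0;

        QueueingBloomWriter(int keysPerChunk) {
            this.keysPerChunk = keysPerChunk;
        }

        void addKey() {
            if (++keysInChunk == keysPerChunk) {
                ready.addLast("BLOOM#" + chunksFilled++);
                keysInChunk = 0;
            }
        }

        @Override
        public boolean hasReadyBlock() {
            return !ready.isEmpty();
        }

        @Override
        public String takeBlock() {
            return ready.removeFirst();
        }
    }

    // Returns the sequence of blocks written for KVs of the given sizes.
    static List<String> write(int[] kvSizes, int blockSize, QueueingBloomWriter bloom) {
        List<InlineWriter> inlineWriters = new ArrayList<>();
        inlineWriters.add(bloom);
        List<String> stream = new ArrayList<>();
        int bytesInBlock = 0;
        int dataBlocks = 0;
        for (int size : kvSizes) {
            bytesInBlock += size;   // a KV is never split across blocks
            bloom.addKey();
            if (bytesInBlock >= blockSize) {
                // Cut at the KV boundary where the block reaches the configured size.
                stream.add("DATA#" + dataBlocks++);
                bytesInBlock = 0;
                drain(inlineWriters, stream);
            }
        }
        if (bytesInBlock > 0) {
            stream.add("DATA#" + dataBlocks);   // final partial data block
        }
        drain(inlineWriters, stream);
        return stream;
    }

    // Gives every inline writer a chance to emit its queued blocks.
    private static void drain(List<InlineWriter> writers, List<String> stream) {
        for (InlineWriter w : writers) {
            while (w.hasReadyBlock()) {
                stream.add(w.takeBlock());
            }
        }
    }
}
```

For eight 10-byte KVs with a 30-byte block size and 4-key Bloom chunks, the stream comes out as DATA#0, DATA#1, BLOOM#0, DATA#2, BLOOM#1: each Bloom chunk waits for the next data block boundary instead of forcing one.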


> + I think I know what the following refers to in the diagram: "Version 2 root
> index, stored in the data block index section of the file". It's kept in the
> 'load-on-open section', right?

    This should have been "Version 2 root index, stored in the load-on-open
    section of the file". Thanks for catching this. I will fix this in the spec.


> + For this description:
>   • Offset (long)
>     o This offset may point to a data block or to a deeper-level index block.
>   • On-disk size (int)
>   • Key (a serialized byte array)
>     o Key length (VInt)
>     o Key bytes
> You are using a vint to specify the key size. We didn't do that in v1? You
> have a good implementation (it was costly, IIRC, using Hadoop's).

    Actually, version 1 already uses a VInt in the block index, because it uses
    Bytes.writeByteArray, which stores the length as a VInt. We decided to keep
    the root-level block index format similar to the version 1 block index
    format, since it gets deserialized into a byte[][], a long[], and an int[]
    anyway.
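A minimal Java sketch of the root-level entry layout quoted above: offset, on-disk size, then the key as a VInt length followed by the key bytes. It assumes non-negative lengths and reimplements the single-byte/marker-byte VInt layout of Hadoop's WritableUtils rather than calling it, so it stays self-contained; RootIndexEntrySketch and its methods are hypothetical names. Reading fills parallel arrays, in the spirit of the byte[][], long[], and int[] mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a root-level index entry: offset (long), on-disk size (int), key as VInt length + bytes.
final class RootIndexEntrySketch {
    // Parallel arrays, mirroring the deserialized form of a root index.
    static final class RootIndex {
        final long[] offsets;
        final int[] onDiskSizes;
        final byte[][] keys;

        RootIndex(long[] offsets, int[] onDiskSizes, byte[][] keys) {
            this.offsets = offsets;
            this.onDiskSizes = onDiskSizes;
            this.keys = keys;
        }
    }

    // Non-negative VInt: one byte for 0..127, otherwise a marker byte
    // (-113 = 1 payload byte, -114 = 2, ...) followed by a big-endian payload.
    static void writeVInt(DataOutput out, int value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only encodes non-negative values");
        }
        if (value <= 127) {
            out.writeByte(value);
            return;
        }
        int payloadBytes = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            payloadBytes++;
        }
        out.writeByte(-112 - payloadBytes);
        for (int idx = payloadBytes; idx > 0; idx--) {
            out.writeByte(value >>> ((idx - 1) * 8));   // writeByte keeps the low 8 bits
        }
    }

    static int readVInt(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= 0) {
            return first;   // single-byte value 0..127
        }
        int payloadBytes = -112 - first;
        if (payloadBytes < 1 || payloadBytes > 4) {
            throw new IOException("not a non-negative int VInt: " + first);
        }
        int value = 0;
        for (int i = 0; i < payloadBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return value;
    }

    static byte[] writeEntry(long offset, int onDiskSize, byte[] key) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(offset);
            out.writeInt(onDiskSize);
            writeVInt(out, key.length);
            out.write(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static RootIndex readEntries(byte[] data, int count) {
        long[] offsets = new long[count];
        int[] sizes = new int[count];
        byte[][] keys = new byte[count][];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                offsets[i] = in.readLong();
                sizes[i] = in.readInt();
                keys[i] = new byte[readVInt(in)];
                in.readFully(keys[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new RootIndex(offsets, sizes, keys);
    }
}
```

With this encoding a 5-byte key costs 18 bytes per entry (8 + 4 + 1 + 5), while a 200-byte key needs a 2-byte length prefix.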

> + Is a 'root index block' the same as a 'Root Data Index' (from the diagram)?

    The Root Data Index is one particular instance of a root index block. We use
    the same "root index block" format for the root level of the data index, the
    meta index (which is always single-level), and the Bloom index (also
    single-level). For intermediate-level and leaf-level blocks we use a separate
    "non-root index block" format that supports binary search over the
    serialized data structure.
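The point of the non-root format is that its entries can be binary-searched in place, without deserializing the block. A hedged Java sketch, assuming a simplified layout: numEntries (int), then numEntries + 1 relative entry offsets (the last one being the total length of all entries, so each key's length is implied), then entries of offset (long), on-disk size (int), and raw key bytes. NonRootIndexSketch and its methods are hypothetical, and the layout is an assumption of this example rather than a byte-exact copy of the spec.

```java
import java.nio.ByteBuffer;

// Sketch of binary search over a serialized non-root index block via its secondary index.
final class NonRootIndexSketch {
    private static final int ENTRY_HEADER = Long.BYTES + Integer.BYTES;   // offset + on-disk size

    // Serializes sorted entries: numEntries, numEntries + 1 relative offsets, then the entries.
    static ByteBuffer build(long[] offsets, int[] sizes, byte[][] keys) {
        int n = keys.length;
        int entriesLen = 0;
        for (byte[] k : keys) {
            entriesLen += ENTRY_HEADER + k.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (n + 2) + entriesLen);
        buf.putInt(n);
        int rel = 0;
        for (int i = 0; i < n; i++) {
            buf.putInt(rel);
            rel += ENTRY_HEADER + keys[i].length;
        }
        buf.putInt(rel);   // total length of all entries
        for (int i = 0; i < n; i++) {
            buf.putLong(offsets[i]).putInt(sizes[i]).put(keys[i]);
        }
        buf.flip();
        return buf;
    }

    // Index of the last entry whose key <= searchKey, or -1 if searchKey sorts before all keys.
    static int locate(ByteBuffer block, byte[] searchKey) {
        int n = block.getInt(0);
        int entriesStart = Integer.BYTES * (n + 2);
        int lo = 0;
        int hi = n - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int rel = block.getInt(Integer.BYTES * (1 + mid));
            int nextRel = block.getInt(Integer.BYTES * (2 + mid));
            int keyStart = entriesStart + rel + ENTRY_HEADER;
            int keyLen = nextRel - rel - ENTRY_HEADER;   // key length implied by adjacent offsets
            if (compare(block, keyStart, keyLen, searchKey) <= 0) {
                result = mid;   // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    // File offset of the block referenced by the given entry.
    static long blockOffset(ByteBuffer block, int entry) {
        int n = block.getInt(0);
        int rel = block.getInt(Integer.BYTES * (1 + entry));
        return block.getLong(Integer.BYTES * (n + 2) + rel);
    }

    // Unsigned lexicographic comparison of block[start, start + len) with key.
    private static int compare(ByteBuffer block, int start, int len, byte[] key) {
        int common = Math.min(len, key.length);
        for (int i = 0; i < common; i++) {
            int diff = (block.get(start + i) & 0xFF) - (key[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return len - key.length;
    }
}
```

locate returns the last entry whose key is less than or equal to the search key, i.e. the child block that may contain it: with keys b, d, f it returns 0 for "c", 2 for "z", and -1 for "a".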


> + "• entryOffsets: the “secondary index” of offsets of entries in the block, to
> facilitate a quick binary search on the key (numEntries-int values)"
> Is this worth the bother? A binary search of in-memory data structure? How 
> many entries are you thinking there will be in these blocks?

    After discussing this with Nicolas, we decided not to change the data block
    format: in our case there are somewhere between 10 and 500 key/value pairs
    per data block, so binary search does not offer much benefit over the
    current linear search, and read time is dominated by I/O anyway.
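For comparison, the linear search that the data block keeps can be sketched in Java as follows. It assumes the block's keys are already decoded into a sorted byte[][] (real data blocks store length-prefixed key/value pairs), and DataBlockSeekSketch is a hypothetical name. With 10 to 500 pairs per block this is at most a few hundred in-memory comparisons, which is the point made above: the cost is small next to reading the block from disk.

```java
// Sketch of a linear seek within one decoded data block (keys sorted in unsigned byte order).
final class DataBlockSeekSketch {
    // Returns the index of the last key <= target, or -1 if target sorts before every key.
    static int seekTo(byte[][] sortedKeys, byte[] target) {
        int position = -1;
        for (int i = 0; i < sortedKeys.length; i++) {
            if (compareUnsigned(sortedKeys[i], target) > 0) {
                break;   // keys are sorted, so no later key can be <= target
            }
            position = i;
        }
        return position;
    }

    // Unsigned lexicographic byte comparison, shorter prefix first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With keys k1, k3, k5, seekTo returns 1 for "k4" and -1 for "k0".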

Hope this helps. Please let me know if you have any further questions or
concerns about the HFile format v2.

Thanks!
--Mikhail


> Change the HFile Format
> -----------------------
>
>                 Key: HBASE-3857
>                 URL: https://issues.apache.org/jira/browse/HBASE-3857
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Liyin Tang
>            Assignee: Mikhail Bautin
>         Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format
> of the HFile. The new format proposal is attached here. Thanks to Mikhail
> Bautin for the documentation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
