[ https://issues.apache.org/jira/browse/HBASE-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032656#comment-13032656 ]
Mikhail Bautin commented on HBASE-3857:
---------------------------------------

I will try to answer the rest of the questions:

> + You say
> "Block type, a sequence of bytes equivalent to version 1's "magic records"
> Is this the case? The magic was supposed to be a sequence you could search to
> pick up the parse again after hitting a bad patch of corrupted data. You seem
> to instead start blocks with a type?

In our design, the magic record is a serialized representation of the block type. I did not see any logic in version 1 that searches for a magic record after hitting a block of bad data, so I did not implement it in version 2. I am not sure which specific data corruption cases this would help with.

> + How are blocks sized now? Are we still cutting blocks off at first KV
> boundary after we go past configured hfile block size (e.g. 64k) or is the
> block cutoff instead determined by fill of the bloom filter array or the
> index?

The blocks are sized the same way as before. Block cutoff happens independently for regular data blocks and for inline blocks (Bloom blocks and leaf data index blocks). When a normal data block fills up, we give all registered "inline block writers" a chance to insert their next block into the stream. The Bloom filter writer can queue filled-up blocks until its next chance to write them, and the block index writer's chunks can only fill up on a data block boundary.

> + I think I know what the following refers to in the diagram, "Version 2 root
> index, stored in the data block index section of the file": it's kept in the
> 'load-on-open section', right?

This should have been "Version 2 root index, stored in the load-on-open section of the file". Thanks for catching this. I will fix this in the spec.

> + • Offset (long)
>   o For this description "This offset may point to a data block or to a deeper-level index block.
> • On-disk size (int)
> • Key (a serialized byte array)
>   o Key (VInt)
>   o Key bytes"
> You are using vint specifying key size. We didn't do that in v1? You have a
> good implementation (was costly IIRC using hadoops').

Actually, version 1 already uses a VInt in the block index, because it writes keys with Bytes.writeByteArray, which stores the length as a VInt. We decided to keep the root-level block index format similar to the version 1 block index format, since it gets de-serialized into a byte[][], a long[], and an int[] anyway.

> + Is a 'root index block' same as a 'Root Data Index' (from the diagram?)

The Root Data Index is one particular instance of a root index block. We use the same "root index block" format for the data index root level, the meta index (which is always single-level), and the Bloom index (also single-level). For intermediate and leaf-level blocks we use another "non-root index block" format that allows binary search directly over the serialized data structure.

> + "• entryOffsets: the “secondary index” of offsets of entries in the block, to
> facilitate a quick binary search on the key (numEntries-int values)"
> Is this worth the bother? A binary search of in-memory data structure? How
> many entries are you thinking there will be in these blocks?

After discussing this with Nicolas, we decided not to change the data block format, because in our case there are somewhere between 10 and 500 key/value pairs per data block, so binary search does not offer much benefit compared to the current linear search, and the read time is dominated by input/output anyway.

Hope this helps. Please let me know if you have any further questions/concerns about the HFile format v2. Thanks!
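
To make the root-level index entry layout discussed above concrete (offset as a long, on-disk size as an int, then a length-prefixed key, read back into a byte[][], a long[], and an int[]), here is a minimal sketch. It is not the actual HBase code: the class, the method names, and the explicit entry-count parameter are hypothetical, and the variable-length int is a simple 7-bit stand-in that is not claimed to match the byte layout of Hadoop's VInt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative only: hypothetical names, not the HBase classes. */
final class RootIndexSketch {

    /** Parallel arrays, mirroring the byte[][] / long[] / int[] mentioned above. */
    static final class Parsed {
        final byte[][] keys;
        final long[] offsets;
        final int[] onDiskSizes;

        Parsed(int n) {
            keys = new byte[n][];
            offsets = new long[n];
            onDiskSizes = new int[n];
        }
    }

    // Stand-in variable-length int: 7 payload bits per byte, low bits first.
    // Assumption for illustration; NOT Hadoop's VInt byte layout.
    static void writeVarInt(DataOutputStream out, int v) throws IOException {
        while ((v & ~0x7F) != 0) {
            out.writeByte((v & 0x7F) | 0x80); // high bit set: more bytes follow
            v >>>= 7;
        }
        out.writeByte(v);
    }

    static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Writes each entry as: offset (long), on-disk size (int), key length, key bytes. */
    static byte[] write(byte[][] keys, long[] offsets, int[] sizes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < keys.length; i++) {
            out.writeLong(offsets[i]);
            out.writeInt(sizes[i]);
            writeVarInt(out, keys[i].length);
            out.write(keys[i]);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads numEntries entries back into the three parallel arrays. */
    static Parsed read(byte[] data, int numEntries) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Parsed p = new Parsed(numEntries);
        for (int i = 0; i < numEntries; i++) {
            p.offsets[i] = in.readLong();
            p.onDiskSizes[i] = in.readInt();
            byte[] key = new byte[readVarInt(in)];
            in.readFully(key);
            p.keys[i] = key;
        }
        return p;
    }
}
```

Because every entry is variable-length, a reader of this layout has to scan entries in order, which is fine for a root index that is fully de-serialized on open; it is also why a format meant to be binary-searched in serialized form needs something extra, such as a table of entry offsets.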
--Mikhail

> Change the HFile Format
> -----------------------
>
> Key: HBASE-3857
> URL: https://issues.apache.org/jira/browse/HBASE-3857
> Project: HBase
> Issue Type: New Feature
> Reporter: Liyin Tang
> Assignee: Mikhail Bautin
> Attachments: hfile_format_v2_design_draft_0.1.pdf
>
>
> In order to support HBASE-3763 and HBASE-3856, we need to change the format
> of the HFile. The new format proposal is attached here. Thanks to Mikhail
> Bautin for the documentation.