[ 
https://issues.apache.org/jira/browse/HBASE-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-61:
-----------------------

    Attachment: hfile.patch

In testing, TFile is a good bit slower than MapFile on random access when 
cells are ~100 bytes or less. It's slower even if you subsequently read 30 
rows at the offset, and even with a TFile block size of 8k.  If cell values 
are 1k, TFile is faster than MapFile.

So, after profiling and discussion on IRC, the thinking is that we need 
something like a stripped-down TFile or even a new format altogether.  The 
attached patch is the start of my stripping chunking and the key and value 
streams out of TFile.  It's not finished yet.  The intent is to keep most of 
the TFile API, the underlying block mechanism with its attendant block-finding 
mechanism, all of the metadata facility, and the index at the end of the file. 
In the guts of TFile, though, there'd be only the DFSClient 
FSInput/OutputStream and blocks of byte arrays.  The stripped-down TFile is 
now called HFile.
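
To make the layout above concrete, here is a minimal sketch of the general 
idea: data blocks of raw byte arrays, a block index written at the end, and a 
lookup that finds the block first and then scans inside it. This is not the 
code in hfile.patch. The class name BlockFileSketch, the method names, and the 
on-disk encoding (length-prefixed byte arrays plus a 4-byte trailer pointing 
at the index) are all assumptions made for illustration, and an in-memory 
byte array stands in for the DFSClient streams.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: byte-array data blocks, then a block index, then a trailer. */
class BlockFileSketch {

    private static void putInt(ByteArrayOutputStream out, int x) {
        out.write(ByteBuffer.allocate(4).putInt(x).array(), 0, 4);
    }

    private static void putBytes(ByteArrayOutputStream out, byte[] b) {
        putInt(out, b.length);          // length prefix
        out.write(b, 0, b.length);      // raw bytes
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    /** Writes cells (keys must already be sorted) into blocks of about blockSize bytes. */
    static byte[] write(byte[][] keys, byte[][] values, int blockSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<byte[]> firstKeys = new ArrayList<>();
        List<Integer> offsets = new ArrayList<>();
        int blockStart = -1;
        for (int i = 0; i < keys.length; i++) {
            // Start a new block at the first cell, or once the current block is full.
            if (blockStart < 0 || out.size() - blockStart >= blockSize) {
                blockStart = out.size();
                firstKeys.add(keys[i]);
                offsets.add(blockStart);
            }
            putBytes(out, keys[i]);
            putBytes(out, values[i]);
        }
        int indexOffset = out.size();
        putInt(out, firstKeys.size());  // block index: (first key, offset) per block
        for (int i = 0; i < firstKeys.size(); i++) {
            putBytes(out, firstKeys.get(i));
            putInt(out, offsets.get(i));
        }
        putInt(out, indexOffset);       // trailer: where the index starts
        return out.toByteArray();
    }

    /** Random-access read: binary-search the index, then scan one block. */
    static byte[] get(byte[] file, byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int indexOffset = buf.getInt(file.length - 4);
        buf.position(indexOffset);
        int n = buf.getInt();
        byte[][] firstKeys = new byte[n][];
        int[] offsets = new int[n + 1];
        for (int i = 0; i < n; i++) {
            firstKeys[i] = new byte[buf.getInt()];
            buf.get(firstKeys[i]);
            offsets[i] = buf.getInt();
        }
        offsets[n] = indexOffset;       // the last block ends where the index begins
        // Find the last block whose first key is <= the wanted key.
        int lo = 0, hi = n - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys[mid], key) <= 0) { block = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        if (block < 0) return null;     // key sorts before everything in the file
        buf.position(offsets[block]);
        while (buf.position() < offsets[block + 1]) {
            byte[] k = new byte[buf.getInt()];
            buf.get(k);
            byte[] v = new byte[buf.getInt()];
            buf.get(v);
            int c = compare(k, key);
            if (c == 0) return v;
            if (c > 0) return null;     // passed the spot where the key would be
        }
        return null;
    }

    public static void main(String[] args) {
        int n = 1000;
        byte[][] keys = new byte[n][];
        byte[][] values = new byte[n][];
        for (int i = 0; i < n; i++) {
            keys[i] = String.format("row%05d", i).getBytes(StandardCharsets.UTF_8);
            values[i] = ("value" + i).getBytes(StandardCharsets.UTF_8);
        }
        byte[] file = write(keys, values, 8 * 1024);
        System.out.println(new String(get(file, keys[421]), StandardCharsets.UTF_8));
        System.out.println(get(file, "nosuchrow".getBytes(StandardCharsets.UTF_8)) == null);
    }
}
```

In this sketch a reader would only need the trailer and the index in memory 
to locate a block; everything else is a seek plus a sequential scan of one 
block of bytes, with no per-key or per-value stream objects in between.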

> [hbase] Create an HBase-specific MapFile implementation
> -------------------------------------------------------
>
>                 Key: HBASE-61
>                 URL: https://issues.apache.org/jira/browse/HBASE-61
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io
>            Reporter: Bryan Duxbury
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: cpucalltreetfile.html, hfile.patch, longestkey.patch, 
> tfile.patch, tfile3.patch
>
>
> Today, HBase uses the Hadoop MapFile class to store data persistently to 
> disk. This is convenient, as it's already done (and maintained by other 
> people :). However, it's beginning to look like there might be possible 
> performance benefits to be had from doing an HBase-specific implementation of 
> MapFile that incorporated some precise features.
> This issue should serve as a place to track discussion about what features 
> might be included in such an implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
