[ 
https://issues.apache.org/jira/browse/HBASE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6040:
----------------------------------

    Release Note: 
Added a new config param "hbase.mapreduce.hfileoutputformat.datablock.encoding" 
that specifies which data block encoding scheme to use on disk. Data written to 
HFiles during bulk load will use this encoding. Valid values are NONE, PREFIX, 
DIFF and FAST_DIFF, the DataBlockEncoding types supported at present. [When new 
types are added later, their names will also become valid values.]
The checksum type and the number of bytes per checksum can be configured with 
the config params hbase.hstore.checksum.algorithm and 
hbase.hstore.bytes.per.checksum respectively.
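
For example, a bulk load job could set these params on its Configuration before 
the output format is configured. This is a minimal sketch; the table name, the 
chosen encoding and the checksum values are illustrative, and the mapper/input 
setup is omitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadEncodingExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // New in HBASE-6040: data block encoding for the HFiles written by the job.
        // Valid values: NONE, PREFIX, DIFF, FAST_DIFF (plus any encodings added later).
        conf.set("hbase.mapreduce.hfileoutputformat.datablock.encoding", "FAST_DIFF");

        // HBase handled checksums for the bulk loaded HFiles (values are illustrative).
        conf.set("hbase.hstore.checksum.algorithm", "CRC32");
        conf.setInt("hbase.hstore.bytes.per.checksum", 16384);

        Job job = new Job(conf, "bulk-load-with-encoding");
        // "myTable" is a placeholder table name.
        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "myTable"));
        // ... set the input path and mapper, then job.waitForCompletion(true);
      }
    }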

    
> Use block encoding and HBase handled checksum verification in bulk loading 
> using HFileOutputFormat
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6040
>                 URL: https://issues.apache.org/jira/browse/HBASE-6040
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6040_94.patch, HBASE-6040_Trunk.patch
>
>
> When data is bulk loaded using HFileOutputFormat, the block encoding and the 
> HBase handled checksum features are not used. When the writer is created for 
> the HFile, no such information is passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf) we do not 
> have this information and do not pass it to the writer, so the resulting 
> HFiles will not have these optimizations.
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where one HFile (created by 
> the MR job) is physically split if it cannot belong to a single region, the 
> data block encoding and checksum details are passed to the new HFile writer. 
> But that step does not normally happen.
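
A rough sketch of the kind of change this calls for: read the new encoding param 
in HFileOutputFormat.getNewWriter() and hand it to the HFile writer factory. The 
builder method names below are assumed from the 0.94-era HFile.WriterFactory and 
may differ slightly from the attached patches; the actual fix also wires in the 
checksum type and bytes-per-checksum through the corresponding builder calls.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoder;
    import org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl;
    import org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder;

    public class EncodedWriterSketch {
      // Illustrative helper, not the actual patch: create a per-family HFile
      // writer that honours the configured data block encoding.
      static HFile.Writer createWriter(Configuration conf, FileSystem fs,
          Path hfilePath, int blockSize) throws IOException {
        // Resolve the configured encoding, defaulting to NONE (no encoding).
        DataBlockEncoding encoding = DataBlockEncoding.valueOf(
            conf.get("hbase.mapreduce.hfileoutputformat.datablock.encoding", "NONE"));
        HFileDataBlockEncoder encoder = (encoding == DataBlockEncoding.NONE)
            ? NoOpDataBlockEncoder.INSTANCE
            : new HFileDataBlockEncoderImpl(encoding);

        return HFile.getWriterFactoryNoCache(conf)
            .withPath(fs, hfilePath)
            .withBlockSize(blockSize)
            .withDataBlockEncoder(encoder)  // the piece currently missing in getNewWriter()
            .create();
      }
    }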

