[ https://issues.apache.org/jira/browse/HBASE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anoop Sam John updated HBASE-6040:
----------------------------------
    Release Note: Added a new config param "hbase.mapreduce.hfileoutputformat.datablock.encoding" that specifies which encoding scheme to use on disk. During bulk load, data is written into HFiles using this encoding scheme. The value can be NONE, PREFIX, DIFF, or FAST_DIFF, as these are the DataBlockEncoding types currently supported. [When new types are added later, the corresponding names will also become valid.]
The checksum type and the number of bytes per checksum can be configured using the config params hbase.hstore.checksum.algorithm and hbase.hstore.bytes.per.checksum, respectively.

> Use block encoding and HBase handled checksum verification in bulk loading using HFileOutputFormat
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6040
>                 URL: https://issues.apache.org/jira/browse/HBASE-6040
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6040_94.patch, HBASE-6040_Trunk.patch
>
>
> When data is bulk loaded using HFileOutputFormat, we are not using the block encoding and the HBase handled checksum features. When the writer is created for making the HFile, I do not see any such info being passed to the WriterBuilder.
> In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't have this info and do not pass it to the writer either, so those HFiles will not have these optimizations.
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically divide one HFile (created by the MR job) if it cannot belong to just one region, I can see we pass the data block encoding details and checksum details to the new HFile writer. But I think this step won't happen normally.

--
This message is automatically generated by JIRA.
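As a rough illustration of the release note, the three settings can be collected as plain key/value pairs before they are applied to a bulk-load job. This is only a sketch: the class BulkLoadEncodingConfig and its build method are hypothetical and not part of HBase, and the checksum value "CRC32" and the size 16384 in the usage note are example values assumed for illustration. Only the three key names and the four encoding names come from the release note.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: gathers the bulk-load settings
// named in the release note as plain key/value pairs.
class BulkLoadEncodingConfig {
    // Key names as given in the release note.
    static final String ENCODING_KEY =
            "hbase.mapreduce.hfileoutputformat.datablock.encoding";
    static final String CHECKSUM_ALGO_KEY = "hbase.hstore.checksum.algorithm";
    static final String BYTES_PER_CHECKSUM_KEY = "hbase.hstore.bytes.per.checksum";

    // Encoding names listed in the release note; later versions may accept more.
    static final List<String> KNOWN_ENCODINGS =
            Arrays.asList("NONE", "PREFIX", "DIFF", "FAST_DIFF");

    // Validates the encoding name and chunk size, then returns the settings
    // in insertion order. The checksum algorithm name is passed through
    // unchecked because the release note does not list its valid values.
    static Map<String, String> build(String encoding,
                                     String checksumAlgorithm,
                                     int bytesPerChecksum) {
        if (!KNOWN_ENCODINGS.contains(encoding)) {
            throw new IllegalArgumentException(
                    "Unknown data block encoding: " + encoding);
        }
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException(
                    "bytesPerChecksum must be positive: " + bytesPerChecksum);
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(ENCODING_KEY, encoding);
        settings.put(CHECKSUM_ALGO_KEY, checksumAlgorithm);
        settings.put(BYTES_PER_CHECKSUM_KEY, Integer.toString(bytesPerChecksum));
        return settings;
    }
}
```

A job driver could call BulkLoadEncodingConfig.build("FAST_DIFF", "CRC32", 16384) and copy each entry onto the job's Hadoop Configuration with conf.set(key, value) before the job is submitted. Validating the encoding name up front is a design choice of this sketch, so that a misspelled name fails in the driver rather than in a running task.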