Anoop Sam John created HBASE-6040:
-------------------------------------

             Summary: Use block encoding and HBase handled checksum verification in bulk loading using HFileOutputFormat
                 Key: HBASE-6040
                 URL: https://issues.apache.org/jira/browse/HBASE-6040
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John

When data is bulk loaded using HFileOutputFormat, the block encoding and HBase-handled checksum features are not used. When the writer for the HFile is created, I do not see any such information being passed to the WriterBuilder. HFileOutputFormat.getNewWriter(byte[] family, Configuration conf) does not have this information, so it does not pass it to the writer either. As a result, those HFiles will not have these optimizations.

Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split an HFile (created by the MR job) if and only if it cannot belong to a single region, I can see that the data block encoding and checksum details are passed to the new HFile writer. However, I think this step does not normally happen.
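
The gap described in this issue can be illustrated with a minimal, self-contained sketch. It does not use the HBase API: `Encoding`, `Checksum`, `WriterSettings`, `WriterSettingsBuilder` and both `...Path` methods are hypothetical stand-ins, and the default values are placeholders rather than HBase's actual defaults. The sketch only shows that a builder which is never given the encoding and checksum settings silently falls back to its defaults, whereas passing them through preserves what was configured.

```java
// Hypothetical model only; none of these types are HBase classes.
public class BulkLoadWriterSketch {

    // Placeholder enums standing in for encoding and checksum choices.
    enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF }
    enum Checksum { NULL, CRC32 }

    /** Immutable settings that a (hypothetical) HFile writer is built with. */
    static final class WriterSettings {
        final Encoding encoding;
        final Checksum checksum;

        WriterSettings(Encoding encoding, Checksum checksum) {
            this.encoding = encoding;
            this.checksum = checksum;
        }

        @Override
        public String toString() {
            return "encoding=" + encoding + ", checksum=" + checksum;
        }
    }

    /** Builder whose fields keep placeholder defaults unless explicitly set. */
    static final class WriterSettingsBuilder {
        private Encoding encoding = Encoding.NONE;  // placeholder default
        private Checksum checksum = Checksum.NULL;  // placeholder default

        WriterSettingsBuilder withEncoding(Encoding encoding) {
            this.encoding = encoding;
            return this;
        }

        WriterSettingsBuilder withChecksum(Checksum checksum) {
            this.checksum = checksum;
            return this;
        }

        WriterSettings build() {
            return new WriterSettings(encoding, checksum);
        }
    }

    /** Models the reported behaviour: nothing is passed, so defaults win. */
    static WriterSettings currentBulkLoadPath() {
        return new WriterSettingsBuilder().build();
    }

    /** Models the proposed behaviour: configured values reach the builder. */
    static WriterSettings proposedBulkLoadPath(Encoding encoding, Checksum checksum) {
        return new WriterSettingsBuilder()
                .withEncoding(encoding)
                .withChecksum(checksum)
                .build();
    }

    public static void main(String[] args) {
        System.out.println("current:  " + currentBulkLoadPath());
        System.out.println("proposed: "
                + proposedBulkLoadPath(Encoding.FAST_DIFF, Checksum.CRC32));
    }
}
```

In this model, the split step in LoadIncrementalHFiles.copyHFileHalf() would correspond to `proposedBulkLoadPath`, while the normal bulk load output corresponds to `currentBulkLoadPath`.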