Thanks Stack for your reply. I will work on this and give a patch soon...

-Anoop-
________________________________________
From: [email protected] [[email protected]] on behalf of Stack [[email protected]]
Sent: Saturday, May 12, 2012 10:08 AM
To: [email protected]
Subject: Re: Usage of block encoding in bulk loading
On Fri, May 11, 2012 at 10:18 AM, Anoop Sam John <[email protected]> wrote:
> Hi Devs,
> I think that when data is bulk loaded using HFileOutputFormat, we are not
> using the block encoding and HBase-handled checksum features. When the
> writer is created for making the HFile, I don't see any such info being
> passed to the WriterBuilder. In HFileOutputFormat.getNewWriter(byte[]
> family, Configuration conf), we don't have this info and don't pass it to
> the writer either, so those HFiles will not have these optimizations.
>
> Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split
> one HFile (created by the MR job) if it cannot belong to just one region,
> I can see that we do pass the data block encoding and checksum details to
> the new HFile writer. But I think this step won't normally happen.
>
> Please correct me if my understanding is wrong.

Sounds plausible, Anoop. Sounds like something worth fixing, too? Good on you,
St.Ack
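(The gap Anoop describes is that getNewWriter builds the HFile writer without consulting the family's configured data block encoding or checksum setting, while copyHFileHalf does. The following is a minimal, self-contained sketch of that shape of fix; every class and method name in it — SketchWriterBuilder, familyEncodings, withDataBlockEncoding, and so on — is a hypothetical stand-in, not the real HBase API.)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: look up the per-family data block encoding from
// configuration and hand it to the writer builder, instead of building
// the writer without it. None of these names are real HBase APIs.
class BulkLoadEncodingSketch {

    // Stands in for an HFile WriterBuilder: settings are collected, then build().
    static class SketchWriterBuilder {
        private String encoding = "NONE";     // default when nothing is passed
        private boolean hbaseChecksum = false;

        SketchWriterBuilder withDataBlockEncoding(String enc) {
            this.encoding = enc;
            return this;
        }
        SketchWriterBuilder withHBaseChecksum(boolean on) {
            this.hbaseChecksum = on;
            return this;
        }
        String build() {
            return "writer[encoding=" + encoding
                    + ",hbaseChecksum=" + hbaseChecksum + "]";
        }
    }

    // Per-family encodings, as they might be serialized into the job conf.
    static final Map<String, String> familyEncodings = new HashMap<>();
    static {
        familyEncodings.put("cf1", "PREFIX");
    }

    // Roughly what the thread says getNewWriter does today: the family's
    // encoding is never consulted, so the writer falls back to defaults.
    static String getNewWriterCurrent(String family) {
        return new SketchWriterBuilder().build();
    }

    // The proposed behavior: consult the family's configured encoding first,
    // as copyHFileHalf already does for split files.
    static String getNewWriterFixed(String family) {
        String enc = familyEncodings.getOrDefault(family, "NONE");
        return new SketchWriterBuilder()
                .withDataBlockEncoding(enc)
                .withHBaseChecksum(true)
                .build();
    }

    public static void main(String[] args) {
        System.out.println(getNewWriterCurrent("cf1"));
        System.out.println(getNewWriterFixed("cf1"));
    }
}
```

Running the sketch shows the difference: the "current" path builds a writer with encoding NONE regardless of the family's setting, while the "fixed" path picks up PREFIX for cf1.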
