I noticed that HBase now has "HFileInputFormat", which can read HFiles directly into KVs for a MapReduce job:

https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileInputFormat.java

"MapReduceHFileSplitterJob" is a sample job that uses this input format.

With this feature, it should be possible to merge segments directly from the HFiles instead of from Kylin's cuboid files, without going through the HBase server. The cuboid files could then be removed after a build, which would save a lot of storage space.

Does anyone want to investigate this? We welcome community contributions.

--
Best regards,

Shaofeng Shi 史少锋
