You can refer to these test cases to figure out the intermediate output of each step:

https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/BaseCuboidMapperTest.java
https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/NDCuboidMapperTest.java
https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/CubeReducerTest.java

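
To make the dictionary idea from Luke's quoted reply concrete, here is a minimal sketch of encoding dimension values as small integer IDs and decoding them back. It is only an illustration under my own assumptions: the class DictionarySketch and its methods are hypothetical names, and this is not Kylin's actual dictionary implementation or its on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Kylin's real dictionary class.
final class DictionarySketch {
    // Distinct dimension values in sorted order; the list index is the ID.
    private final List<String> sortedValues;

    DictionarySketch(Iterable<String> values) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : values) {
            distinct.add(v);
        }
        sortedValues = new ArrayList<>(distinct);
    }

    // Encode a value as its position in the sorted list of distinct values.
    int encode(String value) {
        int id = Collections.binarySearch(sortedValues, value);
        if (id < 0) {
            throw new IllegalArgumentException("Value not in dictionary: " + value);
        }
        return id;
    }

    // Decode an ID back to the original value.
    String decode(int id) {
        return sortedValues.get(id);
    }

    public static void main(String[] args) {
        DictionarySketch dict = new DictionarySketch(
                Arrays.asList("2015-03-03", "2015-02-22", "2015-03-03"));
        int id = dict.encode("2015-03-03");
        System.out.println(id + " -> " + dict.decode(id));
    }
}
```

Stored rows then carry the integer ID rather than the full string, which is why the intermediate files are not readable as plain text without the matching dictionary.
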
> Date: Wed, 4 Mar 2015 00:14:10 +0530
> Subject: Re: Kylin code base help needed
> From: [email protected]
> CC: [email protected]
>
> Need to figure out the output of every step in order to better understand
> the cube building process. Any way to decode the hadoop mapreduce output
> files?
>
> On Tue, Mar 3, 2015 at 2:41 PM, Luke Han <[email protected]> wrote:
>
> > Kylin using dictionary to encode dimension values from String/Date to
> > digital value only, which will reduce storage significantly.
> > In query phase, when Kylin got result, it will decode and return
> > actually value to the client.
> >
> > Yang could have more detail comments for this.
> >
> > BTW, the intermedia files only be used by Kylin application, why you
> > need to decode it?
> > Please feel free to let's know if you have more questions.
> >
> > Thanks.
> > Luke
> >
> > 2015-03-03 17:01 GMT+08:00 Luke Han <[email protected]>:
> >
> >> Forward to mailing list for further support.
> >>
> >> ---------- Forwarded message ----------
> >> From: Abhishek Sinha <[email protected]>
> >> Date: 2015-02-22 20:20 GMT+08:00
> >> Subject: Kylin code base help needed
> >> To: [email protected]
> >>
> >> Hey,
> >> I was looking at the Kylin code base(master) in order to understand the
> >> flow and output of each of the steps in cube building process.
> >>
> >> The first step which is "Create Intermediate hive table" can easily be
> >> understood as the table is being created in Hive. However, further down the
> >> line, "Build base cuboid" or the "N dimension cuboid" has its output being
> >> created in a "tmp" folder in HDFS. I tried opening the 'part-r-00000' but
> >> it seems that the output is encoded in some format(possibly byte array or
> >> something).
> >>
> >> Can you give me a little bit idea about the encoding technique that is
> >> being used, and possibly how to decode and get the intermediate outputs.
> >>
> >> Thanks and regards,
> >>
> >> Abhishek Sinha
