You can refer to these test cases to figure out the intermediate output of each
step:

https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/BaseCuboidMapperTest.java
https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/NDCuboidMapperTest.java
https://github.com/KylinOLAP/Kylin/blob/master/job/src/test/java/org/apache/kylin/job/hadoop/cube/CubeReducerTest.java
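
For intuition on the dictionary encoding Luke describes in the quoted message below, here is a minimal sketch. It is not Kylin's actual dictionary code: the class name ToyDictionary, the sorted-ID assignment, and the sample date values are all assumptions made for illustration. It only shows the general idea of replacing String/Date values with small integer IDs and mapping them back at query time.

```python
from typing import Dict, List


class ToyDictionary:
    """Hypothetical toy encoder, not a Kylin class: value <-> integer ID."""

    def __init__(self, values: List[str]) -> None:
        # Assumption for this sketch: IDs are assigned in sorted order
        # over the distinct values of the column.
        self._id_to_value: List[str] = sorted(set(values))
        self._value_to_id: Dict[str, int] = {
            value: idx for idx, value in enumerate(self._id_to_value)
        }

    def encode(self, value: str) -> int:
        # Build phase: replace the original value with its compact ID.
        return self._value_to_id[value]

    def decode(self, idx: int) -> str:
        # Query phase: map the stored ID back to the original value.
        return self._id_to_value[idx]


# Made-up sample column of date strings.
column = ["2015-03-01", "2015-03-02", "2015-03-01", "2015-03-03"]
dictionary = ToyDictionary(column)

encoded = [dictionary.encode(v) for v in column]   # what would be stored
decoded = [dictionary.decode(i) for i in encoded]  # what a client would see

print(encoded)
print(decoded)
```

A file holding only such IDs looks like opaque bytes when opened directly, which would be consistent with what you saw in 'part-r-00000'; the values only become readable again with the matching dictionary.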
 

> Date: Wed, 4 Mar 2015 00:14:10 +0530
> Subject: Re: Kylin code base help needed
> From: [email protected]
> CC: [email protected]
> 
> Need to figure out the output of every step in order to better understand
> the cube building process. Any way to decode the hadoop mapreduce output
> files?
> 
> On Tue, Mar 3, 2015 at 2:41 PM, Luke Han <[email protected]> wrote:
> 
> >     Kylin uses a dictionary to encode dimension values from String/Date
> > into numeric values only, which reduces storage significantly.
> >     In the query phase, when Kylin gets a result, it decodes it and
> > returns the actual value to the client.
> >
> >     Yang may have more detailed comments on this.
> >
> >     BTW, the intermediate files are only used by the Kylin application;
> > why do you need to decode them?
> >     Please feel free to let us know if you have more questions.
> >
> >     Thanks.
> > Luke
> >
> >
> >
> >
> > 2015-03-03 17:01 GMT+08:00 Luke Han <[email protected]>:
> >
> >> Forward to mailing list for further support.
> >>
> >>
> >> ---------- Forwarded message ----------
> >> From: Abhishek Sinha <[email protected]>
> >> Date: 2015-02-22 20:20 GMT+08:00
> >> Subject: Kylin code base help needed
> >> To: [email protected]
> >>
> >>
> >> Hey,
> >> I was looking at the Kylin code base (master) in order to understand the
> >> flow and output of each of the steps in the cube building process.
> >>
> >> The first step which is "Create Intermediate hive table" can easily be
> >> understood as the table is being created in Hive. However, further down the
> >> line, "Build base cuboid" or the "N dimension cuboid" has its output being
> >> created in a "tmp" folder in HDFS. I tried opening the 'part-r-00000' but
> >> it seems that the output is encoded in some format (possibly a byte array
> >> or something).
> >>
> >> Can you give me some idea of the encoding technique that is being used,
> >> and possibly how to decode it and get the intermediate outputs?
> >>
> >>
> >>
> >>
> >> Thanks and regards,
> >>
> >>
> >>
> >> Abhishek Sinha
> >>
> >>
> >
                                          
