I suspect the intermediate table was not read correctly.

Look at the 2nd build step; it extracts the distinct values of every column
on the fact table that requires a dictionary. The related code is
FactDistinctColumnsMapper and FactDistinctColumnsReducer. In the output dir,
find the text file (named after the column name) containing all your date
strings, and confirm whether its content is correct.
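To confirm the column file mechanically, a small standalone check like the
one below could be run over it. This is only a sketch: the strict yyyy-MM-dd
pattern matches the dates shown in this thread, and the "plausible year"
bounds (1970–2100) are my assumption to catch values like '8549-07-10',
which parse fine but are clearly wrong.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

public class DateColumnCheck {
    // Flag values that do not parse as strict yyyy-MM-dd, or whose year falls
    // outside a plausible range (the bounds are an assumption; adjust to your data).
    static List<String> findBadDates(List<String> values) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setLenient(false); // reject e.g. "2015-02-30" or non-date strings
        List<String> bad = new ArrayList<>();
        for (String v : values) {
            String s = v.trim();
            try {
                fmt.parse(s);
                int year = Integer.parseInt(s.substring(0, 4));
                if (year < 1970 || year > 2100) {
                    bad.add(s); // parses, but the year is implausible
                }
            } catch (ParseException | NumberFormatException
                     | StringIndexOutOfBoundsException e) {
                bad.add(s); // not a well-formed date at all
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // In practice, read the per-column output file line by line instead.
        List<String> sample = List.of("2015-02-02", "8549-07-10", "not-a-date");
        System.out.println(DateColumnCheck.findBadDates(sample));
        // prints [8549-07-10, not-a-date]
    }
}
```

If this flags nothing, the dumped distinct values are clean and the problem
is more likely downstream of the extraction step.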

If the bad date is there, then FactDistinctColumnsMapper had a problem
reading the intermediate Hive table. Further troubleshooting would be to
write a unit test that reads the table using HCatalog and debug through it.

Cheers
Yang

On Thu, Mar 19, 2015 at 9:57 PM, Shi, Shaofeng <[email protected]> wrote:

> Thanks; so far I'm not sure whether it is a regression of KYLIN-630; will
> discuss this with Yang tomorrow;
>
> On 3/19/15, 6:00 PM, "dong wang" <[email protected]> wrote:
>
> >all the data were dumped from MySQL and loaded into Hive successfully,
> >thus the data format should be right~ and after applying the fix of
> >issue-630, I cannot even build one day's data incrementally due to the
> >error mentioned above~
> >
> >2015-03-19 17:54 GMT+08:00 dong wang <[email protected]>:
> >
> >> 1, the data in kylin_intermediate_* are like:
> >> 2015-02-02  0  409   13619  432  1267  2  13  34  59  0  39   0
> >> 2015-02-02  0  5534  13943  432  1259  1  17  40  73  0  1    0
> >> 2015-02-02  0  845   14194  461  1245  1  17  38  66  0  1    0
> >> 2015-02-02  0  409   13617  432  1227  2  13  34  59  0  276  2
> >> 2015-02-02  0  19    11539  387  1084  2  15  35  67  0  7    0
> >> 2015-02-02  0  1221  12985  387  1079  2  15  39  62  0  12   0
> >> 2015-02-02  0  51    11152  387  1076  2  15  35  67  0  770  0
> >> 2015-02-02  0  166   11148  387  1057  2  15  35  67  0  282  0
> >> 2015-02-02  0  5157  11295  397  810   2  15  35  67  0  5    0
> >> 2015-02-02  0  151   11659  397  807   1  17  36  63  0  1    0
> >>
> >> 2, select * from test where mydate='8549-07-10' limit 10; returns empty.
> >>
> >> 3, another thing is that we haven't found incorrect data in the sum
> >> results for each day; it only affects merging segments and building a
> >> new segment
> >>
>
>
