From the log we can see that the data Kylin read is not a standard date; I suspect there is dirty data: 8549-07-10 8621-07-06 9994-04-05 …

What’s the file format of your source hive table? If you run “select * from kylin_intermediate_*”, are the columns separated clearly?

On 3/19/15, 4:44 PM, "dong wang" <[email protected]> wrote:

>Hi shaofeng, the following log should be what you mentioned:
>
>[pool-7-thread-1]:[2015-03-19 16:42:02,980][INFO][org.apache.kylin.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:75)]
>- Dictionary value samples: 8549-07-10=>3122651, 8621-07-06=>3148944, 9994-04-05=>3650330, 9808-04-14=>3582404, 5012-02-14=>1830641
>[pool-7-thread-1]:[2015-03-19 16:42:02,980][INFO][org.apache.kylin.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:76)]
>- Dictionary cardinality 7304854
>
>2015-03-19 13:20 GMT+08:00 Shi, Shaofeng <[email protected]>:
>
>> Just before this exception, there should be some log saying "Dictionary
>> value samples: "; Could you please find and paste that log msg?
>>
>> On 3/19/15, 1:12 PM, "dong wang" <[email protected]> wrote:
>>
>> >for DEFAULT.TEST.MYDATE, select count(distinct(mydate)) from test,
>> >returns 533, which represents all the days till now
>> >for kylin_intermediate_*.MYDATE, select count(distinct(mydate)) from
>> >kylin_intermediate_*, returns 517, which indicates the first segment of the
>> >cube, and the segment contains 517 days' data
>> >
>> >2015-03-19 1:20 GMT+08:00 hongbin ma <[email protected]>:
>> >
>> >> can you use hive to get the distinct count values in DEFAULT.TEST.MYDATE
>> >> and kylin_intermediate_*.MYDATE?
>> >>
>> >> On Wed, Mar 18, 2015 at 6:05 AM, dong wang <[email protected]> wrote:
>> >>
>> >> > 1, in Hive, my_date is indeed DATE type, it means which day the records
>> >> > belong to
>> >> > 2, it is certain that the pattern for this column "mydate" is "yyyy-MM-dd",
>> >> > no "HH", "MM", "SS" at all for my_date
>> >> > 3, for the kylin_intermediate_* table, I'm sure the data for the column is
>> >> > like "2015-03-17"
>> >> >
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> *Bin Mahone | 马洪宾*
>> >> Apache Kylin: http://kylin.io
>> >> Github: https://github.com/binmahone

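For anyone who wants to repeat the dirty-data check from the top of this thread, here is a rough sketch in Python. It flags values that either do not parse as yyyy-MM-dd or fall outside a plausible range. The function name `find_suspect_dates` and the 1970 to 2100 bounds are assumptions made for illustration and are not part of Kylin or Hive. The input is assumed to be the MYDATE column exported from the intermediate table as plain strings.

```python
from datetime import date, datetime


def find_suspect_dates(values, earliest=date(1970, 1, 1), latest=date(2100, 1, 1)):
    """Return the values that are not plausible yyyy-MM-dd dates.

    The earliest/latest bounds are illustrative assumptions; adjust them
    to the range the source table is expected to cover.
    """
    suspects = []
    for raw in values:
        try:
            parsed = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
        except ValueError:
            # Not in yyyy-MM-dd form at all, e.g. a shifted or merged column.
            suspects.append(raw)
            continue
        if not (earliest <= parsed <= latest):
            # Well-formed but outside the expected range.
            suspects.append(raw)
    return suspects


# The five samples from the dictionary log, plus one value the reporter expects.
samples = [
    "8549-07-10", "8621-07-06", "9994-04-05",
    "9808-04-14", "5012-02-14", "2015-03-17",
]
print(find_suspect_dates(samples))
```

With these bounds, all five values from the log are flagged and "2015-03-17" passes. That fits the suspicion that what Kylin read differs from what Hive shows for the column, which would also explain a dictionary cardinality of 7304854 against 517 distinct days.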