From the log, the values Kylin read are not valid dates, so I suspect there
is dirty data:
8549-07-10

8621-07-06

9994-04-05

…
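To screen such values quickly, here is a minimal Python sketch (a hypothetical helper, not part of Kylin). It parses the "value=>id" pairs from the "Dictionary value samples" log line quoted below and flags every value that is not a yyyy-MM-dd date or whose year is later than a cutoff. The cutoff year 2100 is an assumption. Note that a year such as 8549 is still a syntactically valid date, which is why a range check is needed.

```python
import re
from datetime import date

# Sample copied from the "Dictionary value samples" log line in this thread.
SAMPLE_LOG = ("Dictionary value samples: 8549-07-10=>3122651, 8621-07-06=>3148944, "
              "9994-04-05=>3650330, 9808-04-14=>3582404, 5012-02-14=>1830641")

# Assumption: any year after this is treated as dirty data.
MAX_PLAUSIBLE_YEAR = 2100


def parse_samples(log_line):
    """Return (value, dictionary_id) pairs found as 'value=>id' in the log line."""
    return [(v, int(i)) for v, i in re.findall(r"([^\s,:]+)=>(\d+)", log_line)]


def is_plausible_date(value):
    """True if value is a real yyyy-MM-dd date no later than MAX_PLAUSIBLE_YEAR."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        d = date.fromisoformat(value)  # rejects impossible dates such as 2015-13-01
    except ValueError:
        return False
    return d.year <= MAX_PLAUSIBLE_YEAR


suspicious = [v for v, _ in parse_samples(SAMPLE_LOG) if not is_plausible_date(v)]
print(suspicious)
```

With the sample above, all five values are flagged, which is consistent with the dirty-data suspicion.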

What's the file format of your source Hive table? If you run "select *
from kylin_intermediate_*", are the columns separated clearly?
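One way to check the column separation is to count fields per row in a few rows exported from the intermediate table. Below is a minimal Python sketch, assuming Hive's default text-format field delimiter "\x01" (Ctrl-A). The delimiter actually used for kylin_intermediate_* may differ, so confirm it from the table's DDL (for example with SHOW CREATE TABLE). The function name, expected column count and sample rows are hypothetical.

```python
# Assumption: Hive's default field delimiter for text tables (Ctrl-A).
FIELD_DELIMITER = "\x01"


def bad_rows(lines, expected_columns, delimiter=FIELD_DELIMITER):
    """Yield (line_number, column_count) for rows that do not split into expected_columns fields."""
    for n, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(delimiter))
        if count != expected_columns:
            yield n, count


# Hypothetical rows: the third one was written with a different delimiter,
# so a reader splitting on Ctrl-A sees the whole line as a single field.
rows = [
    "2015-03-17\x01shop_a\x0142\n",
    "2015-03-18\x01shop,b\x0117\n",
    "2015-03-18,shop_c,9\n",
]
print(list(bad_rows(rows, expected_columns=3)))
```

If rows are misaligned like this, a date column can end up holding bytes from other fields, which would produce dictionary values like the ones above.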


On 3/19/15, 4:44 PM, "dong wang" <[email protected]> wrote:

>Hi Shaofeng, the following log should be what you mentioned:
>
>[pool-7-thread-1]:[2015-03-19 16:42:02,980][INFO][org.apache.kylin.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:75)] - Dictionary value samples: 8549-07-10=>3122651, 8621-07-06=>3148944, 9994-04-05=>3650330, 9808-04-14=>3582404, 5012-02-14=>1830641
>[pool-7-thread-1]:[2015-03-19 16:42:02,980][INFO][org.apache.kylin.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:76)] - Dictionary cardinality 7304854
>
>2015-03-19 13:20 GMT+08:00 Shi, Shaofeng <[email protected]>:
>
>> Just before this exception, there should be a log message containing
>> "Dictionary value samples:". Could you please find and paste that message?
>>
>> On 3/19/15, 1:12 PM, "dong wang" <[email protected]> wrote:
>>
>> >For DEFAULT.TEST.MYDATE, "select count(distinct(mydate)) from test"
>> >returns 533, which covers all the days up to now.
>> >For kylin_intermediate_*.MYDATE, "select count(distinct(mydate)) from
>> >kylin_intermediate_*" returns 517, which matches the first segment of
>> >the cube; that segment contains 517 days' data.
>> >
>> >2015-03-19 1:20 GMT+08:00 hongbin ma <[email protected]>:
>> >
>> >> Can you use Hive to get the distinct count of values in
>> >> DEFAULT.TEST.MYDATE and kylin_intermediate_*.MYDATE?
>> >>
>> >> On Wed, Mar 18, 2015 at 6:05 AM, dong wang <[email protected]> wrote:
>> >>
>> >> > 1. In Hive, mydate is indeed of DATE type; it indicates which day the
>> >> > records belong to.
>> >> > 2. The pattern of the "mydate" column is certainly "yyyy-MM-dd"; there
>> >> > are no "HH", "mm" or "ss" parts at all.
>> >> > 3. In the kylin_intermediate_* table, I'm sure the data in this column
>> >> > looks like "2015-03-17".
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> *Bin Mahone | 马洪宾*
>> >> Apache Kylin: http://kylin.io
>> >> Github: https://github.com/binmahone
>> >>
>>
>>
