What's the cardinality of the dimension whose distinct values you want to
count? The integer range is enough for most cases. If your case falls within
it, you can try the bitmap with integer values, but you need to map each
value to a unique integer ID and use that ID in the bitmap. For example, to
count distinct users, use the numeric user_id instead of the email address.
As for supporting other data types: as Hongbin mentioned, the storage cost is
very high, so we don't have a plan for that.
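To illustrate the "map each value to a unique ID, then use a bitmap" idea, here is a minimal sketch. It is not Kylin's implementation. `IdDictionary` and `Bitmap` are hypothetical names, and a Python int stands in for a real compressed bitmap. It also shows why a bitmap gives an exact count: merging two cells is a bitwise OR, so a user seen in both is counted once.

```python
class IdDictionary:
    """Hypothetical global dictionary: assigns each distinct value a dense int ID."""

    def __init__(self):
        self._ids = {}

    def id_of(self, value):
        # Reuse the existing ID, or hand out the next unused one (0, 1, 2, ...).
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


class Bitmap:
    """Plain bitset backed by a Python int; bit i is set if ID i was seen."""

    def __init__(self, bits=0):
        self.bits = bits

    def add(self, value_id):
        self.bits |= 1 << value_id

    def union(self, other):
        # Rolling up two cells is a bitwise OR, so no value is counted twice.
        return Bitmap(self.bits | other.bits)

    def cardinality(self):
        return bin(self.bits).count("1")


# Usage: exact distinct users per day, then rolled up across both days.
users = IdDictionary()
day1, day2 = Bitmap(), Bitmap()
for email in ["a@x.com", "b@x.com", "a@x.com"]:
    day1.add(users.id_of(email))
for email in ["b@x.com", "c@x.com"]:
    day2.add(users.id_of(email))

print(day1.cardinality(), day2.cardinality())  # 2 2
print(day1.union(day2).cardinality())          # 3, not 2 + 2
```

For dates, an integer encoding such as days since an epoch could likely serve as the ID directly, without a dictionary. That assumes the range of dates stays within the integer range.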





2016-01-28 20:54 GMT+08:00 hongbin ma <mahong...@apache.org>:

> KYLIN-1186 <https://issues.apache.org/jira/browse/KYLIN-1186> is not a
> mature feature yet, and it only supports integers. We don't yet have plans
> to support any other forms of precise distinct count, as it is too
> expensive to pre-calculate.
>
> On Thu, Jan 28, 2016 at 6:56 PM, Abhilash L L <abhil...@infoworks.io>
> wrote:
>
> > Thanks, ShaoFeng Shi.
> >
> > We might need it for other data types as well:
> >
> > date & string
> >
> >  (e.g., distinct count of dates of a certain activity)
> >
> > So in the REST call, should the return type be bitmap instead of hllc for
> > int, tinyint, etc.?
> >
> > And do we still send it as hllc for other data types?
> >
> >
> > Also, one of the comments said we cast long to int. Won't we lose data
> > due to truncation?
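On the long-to-int question: here is a minimal sketch of what a Java-style narrowing `(int)` cast does, emulated in Python. It assumes the cast keeps only the low 32 bits, which is how Java's long-to-int narrowing conversion behaves. Exactly where KYLIN-1186 casts is not shown here. Values inside the int range pass through unchanged. Values outside it can collide, which would make an exact distinct count come out too low.

```python
def java_long_to_int(x):
    """Emulate Java's narrowing (int) cast: keep the low 32 bits, two's complement."""
    low = x & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


ids = [5, 5 + (1 << 32), 2**31]   # three distinct long values
as_int = [java_long_to_int(v) for v in ids]
print(as_int)                            # [5, 5, -2147483648]
print(len(set(ids)), len(set(as_int)))   # 3 2
```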
> >
> >
> > Regards,
> > Abhilash
> >
> > On Thu, Jan 28, 2016 at 3:43 PM, ShaoFeng Shi <shaofeng...@apache.org>
> > wrote:
> >
> > > Does this match your case?
> > > https://issues.apache.org/jira/browse/KYLIN-1186
> > >
> > > 2016-01-28 17:42 GMT+08:00 Abhilash L L <abhil...@infoworks.io>:
> > >
> > > > +user mailing list
> > > >
> > > > Regards,
> > > > Abhilash
> > > >
> > > > On Thu, Jan 28, 2016 at 11:32 AM, Abhilash L L <abhil...@infoworks.io>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > >    Is there a way to ask Kylin for an exact distinct count? From what
> > > > > we understand, we can choose from hllc(10) to hllc(16).
> > > > >
> > > > >    I understand that for every cuboid you would need to go through the
> > > > > whole data set again, but with the new cubing algorithm (2.x branch),
> > > > > shouldn't it be simpler to add?
> > > > >
> > > > >    If this is not currently supported, are there any plans to
> > > > > introduce it?
> > > > >
> > > > > Regards,
> > > > > Abhilash
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Best regards,

Shaofeng Shi
