@Sun Yerui, I looked at the query parsing part again. It's possible to
delay the aggregation mapping until after cube selection, so that the type
info on the cube can supplement the mapping. It requires some refactoring
effort, but won't affect the MeasureType interface. You can proceed with
the implementation on your side while I work on this change.
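
To make the idea concrete, here is a minimal sketch in Java of resolving
the measure type only after the cube is known. It is not Kylin's actual
code: the class DeferredMeasureMapping, its methods, and the return type
strings "hllc" and "bitmap" are all hypothetical, chosen only for
illustration. The one point it tries to show is that the lookup key
combines the funcName from the query with the return type declared on the
selected cube.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Kylin's real MeasureTypeFactory.
class DeferredMeasureMapping {
    // Registry keyed by "FUNC_NAME/returntype" -> measure type name.
    private final Map<String, String> registry = new HashMap<>();

    // Register one measure type under a (funcName, returnType) pair, so
    // that several measure types can share the same funcName.
    void register(String funcName, String returnType, String measureType) {
        registry.put(key(funcName, returnType), measureType);
    }

    // Meant to be called only after cube selection: the query supplies the
    // funcName, and the selected cube supplies the declared return type.
    String resolve(String funcName, String cubeReturnType) {
        String measureType = registry.get(key(funcName, cubeReturnType));
        if (measureType == null) {
            throw new IllegalArgumentException(
                "No measure type registered for " + key(funcName, cubeReturnType));
        }
        return measureType;
    }

    // Normalize case so that "count_distinct" and "COUNT_DISTINCT" match.
    private static String key(String funcName, String returnType) {
        return funcName.toUpperCase(Locale.ROOT) + "/"
            + returnType.toLowerCase(Locale.ROOT);
    }
}
```

With "hllc" and "bitmap" both registered under COUNT_DISTINCT, the same
query text "count(distinct seller_id)" would resolve to a different measure
type depending on which cube is selected, so the SQL stays unchanged for
users.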

On Fri, Dec 11, 2015 at 11:36 AM, Li Yang <liy...@apache.org> wrote:

> I can see the need from the user's perspective. Let me look again at the
> query parsing logic and see if any tweak is possible.
>
> On Fri, Dec 11, 2015 at 7:59 AM, Luke Han <luke...@gmail.com> wrote:
>
>> It should be transparent to users; they should always use
>> "count(distinct seller_id)".
>>
>> How about one setting value when the user picks "DistinctCount"? We
>> already have error range, so it should be easy to add one more option,
>> say "Precise" (but yes, we also have to display a warning message about
>> its disadvantages). Then at the code level, it could be handled easily,
>> as Yerui mentioned.
>>
>> Thanks.
>>
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Thu, Dec 10, 2015 at 7:33 PM, Yerui Sun <sunye...@gmail.com> wrote:
>>
>> > You’re right, I overlooked that the return type can’t be obtained from
>> > the query context.
>> >
>> > I’m not familiar with Calcite UDFs. Do you mean a new SQL syntax like
>> > “count (distinct_precise seller_id)”? That’s not transparent to users,
>> > so it doesn’t seem like the best way.
>> >
>> > Another way is to still map count distinct queries to one aggregation
>> > function, and make that function able to handle a variety of
>> > ValueTypes. For example, abstract a count distinct measure type called
>> > ‘CountDistinctMeasureType’ as the parent of HLLCMeasureType and
>> > BitmapMeasureType, and map all count distinct queries to
>> > ‘CountDistinctAggFunc’, with the abstract class ‘CountDistinctCounter’
>> > as the parameter type of add() and merge(). When this aggregation
>> > function is called, the processing depends on the value type, such as
>> > HLLCounter or BitmapCounter.
>> >
>> > I’m not sure whether I’ve described it clearly. Actually, I have
>> > implemented bitmap count distinct in 1.x-staging this way, keeping hll
>> > count distinct working. Maybe I could implement it in 2.x-staging with
>> > your refactoring, and we could review the code later?
>> >
>> > > On Dec 10, 2015, at 18:23, Li Yang <liy...@apache.org> wrote:
>> > >
>> > > I've considered exactly the same point. It does not work when mapping
>> > > a query to the aggregation functions. A query will simply say "count
>> > > (distinct seller_id)", and won't mention any return type.
>> > >
>> > > The way out is adding a new aggregation for your count distinct using
>> > > Calcite UDF, so that it can be correctly mapped. I don't have an
>> > > example yet, so we need to do some exploration here. Actually I hope
>> > > to use your case as an example.  :-)
>> > >
>> > >
>> > >
>> > > On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <sunye...@gmail.com> wrote:
>> > >
>> > >> It’s a really great job, Yang!
>> > >>
>> > >> I have a question about the MeasureTypeFactory. In the current
>> > >> 2.x-staging code, two built-in measure types (hll and topn) are
>> > >> registered, and the factory creates the corresponding MeasureType by
>> > >> funcName only (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
>> > >> However, creating a new measure type with the same funcName is
>> > >> impossible. For example, I want to create a bitmap measure with
>> > >> funcName ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
>> > >>
>> > >> One possible way is for the factory to create the measure type based
>> > >> not only on funcName but also on returnType, making it possible to
>> > >> map one funcName to multiple measure types.
>> > >> In other words, we could define the measure type in the factory using
>> > >> funcName and returnType, instead of only funcName as now.
>> > >>
>> > >> Do you think this makes sense? Looking forward to your comments.
>> > >>
>> > >>> On Dec 10, 2015, at 14:57, Li Yang <liy...@apache.org> wrote:
>> > >>>
>> > >>>> Would it be possible to create a How to guide on ability to add
>> > >>>> custom aggregates into Kylin
>> > >>>
>> > >>> Definitely! I should spend some time on documentation in the
>> > >>> following days. Many features have been added to 2.x. Aiming to
>> > >>> release a 2.0 beta soon, it's time to work on documentation. :-)
>> > >>>
>> > >>>> Where are the custom aggregates computed on the Kylin Service or on
>> > >>>> Hbase CoProcessors?
>> > >>>
>> > >>> The aggregation takes place in MR during cube build, then in
>> > >>> CoProcessor and query service during query. Originally I hoped users
>> > >>> could add a new aggregation by just dropping in a jar and some
>> > >>> configuration. However, it turns out to be more than that due to
>> > >>> CoProcessor... Anyway, it's a lot more friendly to developers now.
>> > >>>
>> > >>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <mahong...@apache.org> wrote:
>> > >>>
>> > >>>> hi seshu
>> > >>>>
>> > >>>> yang's work is more of a framework. it reduces a developer's effort
>> > >>>> if he/she wants to add a new custom aggregation. Since some of the
>> > >>>> aggregation happens in coprocessors, we cannot completely get rid of
>> > >>>> re-compiling & re-deploying. If someone from the community is
>> > >>>> interested in crafting a new aggregation, he/she can take a look at
>> > >>>> how the HLL/TOPN aggregation is implemented.
>> > >>>>
>> > >>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu
>> > >>>> <sadunuth...@ebay.com> wrote:
>> > >>>>
>> > >>>>> Yang,
>> > >>>>>
>> > >>>>> Would it be possible to create a how-to guide on the ability to add
>> > >>>>> custom aggregates into Kylin? Javadocs are good, but to encourage
>> > >>>>> community participation we should make it more easily consumable.
>> > >>>>>
>> > >>>>> Where are the custom aggregates computed: on the Kylin Service or
>> > >>>>> on HBase CoProcessors?
>> > >>>>>
>> > >>>>> Regards
>> > >>>>> Seshu Adunuthula.
>> > >>>>>
>> > >>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <sadunuth...@ebay.com> wrote:
>> > >>>>>
>> > >>>>>> This is awesome!
>> > >>>>>>
>> > >>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <shao...@ebay.com> wrote:
>> > >>>>>>
>> > >>>>>>> This is another important refactoring since making the
>> > >>>>>>> build/query engines pluggable. Thanks Yang!
>> > >>>>>>>
>> > >>>>>>> On 12/8/15, 5:47 PM, "Li Yang" <liy...@apache.org> wrote:
>> > >>>>>>>
>> > >>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
>> > >>>>>>>>
>> > >>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation
>> > >>>>>>>> and aims to allow adding custom aggregation types easily.
>> > >>>>>>>>
>> > >>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG
>> > >>>>>>>> (from sum and count), and COUNT_DISTINCT (based on HyperLogLog).
>> > >>>>>>>> Later TopN was added in the 2.x branch. And the list is growing
>> > >>>>>>>> for sure. Xiaoyu is working on storing raw records as a special
>> > >>>>>>>> type of measure (KYLIN-1122), and Yerui is working on precise
>> > >>>>>>>> count distinct using bitmap (KYLIN-1186).
>> > >>>>>>>>
>> > >>>>>>>> The possibilities are unlimited. Implementing a domain-specific
>> > >>>>>>>> aggregation is now quite easy. E.g. aggregate user events to
>> > >>>>>>>> detect time series or access patterns. Or draw a sketch of
>> > >>>>>>>> certain user groups. Or pre-calculate clusters of data points.
>> > >>>>>>>> Or histograms... Use your imagination.
>> > >>>>>>>>
>> > >>>>>>>> Whoever is interested can peek at MeasureTypeFactory and
>> > >>>>>>>> MeasureType on the 2.x branch. The API may still change, but at
>> > >>>>>>>> the same time is stable enough for pilots. The javadoc should
>> > >>>>>>>> get you started. HLLCMeasureType and TopNMeasureType are two
>> > >>>>>>>> good examples.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> Cheers
>> > >>>>>>>> Yang
>> > >>>>>>>
>> > >>>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>
>> > >>>>
>> > >>>> --
>> > >>>> Regards,
>> > >>>>
>> > >>>> *Bin Mahone | 马洪宾*
>> > >>>> Apache Kylin: http://kylin.io
>> > >>>> Github: https://github.com/binmahone
>> > >>>>
>> > >>
>> > >>
>> >
>> >
>>
>
>
