That’s great. I’ll work on the bitmap counter implementation and cube building
this week, and merge your changes to the query logic next week.

> On Dec 17, 2015, at 17:48, Li Yang <liy...@apache.org> wrote:
> 
> @Sun Yerui, I looked at the query parsing part again; it's possible to
> delay the aggregation mapping until after cube selection. Then the type
> info on the cube can supplement the mapping. It requires some refactoring
> effort, but won't affect the MeasureType interface. You can proceed with
> the implementation on your side while I work on this change.
> 
> On Fri, Dec 11, 2015 at 11:36 AM, Li Yang <liy...@apache.org> wrote:
> 
>> I can see the need from the user's perspective. Let me look again at the
>> query parsing logic and see if any tweak is possible.
>> 
>> On Fri, Dec 11, 2015 at 7:59 AM, Luke Han <luke...@gmail.com> wrote:
>> 
>>> It should be transparent to users; they should always use "count(distinct
>>> seller_id)".
>>> 
>>> How about one setting value when the user picks "DistinctCount"? We
>>> already have an error range, so it should be easy to add one more option,
>>> say "Precise" (but yes, we also have to display a warning message about
>>> its disadvantage). Then at the code level it could be easy to handle, as
>>> Yerui mentioned.
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>>> 
>>> Best Regards!
>>> ---------------------
>>> 
>>> Luke Han
>>> 
>>> On Thu, Dec 10, 2015 at 7:33 PM, Yerui Sun <sunye...@gmail.com> wrote:
>>> 
>>>> You’re right, I overlooked that the return type can’t be obtained from
>>>> the query context.
>>>> 
>>>> I’m not familiar with Calcite UDFs. Do you mean a new SQL syntax like
>>>> “count (distinct_precise seller_id)”? That’s not transparent to the user,
>>>> so it doesn't seem the best way.
>>>> 
>>>> Another way is to keep mapping count distinct queries to one aggr func,
>>>> and make that func able to handle a variety of ValueTypes. For example,
>>>> abstract a count distinct measure type called ‘CountDistinctMeasureType’
>>>> as the parent of HLLCMeasureType and BitmapMeasureType, and map all count
>>>> distinct queries to ‘CountDistinctAggFunc’, with the abstract class
>>>> ‘CountDistinctCounter’ as the add() and merge() parameter type. When this
>>>> aggr func is called, the processing depends on the value type, like
>>>> HLLCounter or BitmapCounter.
>>>> 
>>>> I’m not sure whether I’ve described it clearly. Actually, I have
>>>> implemented bitmap count distinct in 1.x-staging this way, keeping hll
>>>> count distinct working. Maybe I could implement it in 2.x-staging with
>>>> your refactoring, and we could review the code later?
>>>> 
>>>>> On Dec 10, 2015, at 18:23, Li Yang <liy...@apache.org> wrote:
>>>>> 
>>>>> I've considered exactly the same point. It does not work when mapping a
>>>>> query to the aggregation functions. A query will simply say "count
>>>>> (distinct seller_id)", and won't mention any return type.
>>>>> 
>>>>> The way out is adding a new aggregation for your count distinct using a
>>>>> Calcite UDF; then it can be correctly mapped. I don't have an example
>>>>> yet, so we need to do some exploration here. Actually I hope to use your
>>>>> case as an example.  :-)
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Dec 10, 2015 at 4:25 PM, Yerui Sun <sunye...@gmail.com> wrote:
>>>>> 
>>>>>> It’s a really great job, Yang!
>>>>>> 
>>>>>> I have a question about the MeasureTypeFactory. In the current
>>>>>> 2.x-staging code, two built-in measure types (hll and topn) are
>>>>>> registered, and the factory creates the corresponding MeasureType only
>>>>>> by funcName (‘COUNT_DISTINCT’ for hll and ‘TOP_N’ for topn).
>>>>>> However, creating a new measure type with the same funcName is
>>>>>> impossible. For example, I want to create a bitmap measure with funcName
>>>>>> ‘COUNT_DISTINCT’, the same as the hll measure's funcName.
>>>>>> 
>>>>>> One possible way is for the factory to create the measure type based
>>>>>> not only on funcName but also on returnType, making it possible to map
>>>>>> one funcName to multiple measures.
>>>>>> In other words, we could define the measure type in the factory using
>>>>>> funcName and returnType, instead of only funcName as now.
>>>>>> 
>>>>>> Does this make sense? Looking forward to your comments.
>>>>>> 
>>>>>>> On Dec 10, 2015, at 14:57, Li Yang <liy...@apache.org> wrote:
>>>>>>> 
>>>>>>>> Would it be possible to create a how-to guide on adding custom
>>>>>>>> aggregates to Kylin
>>>>>>> 
>>>>>>> Definitely! I should spend some time on documentation in the coming
>>>>>>> days. Many features have been added to 2.x. As we aim to release a
>>>>>>> 2.0 beta soon, it's time to work on the documentation. :-)
>>>>>>> 
>>>>>>>> Where are the custom aggregates computed: on the Kylin Service or
>>>>>>>> on HBase CoProcessors?
>>>>>>> 
>>>>>>> The aggregation takes place in MR during cube build, then in the
>>>>>>> CoProcessor and query service during query. Originally I hoped users
>>>>>>> could add a new aggregation by just dropping in a jar and some
>>>>>>> configuration. However, it turns out to be more than that due to the
>>>>>>> CoProcessor... Anyway, it's a lot more friendly to developers now.
>>>>>>> 
>>>>>>> On Thu, Dec 10, 2015 at 2:14 PM, hongbin ma <mahong...@apache.org> wrote:
>>>>>>> 
>>>>>>>> hi seshu
>>>>>>>> 
>>>>>>>> yang's work is more of a framework. it reduces a developer's effort
>>>>>>>> if he/she wants to add a new custom aggregation. Since some of the
>>>>>>>> aggregation happens in coprocessors, we cannot completely get rid of
>>>>>>>> re-compiling & re-deploying. If someone from the community is
>>>>>>>> interested in crafting a new aggregation, he/she can take a look at
>>>>>>>> how the HLL/TOPN aggregations are implemented.
>>>>>>>> 
>>>>>>>> On Wed, Dec 9, 2015 at 9:43 PM, Adunuthula, Seshu
>>>>>>>> <sadunuth...@ebay.com> wrote:
>>>>>>>> 
>>>>>>>>> Yang,
>>>>>>>>> 
>>>>>>>>> Would it be possible to create a how-to guide on adding custom
>>>>>>>>> aggregates to Kylin? Javadocs are good, but to encourage community
>>>>>>>>> participation we should make it more easily consumable.
>>>>>>>>> 
>>>>>>>>> Where are the custom aggregates computed: on the Kylin Service or
>>>>>>>>> on HBase CoProcessors?
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Seshu Adunuthula.
>>>>>>>>> 
>>>>>>>>> On 12/8/15, 6:18 AM, "Adunuthula, Seshu" <sadunuth...@ebay.com> wrote:
>>>>>>>>> 
>>>>>>>>>> This is awesome!
>>>>>>>>>> 
>>>>>>>>>> On 12/8/15, 6:05 AM, "Shi, Shaofeng" <shao...@ebay.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> This is another important refactor since making the build/query
>>>>>>>>>>> engines pluggable. Thanks Yang!
>>>>>>>>>>> 
>>>>>>>>>>> On 12/8/15, 5:47 PM, "Li Yang" <liy...@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> This is a bump of KYLIN-976 in case you are not yet aware...
>>>>>>>>>>>> 
>>>>>>>>>>>> KYLIN-976 is a refactoring of how Kylin works with aggregation
>>>>>>>>>>>> and aims to allow adding custom aggregation types easily.
>>>>>>>>>>>> 
>>>>>>>>>>>> Kylin started with basic support of SUM, COUNT, MAX, MIN, AVG
>>>>>>>>>>>> (from sum and count), and COUNT_DISTINCT (based on HyperLogLog).
>>>>>>>>>>>> Later TopN was added in the 2.x branch. And the list is growing
>>>>>>>>>>>> for sure. Xiaoyu is working on storing raw records as a special
>>>>>>>>>>>> type of measure (KYLIN-1122), and Yerui is working on precise
>>>>>>>>>>>> count distinct using bitmap (KYLIN-1186).
>>>>>>>>>>>> 
>>>>>>>>>>>> The possibilities are unlimited. Implementing a domain-specific
>>>>>>>>>>>> aggregation is now quite easy. E.g. aggregate user events to
>>>>>>>>>>>> detect time series or access patterns. Or draw a sketch of
>>>>>>>>>>>> certain user groups. Or pre-calculate clusters of data points.
>>>>>>>>>>>> Or histogram... Use your imagination.
>>>>>>>>>>>> 
>>>>>>>>>>>> Whoever is interested can peek at MeasureTypeFactory and
>>>>>>>>>>>> MeasureType on the 2.x branch. The API may still change, but at
>>>>>>>>>>>> the same time is stable enough for pilots. The javadoc should
>>>>>>>>>>>> get you started. HLLCMeasureType and TopNMeasureType are two
>>>>>>>>>>>> good examples.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> Yang
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> 
>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>> Apache Kylin: http://kylin.io
>>>>>>>> Github: https://github.com/binmahone
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
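
The design Yerui describes in the thread (one ‘CountDistinctAggFunc’ whose
add() and merge() take an abstract ‘CountDistinctCounter’) could be sketched
in Java roughly as below. This is only an illustration under assumptions, not
Kylin's code: the class names come from the thread, but every method signature
is a guess, values are assumed to be non-negative ints, and ExactSetCounter is
a hypothetical stand-in for where a real (approximate) HLLCounter would go.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical parent type; the name comes from the thread, the methods are assumed.
abstract class CountDistinctCounter {
    abstract void add(int value);
    abstract void merge(CountDistinctCounter other);
    abstract long getCount();
}

// Precise counter: one bit per distinct non-negative int value.
class BitmapCounter extends CountDistinctCounter {
    private final BitSet bits = new BitSet();

    @Override
    void add(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        bits.set(value);
    }

    @Override
    void merge(CountDistinctCounter other) {
        if (!(other instanceof BitmapCounter)) {
            throw new IllegalArgumentException("cannot merge a different counter type");
        }
        bits.or(((BitmapCounter) other).bits);
    }

    @Override
    long getCount() {
        return bits.cardinality();
    }
}

// Stand-in for an HLL-based counter; a real HLL counter would be approximate.
class ExactSetCounter extends CountDistinctCounter {
    private final Set<Integer> values = new HashSet<>();

    @Override
    void add(int value) {
        values.add(value);
    }

    @Override
    void merge(CountDistinctCounter other) {
        if (!(other instanceof ExactSetCounter)) {
            throw new IllegalArgumentException("cannot merge a different counter type");
        }
        values.addAll(((ExactSetCounter) other).values);
    }

    @Override
    long getCount() {
        return values.size();
    }
}

// Single aggregation function; the concrete behavior is chosen by the counter type.
final class CountDistinctAggFunc {
    private CountDistinctAggFunc() {}

    static CountDistinctCounter add(CountDistinctCounter counter, int value) {
        counter.add(value);
        return counter;
    }

    static CountDistinctCounter merge(CountDistinctCounter left, CountDistinctCounter right) {
        left.merge(right);
        return left;
    }
}
```

The aggregation function never inspects the counter type itself, so a query
for "count(distinct seller_id)" can stay the same whichever counter the cube
stores. Merging two different counter types is rejected rather than silently
converted.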

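The earlier proposal in the thread, keying the factory on both funcName and
returnType so that one funcName can map to several measure types, could look
roughly like the sketch below. Everything here is hypothetical and is not the
MeasureTypeFactory API: SketchMeasureType, SketchMeasureTypeRegistry, and the
return type strings "hllc" and "bitmap" are made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal measure type; the real MeasureType interface is much richer.
interface SketchMeasureType {
    String describe();
}

// Hypothetical registry keyed by (funcName, returnType) instead of funcName alone.
final class SketchMeasureTypeRegistry {
    private final Map<String, Supplier<SketchMeasureType>> factories = new HashMap<>();

    // Build a case-insensitive composite key from both parts.
    private static String key(String funcName, String returnType) {
        return funcName.toUpperCase(Locale.ROOT) + "|" + returnType.toLowerCase(Locale.ROOT);
    }

    void register(String funcName, String returnType, Supplier<SketchMeasureType> factory) {
        String k = key(funcName, returnType);
        if (factories.putIfAbsent(k, factory) != null) {
            throw new IllegalStateException("already registered: " + k);
        }
    }

    SketchMeasureType create(String funcName, String returnType) {
        Supplier<SketchMeasureType> factory = factories.get(key(funcName, returnType));
        if (factory == null) {
            throw new IllegalArgumentException(
                "no measure type for " + funcName + " returning " + returnType);
        }
        return factory.get();
    }
}
```

With such a registry, registering ("COUNT_DISTINCT", "hllc") and
("COUNT_DISTINCT", "bitmap") no longer collide. As Li Yang points out in the
thread, the open problem is on the query side: the query only says
"count(distinct seller_id)", so the return type has to be supplied from the
cube definition after cube selection.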