Hi James/Team,

myaggFunc(col1,col2)

I tried implementing this new aggregate function with multiple (2, to start
with) columns or expressions as arguments.

And it's giving me this error:

index (1) must be less than size (1)
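
As far as I can tell, this wording matches the message that Guava's
Preconditions.checkElementIndex produces, which would suggest that something
is reading argument index 1 from a children list that only holds one element.
Below is a minimal plain-Java sketch of that situation. It is not Phoenix
code: the class name, the helper methods, and the stand-in bounds check are
all hypothetical and only mimic the message format.

```java
import java.util.List;

public class ArgIndexSketch {
    // Hypothetical stand-in for a bounds check; it only mimics the wording
    // of the error above and is not the real library method.
    static int checkElementIndex(int index, int size) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index (" + index + ") must not be negative");
        }
        if (index >= size) {
            throw new IndexOutOfBoundsException(
                    "index (" + index + ") must be less than size (" + size + ")");
        }
        return index;
    }

    // Hypothetical accessor: returns the argument at the given position.
    static String argAt(List<String> children, int index) {
        checkElementIndex(index, children.size());
        return children.get(index);
    }

    public static void main(String[] args) {
        // Assumption: only one child expression reached the function,
        // even though the call site passed two arguments.
        List<String> children = List.of("col1");
        try {
            argAt(children, 1); // asks for the second argument
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that is what is happening, the place to look would be wherever the second
child is dropped before the function sees it.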


My function definition:

@FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
MyAggParseNode.class, args = {
        @FunctionParseNode.Argument(),
        @FunctionParseNode.Argument()})

Is there any example I can refer to that accepts multiple fields
as arguments to a function?

Any pointers would really help.
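
To make the goal concrete, here is a plain-Java sketch of the result I am
after, using the sample us/uk rows quoted further down this thread. It is
not Phoenix code: PerColumnSumSketch and sumByColumn are made-up names, and
sum is only the example aggregation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerColumnSumSketch {
    // Hypothetical helper: sums each requested column across all rows.
    // A missing cell (sparse data) is simply skipped.
    static Map<String, Long> sumByColumn(List<Map<String, Long>> rows, String... columns) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String column : columns) {
            totals.put(column, 0L);
        }
        for (Map<String, Long> row : rows) {
            for (String column : columns) {
                Long value = row.get(column);
                if (value != null) {
                    totals.merge(column, value, Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Rows 20161001 and 20161002 from the sample data in this thread.
        List<Map<String, Long>> rows = List.of(
                Map.of("us", 3L, "uk", 4L),
                Map.of("us", 1L, "uk", 2L));
        System.out.println(sumByColumn(rows, "us", "uk")); // prints {us=4, uk=6}
    }
}
```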

Thank you,





On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <[email protected]>
wrote:

> Hi James,
>
> The new ones are along similar lines to the existing aggregate functions.
>
> I misinterpreted this definition, thanks for clarifying:
> *A reference to a column is also an expression*
>
> Regards
> Swapna
>
> On Tue, May 3, 2016 at 11:39 PM, James Taylor <[email protected]>
> wrote:
>
>> Hi Swapna,
>> All our aggregate functions allow expressions as arguments and it wouldn't
>> make sense to have these new ones be different. A reference to a column is
>> also an expression. It doesn't change the HBase data model being sparse.
>>
>> I think the next step should be for you to submit a patch that the
>> community can take a look at, as it's too difficult to discuss this
>> without that.
>>
>> Thanks,
>> James
>>
>> On Tuesday, May 3, 2016, Swapna Swapna <[email protected]> wrote:
>>
>> > Hi James,
>> >
>> > Thanks for your swift response.
>> >
>> > I wouldn't be able to use an expression in the query below; rather, I
>> > would have to provide the columns I'm interested in (as arguments) so
>> > that the aggregation is performed on each of the provided columns.
>> >
>> > myaggFunc(col1,col2)
>> >
>> > The reason is that the HBase data is sparse and I would not know the
>> > column values. Data is fetched based on a row key.
>> >
>> > expression example:
>> >
>> > ID=1 OR NAME='Hi'
>> >
>> > Regards
>> >
>> > Swapna
>> >
>> >
>> >
>> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <[email protected]> wrote:
>> >
>> > > Hi Swapna,
>> > > The return type is typically derived from looking at the return types of
>> > > each of the input arguments and choosing what'll work without losing
>> > > precision. For example, take a look at this loop in ExpressionCompiler
>> > > that determines this for expressions that are added together:
>> > >
>> > >         new ArithmeticExpressionFactory() {
>> > >             @Override
>> > >             public Expression create(ArithmeticParseNode node,
>> > > List<Expression> children) throws SQLException {
>> > >                 boolean foundDate = false;
>> > >                 Determinism determinism = Determinism.ALWAYS;
>> > >                 PDataType theType = null;
>> > >                 for(int i = 0; i < children.size(); i++) {
>> > >
>> > > You're probably already doing this, but make sure you don't assume the
>> > > arguments are column references; allow them to be any expression.
>> > >
>> > > Also, it'd be great to see what you've got so far without handling
>> > > multiple arguments to your function (in the form of a pull request) so
>> > > folks can give you feedback on your work so far.
>> > >
>> > > Thanks, and we appreciate the contributions!
>> > >
>> > > James
>> > >
>> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <[email protected]>
>> > > wrote:
>> > >
>> > > > Sure,
>> > > >
>> > > > Hbase data that I have is:
>> > > >
>> > > > rowkey                us         uk
>> > > > 20161001           3            4
>> > > > 20161002           1            2
>> > > >
>> > > >
>> > > > select myaggFunc(us) from table :    // this is returning output as: 4
>> > > > select myaggFunc(uk) from table :    // this is returning output as: 6
>> > > >
>> > > > Similarly, I'm visualizing the query like:
>> > > > select myaggFunc1(us,uk) from table;  // with multiple columns
>> > > >
>> > > > to return output (based on the aggregation logic; the results below
>> > > > are for sum aggregation):
>> > > > us   4
>> > > > uk   6
>> > > >
>> > > >
>> > > >
>> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Removing user list (please don't cross post)
>> > > > >
>> > > > > Can you give us a full example of the query you have in mind?
>> > > > >
>> > > > > Thanks,
>> > > > > James
>> > > > >
>> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I'm trying to implement an aggregate function on multiple columns
>> > > > > > (as arguments) like:
>> > > > > >
>> > > > > > myaggFunc(col1,col2)
>> > > > > >
>> > > > > > And I would want to return the results for each column after
>> > > > > > applying the aggregate operation.
>> > > > > >
>> > > > > > The output would be something like:
>> > > > > >
>> > > > > > col1, count ( aggregate of all records for col1)
>> > > > > > col2, count
>> > > > > >
>> > > > > > In order to return the results in the above format, what return
>> > > > > > data type (of the method) should I choose?
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
