That helps. Thank You James. I'm in the right direction not trying further
to achieve non-standard multi arg version of an
aggregate function :):)

And as a next step,  I will push my existing (single col) aggregate
function and will keep you posted.

Regards
Swapna

On Tue, May 10, 2016 at 10:57 AM, James Taylor <[email protected]>
wrote:

> Yes, the way to optimize it is to not represent data in column qualifiers,
> but as the value of a column instead (perhaps in the primary key
> constraint) and to do the group by query I mentioned before.
>
> Otherwise, you can do separate aggregations as you've shown as it'd perform
> the same as trying to support a non standard multi arg version of an
> aggregate function.
>
> Thanks,
> James
>
> On Tuesday, May 10, 2016, Swapna Swapna <[email protected]> wrote:
>
> > Hi James,
> >
> > thanks for your response. In the below example, us & uk are column
> > qualifiers.
> >
> > * rowkey                c:us        c:uk*
> >  20161001             3             4
> >  20161002             1             2
> >
> >
> > This is how my query looks like:
> > select sum1(us) as US, sum1(uk) as UK from table;
> >
> > which returns the below output: (as expected)
> > *US  UK*
> > 4     6
> >
> > is there any better way to achieve/optimize this. This seems to be not an
> > ideal solution when we have large number of columns.
> >
> > Thanks
> > Swapna
> >
> >
> > On Tue, May 10, 2016 at 12:04 AM, James Taylor <[email protected]
> > <javascript:;>>
> > wrote:
> >
> > > We don't have aggregate functions with multiple arguments, so I can't
> > > provide any pointers. It's unclear what semantics you're trying to
> > achieve
> > > with the multiple arguments. Can you give a concrete example? Based on
> > your
> > > other example, you'd want to do a GROUP BY, like this:
> > >
> > > select sum(col) from table group by country;
> > >
> > > On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <[email protected]
> > <javascript:;>>
> > > wrote:
> > >
> > > > Hi James/Team,
> > > >
> > > >
> > > > myaggFunc(col1,col2)
> > > >
> > > > I tried implementing this new aggregate function with multiple (2, to
> > > start
> > > > with)  columns , expressions as arguments.
> > > >
> > > > And its giving me this error:
> > > >
> > > > index (1) must be less than size (1)
> > > >
> > > >
> > > > My function definition:
> > > >
> > > > @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
> > > > MyAggParseNode.class, args = {
> > > >         @FunctionParseNode.Argument(),
> > > >         @FunctionParseNode.Argument()})
> > > >
> > > > is there any example that I can refer, which accepts multiple fields
> > > > as arguments to function.
> > > >
> > > > Any pointers would really help.
> > > >
> > > > Thank you,
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <
> [email protected]
> > <javascript:;>>
> > > > wrote:
> > > >
> > > > > Hi James,
> > > > >
> > > > > the new ones are in similar lines to existing aggregate functions:
> > > > >
> > > > > I misinterpreted this definition, thanks for clarifying :
> > > > > *A reference to a column is also an expression*
> > > > >
> > > > > Regards
> > > > > Swapna
> > > > >
> > > > > On Tue, May 3, 2016 at 11:39 PM, James Taylor <
> > [email protected] <javascript:;>>
> > > > > wrote:
> > > > >
> > > > >> Hi Swapna,
> > > > >> All our aggregate functions allow expressions as arguments and it
> > > > wouldn't
> > > > >> make sense to have these new ones be different. A reference to a
> > > column
> > > > is
> > > > >> also an expression. It doesn't change the HBase data model being
> > > sparse.
> > > > >>
> > > > >> I think the next step should be for you to submit a patch that the
> > > > >> community can take a look at, as it's too difficult to discuss
> this
> > > > >> without
> > > > >> that.
> > > > >>
> > > > >> Thanks,
> > > > >> James
> > > > >>
> > > > >> On Tuesday, May 3, 2016, Swapna Swapna <[email protected]
> > <javascript:;>>
> > > wrote:
> > > > >>
> > > > >> > Hi James,
> > > > >> >
> > > > >> > Thanks for your swift response.
> > > > >> >
> > > > >> > I wouldn't be able to use the expression in the below query
> > rather I
> > > > >> would
> > > > >> > have to provide the columns (as arguments) which I'm interested
> in
> > > to
> > > > >> > perform the aggregation on respective provided columns.
> > > > >> >
> > > > >> > myaggFunc(col1,col2)
> > > > >> >
> > > > >> > the reason being, the hbase data is sparsed and I would not know
> > the
> > > > >> column
> > > > >> > values. Data fetch is based on a row key.
> > > > >> >
> > > > >> > expression example:
> > > > >> >
> > > > >> > ID=1 OR NAME='Hi'
> > > > >> >
> > > > >> > Regards
> > > > >> >
> > > > >> > Swapna
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <
> > > [email protected] <javascript:;>
> > > > >> > <javascript:;>> wrote:
> > > > >> >
> > > > >> > > Hi Swapna,
> > > > >> > > The return type is typically derived from looking at the
> return
> > > > types
> > > > >> of
> > > > >> > > each of the input arguments and choosing what'll work without
> > > losing
> > > > >> > > precision. For example, take a look at this loop in
> > > > ExpressionCompiler
> > > > >> > that
> > > > >> > > determines this for expressions that are added together:
> > > > >> > >
> > > > >> > >         new ArithmeticExpressionFactory() {
> > > > >> > >             @Override
> > > > >> > >             public Expression create(ArithmeticParseNode node,
> > > > >> > > List<Expression> children) throws SQLException {
> > > > >> > >                 boolean foundDate = false;
> > > > >> > >                 Determinism determinism = Determinism.ALWAYS;
> > > > >> > >                 PDataType theType = null;
> > > > >> > >                 for(int i = 0; i < children.size(); i++) {
> > > > >> > >
> > > > >> > > Your probably already doing this, but make sure you don't
> assume
> > > the
> > > > >> > > arguments are column references, but allow them to be any
> > > > expression.
> > > > >> > >
> > > > >> > > Also, it'd be great to see what you've got so far without
> > handling
> > > > >> > multiple
> > > > >> > > arguments to your function (in the form of a pull request) so
> > > folks
> > > > >> can
> > > > >> > get
> > > > >> > > you feedback on your work so far.
> > > > >> > >
> > > > >> > > Thanks, and we appreciate the contributions!
> > > > >> > >
> > > > >> > > James
> > > > >> > >
> > > > >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
> > > > >> [email protected] <javascript:;>
> > > > >> > <javascript:;>>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Sure,
> > > > >> > > >
> > > > >> > > > Hbase data that I have is:
> > > > >> > > >
> > > > >> > > > rowkey                us         uk
> > > > >> > > > 20161001           3            4
> > > > >> > > > 20161002           1            2
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > select myaggFunc(us) from table :    // this is returning
> > output
> > > > as
> > > > >> :
> > > > >> > > > 4
> > > > >> > > > select myaggFunc(uk) from table :    // this is returning
> > output
> > > > as
> > > > >> :
> > > > >> > > > 6
> > > > >> > > >
> > > > >> > > > In similar to that, i'm visualizing the query like: select
> > > > >> > > > myaggFunc1(us,uk)
> > > > >> > > > from table;  //with multiple columns
> > > > >> > > >
> > > > >> > > > to return output:   (based on the aggregation logic, below
> > > results
> > > > >> are
> > > > >> > > for
> > > > >> > > > sum aggregation)
> > > > >> > > > us   4
> > > > >> > > > uk   6
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> > > > >> [email protected] <javascript:;>
> > > > >> > <javascript:;>>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Removing user list (please don't cross post)
> > > > >> > > > >
> > > > >> > > > > Can you give us a full example of the query you have in
> > mind?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > James
> > > > >> > > > >
> > > > >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > > > >> > [email protected] <javascript:;> <javascript:;>
> > > > >> > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi,
> > > > >> > > > > >
> > > > >> > > > > > I'm trying to implement aggregate function on multiple
> > > columns
> > > > >> (as
> > > > >> > an
> > > > >> > > > > > arguments) like:
> > > > >> > > > > >
> > > > >> > > > > > myaggFunc(col1,col2)
> > > > >> > > > > >
> > > > >> > > > > > And I would want to return the results by each column
> > after
> > > > >> > applying
> > > > >> > > > > > aggregate operation.
> > > > >> > > > > >
> > > > >> > > > > > The output would be something like:
> > > > >> > > > > >
> > > > >> > > > > > col1, count ( aggregate of all records for col1)
> > > > >> > > > > > col2, count
> > > > >> > > > > >
> > > > >> > > > > > Inorder to return the results in the above format, what
> is
> > > the
> > > > >> > return
> > > > >> > > > > data
> > > > >> > > > > > type (of the method) should I have to choose?
> > > > >> > > > > >
> > > > >> > > > > > Thanks
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to