That helps. Thank you, James. I think I'm on the right track in no longer trying to build a non-standard multi-argument version of an aggregate function :)
As a next step, I will push my existing (single-column) aggregate function and keep you posted.

Regards
Swapna

On Tue, May 10, 2016 at 10:57 AM, James Taylor <[email protected]> wrote:

> Yes, the way to optimize it is to not represent data in column qualifiers,
> but as the value of a column instead (perhaps in the primary key
> constraint) and to do the group by query I mentioned before.
>
> Otherwise, you can do separate aggregations as you've shown, as it'd perform
> the same as trying to support a non-standard multi-arg version of an
> aggregate function.
>
> Thanks,
> James
>
> On Tuesday, May 10, 2016, Swapna Swapna <[email protected]> wrote:
>
> > Hi James,
> >
> > Thanks for your response. In the example below, us & uk are column
> > qualifiers.
> >
> > rowkey    c:us  c:uk
> > 20161001  3     4
> > 20161002  1     2
> >
> > This is what my query looks like:
> > select sum1(us) as US, sum1(uk) as UK from table;
> >
> > which returns the output below (as expected):
> > US  UK
> > 4   6
> >
> > Is there a better way to achieve/optimize this? This does not seem to be an
> > ideal solution when we have a large number of columns.
> >
> > Thanks
> > Swapna
> >
> > On Tue, May 10, 2016 at 12:04 AM, James Taylor <[email protected]>
> > wrote:
> >
> > > We don't have aggregate functions with multiple arguments, so I can't
> > > provide any pointers. It's unclear what semantics you're trying to achieve
> > > with the multiple arguments. Can you give a concrete example? Based on your
> > > other example, you'd want to do a GROUP BY, like this:
> > >
> > > select sum(col) from table group by country;
> > >
> > > On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <[email protected]>
> > > wrote:
> > >
> > > > Hi James/Team,
> > > >
> > > > myaggFunc(col1,col2)
> > > >
> > > > I tried implementing this new aggregate function with multiple (2, to start
> > > > with) columns or expressions as arguments.
> > > >
> > > > And it's giving me this error:
> > > >
> > > > index (1) must be less than size (1)
> > > >
> > > > My function definition:
> > > >
> > > > @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass = MyAggParseNode.class, args = {
> > > >     @FunctionParseNode.Argument(),
> > > >     @FunctionParseNode.Argument()})
> > > >
> > > > Is there any example I can refer to that accepts multiple fields
> > > > as arguments to a function?
> > > >
> > > > Any pointers would really help.
> > > >
> > > > Thank you,
> > > >
> > > > On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi James,
> > > > >
> > > > > The new ones are along similar lines to the existing aggregate functions.
> > > > >
> > > > > I misinterpreted this definition, thanks for clarifying:
> > > > > *A reference to a column is also an expression*
> > > > >
> > > > > Regards
> > > > > Swapna
> > > > >
> > > > > On Tue, May 3, 2016 at 11:39 PM, James Taylor <[email protected]>
> > > > > wrote:
> > > > >
> > > > >> Hi Swapna,
> > > > >> All our aggregate functions allow expressions as arguments and it wouldn't
> > > > >> make sense to have these new ones be different. A reference to a column is
> > > > >> also an expression. It doesn't change the HBase data model being sparse.
> > > > >>
> > > > >> I think the next step should be for you to submit a patch that the
> > > > >> community can take a look at, as it's too difficult to discuss this
> > > > >> without that.
> > > > >>
> > > > >> Thanks,
> > > > >> James
> > > > >>
> > > > >> On Tuesday, May 3, 2016, Swapna Swapna <[email protected]> wrote:
> > > > >>
> > > > >> > Hi James,
> > > > >> >
> > > > >> > Thanks for your swift response.
> > > > >> >
> > > > >> > I wouldn't be able to use an expression in the query below; rather, I would
> > > > >> > have to provide the columns (as arguments) that I'm interested in, to
> > > > >> > perform the aggregation on each of the provided columns.
> > > > >> >
> > > > >> > myaggFunc(col1,col2)
> > > > >> >
> > > > >> > The reason being, the HBase data is sparse and I would not know the column
> > > > >> > values. Data fetch is based on a row key.
> > > > >> >
> > > > >> > Expression example:
> > > > >> >
> > > > >> > ID=1 OR NAME='Hi'
> > > > >> >
> > > > >> > Regards
> > > > >> >
> > > > >> > Swapna
> > > > >> >
> > > > >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <[email protected]> wrote:
> > > > >> >
> > > > >> > > Hi Swapna,
> > > > >> > > The return type is typically derived from looking at the return types of
> > > > >> > > each of the input arguments and choosing what'll work without losing
> > > > >> > > precision. For example, take a look at this loop in ExpressionCompiler that
> > > > >> > > determines this for expressions that are added together:
> > > > >> > >
> > > > >> > > new ArithmeticExpressionFactory() {
> > > > >> > >     @Override
> > > > >> > >     public Expression create(ArithmeticParseNode node,
> > > > >> > >             List<Expression> children) throws SQLException {
> > > > >> > >         boolean foundDate = false;
> > > > >> > >         Determinism determinism = Determinism.ALWAYS;
> > > > >> > >         PDataType theType = null;
> > > > >> > >         for(int i = 0; i < children.size(); i++) {
> > > > >> > >
> > > > >> > > You're probably already doing this, but make sure you don't assume the
> > > > >> > > arguments are column references, but allow them to be any expression.
> > > > >> > >
> > > > >> > > Also, it'd be great to see what you've got so far without handling multiple
> > > > >> > > arguments to your function (in the form of a pull request) so folks can get
> > > > >> > > you feedback on your work so far.
> > > > >> > >
> > > > >> > > Thanks, and we appreciate the contributions!
> > > > >> > >
> > > > >> > > James
> > > > >> > >
> > > > >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <[email protected]>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Sure,
> > > > >> > > >
> > > > >> > > > The HBase data that I have is:
> > > > >> > > >
> > > > >> > > > rowkey    us  uk
> > > > >> > > > 20161001  3   4
> > > > >> > > > 20161002  1   2
> > > > >> > > >
> > > > >> > > > select myaggFunc(us) from table;  // returns 4
> > > > >> > > > select myaggFunc(uk) from table;  // returns 6
> > > > >> > > >
> > > > >> > > > Similarly, I'm envisioning a query like:
> > > > >> > > >
> > > > >> > > > select myaggFunc1(us,uk) from table;  // with multiple columns
> > > > >> > > >
> > > > >> > > > to return this output (based on the aggregation logic; the results below
> > > > >> > > > are for a sum aggregation):
> > > > >> > > > us 4
> > > > >> > > > uk 6
> > > > >> > > >
> > > > >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <[email protected]>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Removing user list (please don't cross post)
> > > > >> > > > >
> > > > >> > > > > Can you give us a full example of the query you have in mind?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > James
> > > > >> > > > >
> > > > >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <[email protected]>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi,
> > > > >> > > > > >
> > > > >> > > > > > I'm trying to implement an aggregate function on multiple columns (as
> > > > >> > > > > > arguments) like:
> > > > >> > > > > >
> > > > >> > > > > > myaggFunc(col1,col2)
> > > > >> > > > > >
> > > > >> > > > > > And I would want to return the results for each column after applying
> > > > >> > > > > > the aggregate operation.
> > > > >> > > > > >
> > > > >> > > > > > The output would be something like:
> > > > >> > > > > >
> > > > >> > > > > > col1, count (aggregate of all records for col1)
> > > > >> > > > > > col2, count
> > > > >> > > > > >
> > > > >> > > > > > In order to return the results in the above format, which return data
> > > > >> > > > > > type (of the method) should I choose?
> > > > >> > > > > >
> > > > >> > > > > > Thanks
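
For anyone reading this thread later, here is a rough, self-contained sketch of the layout James suggests: each country becomes a value in its own row instead of a column qualifier, and a single group-by replaces one aggregate call per column. It is plain in-memory Java, not Phoenix code. The class name `GroupBySketch`, the `Fact` type, and the field names are all hypothetical and chosen only for illustration; the sample values are the ones from the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupBySketch {

    /** One cell of the wide layout stored as its own row: (day, country, value). Hypothetical type. */
    static final class Fact {
        final String day;
        final String country;
        final long value;

        Fact(String day, String country, long value) {
            this.day = day;
            this.country = country;
            this.value = value;
        }
    }

    /** The thread's sample data, with us/uk moved from column qualifiers into a column value. */
    static List<Fact> sampleRows() {
        return Arrays.asList(
                new Fact("20161001", "us", 3),
                new Fact("20161001", "uk", 4),
                new Fact("20161002", "us", 1),
                new Fact("20161002", "uk", 2));
    }

    /** In-memory analogue of: select country, sum(value) from t group by country */
    static Map<String, Long> sumByCountry(List<Fact> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                f -> f.country,                        // grouping key
                TreeMap::new,                          // sorted map for stable output order
                Collectors.summingLong(f -> f.value))); // aggregate per group
    }

    public static void main(String[] args) {
        // Prints one line per country with its total.
        sumByCountry(sampleRows())
                .forEach((country, total) -> System.out.println(country + " " + total));
    }
}
```

The point of the sketch is that adding a new country only adds rows, not a new column and a new aggregate expression in the query, which is why this shape scales better when there are many columns.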
