Hi,

So if I want to check or debug how each of the different descriptors for a function is used, I need a clustered setup. Am I right? I am not sure how to debug in a clustered environment. I want to check how the array_avg() function works to see if I can understand it better.
Further, I see that the array_avg function seems to translate into the sql-avg function, but I am not sure where that happens.

Sorry about all the silly questions.

On 25 July 2017 at 06:54, Preston Carman <[email protected]> wrote:

> When dealing with aggregates and query plans, I find it helpful to
> think about how the aggregate will work in a distributed environment.
> The AsterixDB compiler makes optimizations based on how the data is
> partitioned. If the data is unpartitioned, then a single aggregate
> operator and function can calculate the result. If the data is
> partitioned, then sending all the data to a single node for
> processing is not very efficient. Instead, the aggregate process
> can be split into two steps. AsterixDB optimizes the query by
> running a process on each partition locally and then sending an
> intermediate result to a single node to create the final aggregate
> result.
>
> COUNT
> In the case of count, the local process is COUNT, but the global
> aggregate process is SUM. We do not want to count the responses, but
> to sum the local count values.
>
> AVG
> In the count case, we use a completely separate aggregate function for
> the global step. Consider AVG: to compute the average you need to know
> the count and sum. In this case the local functions find both the
> count and sum. These values are then passed to a global aggregate
> function which uses these local results to calculate the average
> aggregate result.
>
> Take a look at the query plans for a COUNT and AVG query. The
> optimized query plan will show you the two aggregate operators.
>
> As you look at the code, AVG would probably be more informative about
> the full aggregation workflow.
>
>
> On Mon, Jul 24, 2017 at 8:28 AM, Riyafa Abdul Hameed
> <[email protected]> wrote:
> > On 23 July 2017 at 22:59, Yingyi Bu <[email protected]> wrote:
> >
> >> >> I see AVG, LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG.
> >>
> >> AVG: that's the local function in the local plan.
> >> LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG: think about distributed
> >> computation of average. LOCAL_AVG aggregates the sum/count at the local
> >> data source, INTERMEDIATE_AVG aggregates the sum/count over partially
> >> aggregated sums/counts, and GLOBAL_AVG computes the final average value
> >> from intermediate sums/counts.
> >>
> >
> > How do we decide if we need these descriptors? COUNT seems to have only
> > one descriptor
> >
> >
> >>
> >> Best,
> >> Yingyi
> >>
> >>
> >> On Sat, Jul 22, 2017 at 9:43 PM, Riyafa Abdul Hameed <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > Thanks for the explanation.
> >> > But there are so many things I still don't understand. One of them is
> >> > that for the avg function itself there are several FunctionIdentifiers.
> >> > What do they all mean?
> >> >
> >> > I see AVG, LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG.
> >> >
> >> > What do they all mean?
> >> > Please help
> >> >
> >> > On 19 July 2017 at 21:56, Yingyi Bu <[email protected]> wrote:
> >> >
> >> > > Hi Riyafa,
> >> > >
> >> > > >> ScalarCountAggregateDescriptor
> >> > > It's used for counting a scalar array that appears inside a tuple.
> >> > > For example:
> >> > > SELECT u.id, array_count(u.friends)
> >> > > FROM users u;
> >> > >
> >> > > >> SerializableCountAggregateDescriptor
> >> > > Serialized aggregation descriptor implementations are only used in
> >> > > hash-based group-by.
> >> > > For example:
> >> > > SELECT u.city, count(*)
> >> > > FROM users u
> >> > > /*+ hash */
> >> > > GROUP BY u.city;
> >> > >
> >> > > If your aggregation function doesn't have a fixed-byte-sized state, you
> >> > > don't need to worry about that or implement that.
> >> > >
> >> > > >> CountAggregateDescriptor
> >> > > This is used in group-by or global aggregate:
> >> > > For example:
> >> > > SELECT u.city, count(*)
> >> > > FROM users u
> >> > > GROUP BY u.city;
> >> > >
> >> > > SELECT count(*) FROM users;
> >> > >
> >> > >
> >> > > Best,
> >> > > Yingyi
> >> > >
> >> > >
> >> > > On Wed, Jul 19, 2017 at 7:55 AM, Riyafa Abdul Hameed <
> >> > > [email protected]> wrote:
> >> > >
> >> > > > Hi again,
> >> > > >
> >> > > > Any suggestions on this? Or is there anyone I can reach who is not on
> >> > > > this list or not active on the list?
> >> > > >
> >> > > > Thank you.
> >> > > >
> >> > > > On 17 July 2017 at 17:18, Riyafa Abdul Hameed <[email protected]> wrote:
> >> > > >
> >> > > > > Hi again,
> >> > > > >
> >> > > > > I think I can understand how to write the descriptor in the packages
> >> > > > > org.apache.asterix.runtime.aggregates.std and
> >> > > > > org.apache.asterix.runtime.aggregates.scalar.
> >> > > > > But I am not sure I understand how to write the descriptor in the
> >> > > > > package org.apache.asterix.runtime.aggregates.serializable.std, because
> >> > > > > it requires setting a state in the init function that doesn't seem to
> >> > > > > have a pattern in the other descriptors.
> >> > > > > Also, I don't seem to understand the reasons for implementing each of
> >> > > > > these descriptors for the aggregate functions.
> >> > > > >
> >> > > > > On 17 July 2017 at 16:56, Riyafa Abdul Hameed <[email protected]>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hi all,
> >> > > > >>
> >> > > > >> I meant any explanation on the implementation of aggregate functions
> >> > > > >> in AsterixDB would be highly appreciated.
> >> > > > >>
> >> > > > >> Thank you.
> >> > > > >> Yours sincerely,
> >> > > > >> Riyafa
> >> > > > >>
> >> > > > >> On 16 July 2017 at 08:01, Riyafa Abdul Hameed <[email protected]>
> >> > > > >> wrote:
> >> > > > >>
> >> > > > >>> Dear all,
> >> > > > >>>
> >> > > > >>> I am trying to create aggregate functions and I see there is more
> >> > > > >>> than one function descriptor for a single function.
> >> > > > >>> For example, the function array_count(collection) has the following
> >> > > > >>> descriptors:
> >> > > > >>>
> >> > > > >>>
> >> > > > >>> - ScalarCountAggregateDescriptor
> >> > > > >>> - SerializableCountAggregateDescriptor
> >> > > > >>> - CountAggregateDescriptor
> >> > > > >>>
> >> > > > >>> I am not sure I understand the difference between each of these.
> >> > > > >>> Can you please provide an example or point me to a documentation
> >> > > > >>> entry to learn how to properly implement aggregate functions?
> >> > > > >>>
> >> > > > >>> The function I am trying to implement is ST_Extent.
> >> > > > >>> <https://postgis.net/docs/manual-1.4/ST_Extent.html>
> >> > > > >>>
> >> > > > >>> Thank you.
> >> > > > >>>
> >> > > > >>> Yours sincerely,
> >> > > > >>>
> >> > > > >>> Riyafa

--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
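
For readers following the thread, the local / intermediate / global split described above for COUNT and AVG can be sketched in a few lines of Python. This is only an illustration of the idea, not AsterixDB code: the names AvgState, local_avg, intermediate_avg, global_avg, local_count and global_count are hypothetical, and returning None for empty input is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AvgState:
    """Partial state for AVG: a running sum and count (hypothetical type)."""
    total: float
    count: int


def local_avg(values: Iterable[float]) -> AvgState:
    """Local step: one partition reduces its raw values to (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return AvgState(total, count)


def intermediate_avg(states: Iterable[AvgState]) -> AvgState:
    """Intermediate step: merge partial states into another partial state."""
    total, count = 0.0, 0
    for s in states:
        total += s.total
        count += s.count
    return AvgState(total, count)


def global_avg(states: Iterable[AvgState]) -> Optional[float]:
    """Global step: merge the remaining partial states and emit the average.

    Returning None for empty input is an assumption of this sketch.
    """
    merged = intermediate_avg(states)
    return merged.total / merged.count if merged.count else None


def local_count(values: Iterable[float]) -> int:
    """Local step for COUNT: count the rows of one partition."""
    return sum(1 for _ in values)


def global_count(local_counts: Iterable[int]) -> int:
    """Global step for COUNT: a SUM over the local counts, not another count."""
    return sum(local_counts)


# Three partitions of one dataset; the last one happens to be empty.
partitions: List[List[float]] = [[1.0, 2.0, 3.0], [10.0], []]
partials = [local_avg(p) for p in partitions]
average = global_avg([intermediate_avg(partials[:2]), partials[2]])
total_rows = global_count(local_count(p) for p in partitions)
```

The sketch carries (sum, count) pairs between the steps instead of per-partition averages. Averaging the averages of the two non-empty partitions would give (2 + 10) / 2 = 6, while the correct overall average of the four values is 16 / 4 = 4.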
