Hi,

So to check or debug how each of the different descriptors for a function
is used, I need a clustered setup. Am I right? I am not sure how to debug
in a clustered environment. I want to do this to check how the array_avg()
function works and see if I understand it better.

Further, I see that the array_avg function seems to be translated into the
sql-avg function. I am not sure where that happens.

Sorry about all the silly questions.

On 25 July 2017 at 06:54, Preston Carman <[email protected]> wrote:

> When dealing with aggregates and query plans, I find it helpful to
> think about how the aggregate will work in a distributed environment.
> The AsterixDB compiler makes optimizations based on how the data is
> partitioned. If the data is unpartitioned, then a single aggregate
> operator and function can calculate the result. If the data is
> partitioned, then sending all the data to a single node for
> processing is not very efficient. Instead, the aggregate process
> can be split into two steps: AsterixDB optimizes the query by
> running a process on each partition locally and then sending an
> intermediate result to a single node to create the final aggregate
> result.
>
> COUNT
> In the case of count, the local process is COUNT, but the global
> aggregate process is SUM. We do not want to count the local responses,
> but to sum the local count values.
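A minimal sketch of the COUNT case described above, in plain Python rather than AsterixDB code. The partition contents and the function names local_count and global_count are made up for illustration; they are not AsterixDB identifiers.

```python
# Hypothetical partitions of a "users" dataset (illustrative data only).
partitions = [
    ["alice", "bob", "carol"],
    ["dave"],
    ["erin", "frank"],
]


def local_count(rows):
    """Local step: each partition counts only its own rows."""
    return len(rows)


def global_count(local_counts):
    """Global step: SUM the local counts.

    Counting the local results instead would only give the number of
    partitions, not the number of rows.
    """
    return sum(local_counts)


local_counts = [local_count(p) for p in partitions]
total = global_count(local_counts)
```

Here the global step receives one small number per partition instead of every row, which is the point of splitting the aggregate.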
>
> AVG
> In the count case, we use a completely separate aggregate function for
> the global step. Now consider AVG: to compute the average you need to
> know the count and the sum. In this case the local functions find both
> the count and the sum. These values are then passed to a global
> aggregate function, which uses these local results to calculate the
> final average.
>
> Take a look at the query plans for a COUNT and AVG query. The
> optimized query plan will show you the two aggregate operators.
>
> As you look at the code, AVG would probably be more informative about
> the full aggregation workflow.
>
>
> On Mon, Jul 24, 2017 at 8:28 AM, Riyafa Abdul Hameed
> <[email protected]> wrote:
> > On 23 July 2017 at 22:59, Yingyi Bu <[email protected]> wrote:
> >
> >> >> I see AVG, LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG.
> >>
> >> AVG:  that's the local function in the local plan.
> >> LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG:   think about distributed
> >> computation of average.  LOCAL_AVG aggregates the sum/count at the local
> >> data source, INTERMEDIATE_AVG aggregates the sum/count over partially
> >> aggregated sums/counts, and GLOBAL_AVG computes the final average value
> >> from intermediate sums/counts.
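The three steps described above can be sketched in plain Python. This is only an illustration under assumptions, not the AsterixDB implementation: the (sum, count) tuple as the partial state, the function names, and returning None for empty input are all choices made for this sketch.

```python
from typing import Iterable, Optional, Tuple

# Assumed partial state for this sketch: (sum, count).
State = Tuple[float, int]


def local_avg(values: Iterable[float]) -> State:
    """Local step: fold raw values at one data source into (sum, count)."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return (total, count)


def intermediate_avg(states: Iterable[State]) -> State:
    """Intermediate step: merge partial (sum, count) states into one state."""
    total, count = 0.0, 0
    for s, c in states:
        total += s
        count += c
    return (total, count)


def global_avg(states: Iterable[State]) -> Optional[float]:
    """Global step: merge the remaining states and divide sum by count."""
    total, count = intermediate_avg(states)
    return total / count if count else None


# Illustrative data: three partitions of numeric values.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
local_states = [local_avg(p) for p in partitions]
merged = intermediate_avg(local_states[:2])
result = global_avg([merged, local_states[2]])
```

Carrying (sum, count) through every step matters because averaging the per-partition averages would weight a one-row partition the same as a three-row one.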
> >>
> >
> > How do we decide whether we need these descriptors? COUNT seems to have
> > only one descriptor.
> >
> >
> >>
> >> Best,
> >> Yingyi
> >>
> >>
> >> On Sat, Jul 22, 2017 at 9:43 PM, Riyafa Abdul Hameed <
> >> [email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > Thanks for the explanation.
> >> > But there are so many things I still don't understand. One of them is
> >> > that for the avg function itself there are several FunctionIdentifiers.
> >> > What do they all mean?
> >> >
> >> > I see AVG, LOCAL_AVG, INTERMEDIATE_AVG and GLOBAL_AVG.
> >> >
> >> > What do they all mean?
> >> > Please help
> >> >
> >> > On 19 July 2017 at 21:56, Yingyi Bu <[email protected]> wrote:
> >> >
> >> > > Hi Riyafa,
> >> > >
> >> > >    >> ScalarCountAggregateDescriptor
> >> > >   It's used for counting a scalar array that appears inside a tuple.
> >> > >   For example:
> >> > >   SELECT u.id, array_count(u.friends)
> >> > >   FROM users u;
> >> > >
> >> > >    >> SerializableCountAggregateDescriptor
> >> > >    Serialized aggregation descriptor implementations are only used in
> >> > >    hash-based group-by.
> >> > >    For example:
> >> > >    SELECT u.city, count(*)
> >> > >    FROM users u
> >> > >    /*+ hash */
> >> > >    GROUP BY u.city;
> >> > >
> >> > >   If your aggregation function doesn't have a fixed-byte-sized state,
> >> > >   you don't need to worry about that or implement that.
> >> > >
> >> > >    >> CountAggregateDescriptor
> >> > >    This is used in group-by or global aggregate:
> >> > >    For example:
> >> > >    SELECT u.city, count(*)
> >> > >    FROM users u
> >> > >    GROUP BY u.city;
> >> > >
> >> > >    SELECT count(*) FROM users;
> >> > >
> >> > >
> >> > > Best,
> >> > > Yingyi
> >> > >
> >> > >
> >> > > On Wed, Jul 19, 2017 at 7:55 AM, Riyafa Abdul Hameed <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi again,
> >> > > >
> >> > > > Any suggestions on this? Or is there anyone I can reach out to who is
> >> > > > not on this list or not active on it?
> >> > > >
> >> > > > Thank you.
> >> > > >
> >> > > > On 17 July 2017 at 17:18, Riyafa Abdul Hameed <[email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi again,
> >> > > > >
> >> > > > > I think I can understand how to write the descriptors in the packages
> >> > > > > org.apache.asterix.runtime.aggregates.std and
> >> > > > > org.apache.asterix.runtime.aggregates.scalar.
> >> > > > > But I am not sure I understand how to write the descriptor in the
> >> > > > > package org.apache.asterix.runtime.aggregates.serializable.std,
> >> > > > > because it requires setting a state in the init function that doesn't
> >> > > > > seem to follow a pattern found in the other descriptors.
> >> > > > > Also, I don't understand the reasons for implementing each of these
> >> > > > > descriptors for the aggregate functions.
> >> > > > >
> >> > > > > On 17 July 2017 at 16:56, Riyafa Abdul Hameed <[email protected]>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Hi all,
> >> > > > >>
> >> > > > >> I meant any explanation on the implementation of aggregate functions
> >> > > > >> in AsterixDB would be highly appreciated.
> >> > > > >>
> >> > > > >> Thank you.
> >> > > > >> Yours sincerely,
> >> > > > >> Riyafa
> >> > > > >>
> >> > > > >> On 16 July 2017 at 08:01, Riyafa Abdul Hameed <[email protected]>
> >> > > > >> wrote:
> >> > > > >>
> >> > > > >>> Dear all,
> >> > > > >>>
> >> > > > >>> I am trying to create aggregate functions, and I see there is more
> >> > > > >>> than one function descriptor for a single function.
> >> > > > >>> For example, the function array_count(collection) has the following
> >> > > > >>> descriptors:
> >> > > > >>>
> >> > > > >>>
> >> > > > >>>    - ScalarCountAggregateDescriptor
> >> > > > >>>    - SerializableCountAggregateDescriptor
> >> > > > >>>    - CountAggregateDescriptor
> >> > > > >>>
> >> > > > >>> I am not sure I understand the difference between these. Can you
> >> > > > >>> please provide an example or point me to a documentation entry to
> >> > > > >>> learn how to properly implement aggregate functions?
> >> > > > >>>
> >> > > > >>> The function I am trying to implement is ST_Extent.
> >> > > > >>> <https://postgis.net/docs/manual-1.4/ST_Extent.html>
> >> > > > >>>
> >> > > > >>> Thank you.
> >> > > > >>>
> >> > > > >>> Yours sincerely,
> >> > > > >>>
> >> > > > >>> Riyafa
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > > >>
> >> > > > >> --
> >> > > > >> Riyafa Abdul Hameed
> >> > > > >> Undergraduate, University of Moratuwa
> >> > > > >>
> >> > > > >> Email: [email protected]
> >> > > > >> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
> >> > > > >> <http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
> >> > > > >> <http://twitter.com/Riyafa1>
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>



-- 
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
