Re: Schema Aggregate Function

Calvin Dani Wed, 06 Nov 2024 16:06:51 -0800

Hi,

The APE has been updated with those changes!


Regards
Calvin Dani

On Fri, Nov 1, 2024 at 10:36 AM Mike Carey <[email protected]> wrote:

> Excellent!  +1
>
> On Fri, Nov 1, 2024 at 9:35 AM Calvin Dani <[email protected]>
> wrote:
>
> > Hi,
> >
> > Thank you for the feedback. As per the last meeting, here are the changes
> > that have been incorporated into this APE.
> > They are as follows:
> > 1. Names of the schema inference functions
> > 2. Schema inference functionality
> >
> > The summary of the changes is as follows:
> >
> >    1. query_schema (Aggregate function that takes all records of the
> >    subquery and generates a JSON Schema),
> >    2. collection_schema (JSON Schema translation of the defined datatypes
> >    in the metadata node)
> >    3. current_schema (for columnar stores and converting the inferred
> >    schema for storage compaction to JSON Schema)
> >
> >
> > Regards
> > Calvin Dani
> >
> >
> > On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <[email protected]> wrote:
> >
> > > Great feature!  I wasn't able to understand the query example(s),
> > > though...  Could those be cleaned up a little and clarified?
> > >
> > > Also, I think we might want two functions at the user level - one that
> > > takes an expression as input and reports its schema, and another that
> > > takes a dataset/collection name as input and reports its schema.  The
> > > first one would scan the results and say what the schema is; the other
> > > would use a more efficient approach (accessing and combining the
> > > metadata from the collection's most recent LSM components in each of its
> > > partitions).
> > >
> > > Cheers,
> > >
> > > Mike
> > >
> > > On 10/4/24 10:13 AM, Calvin Dani wrote:
> > > > Initiating the discussion thread proposing a new aggregate function in
> > > > AsterixDB.
> > > > *Feature:* aggregate function to infer schema
> > > > *Details:* This feature introduces schema inference as an SQL++ function
> > > > directly integrated into AsterixDB. It is the first approach to offer
> > > > schema inference as a native SQL++ function, allowing users to infer
> > > > schemas not only for any dataset but also for queries and subqueries.
> > > > Its output is in JSON Schema, the industry standard, which is both
> > > > human- and machine-readable and is suitable for user interpretation or
> > > > integration into other queries or programs.
> > > >
> > > > array_schema() was implemented using array_avg() as a template in the
> > > > Built-in Function and Function collection file. During self-review, I
> > > > noticed that many of the defined aggregate functions, for example
> > > > SerializableAvgAggregateFunction and IntermediateAvgAggregateFunction,
> > > > are not utilised during an array_schema() query. Is this due to
> > > > different use cases, or am I utilising it incorrectly?
> > > >
> > > > Are there any resources to understand the functionality of aggregate
> > > > functions in the implementation?
> > > >
> > > > *APE*
> > > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
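
For readers of the thread: the query_schema idea summarized above (an aggregate that takes all records of a subquery and generates a JSON Schema) can be illustrated outside AsterixDB. What follows is a minimal Python sketch, not the APE's implementation. The helper names infer_schema and merge_schemas, the use of "anyOf" when records disagree on a type, and the rule that a field is "required" only when every record carries it are all assumptions made for illustration.

```python
from typing import Any, Iterable, Optional


def infer_schema(value: Any) -> dict:
    """Return a JSON Schema fragment describing one value (hypothetical helper)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        schema: dict = {"type": "array"}
        if value:
            items = infer_schema(value[0])
            for element in value[1:]:
                items = merge_schemas(items, infer_schema(element))
            schema["items"] = items
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported value: {value!r}")


def _alternatives(schema: dict) -> list:
    """Flatten a schema into its list of typed alternatives."""
    return schema["anyOf"] if "anyOf" in schema else [schema]


def _merge_same_type(a: dict, b: dict) -> dict:
    """Merge two schemas that share the same "type"."""
    if a["type"] == "object":
        props = dict(a["properties"])
        for key, sub in b["properties"].items():
            props[key] = merge_schemas(props[key], sub) if key in props else sub
        # Assumption: a field is required only if every record has it.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": props, "required": required}
    if a["type"] == "array":
        merged: dict = {"type": "array"}
        if "items" in a and "items" in b:
            merged["items"] = merge_schemas(a["items"], b["items"])
        elif "items" in a or "items" in b:
            merged["items"] = a.get("items", b.get("items"))
        return merged
    return a  # identical primitive types


def merge_schemas(a: dict, b: dict) -> dict:
    """Combine two schemas, keeping one alternative per distinct type."""
    merged: list = []
    for alt in _alternatives(a) + _alternatives(b):
        for i, existing in enumerate(merged):
            if existing["type"] == alt["type"]:
                merged[i] = _merge_same_type(existing, alt)
                break
        else:
            merged.append(alt)
    return merged[0] if len(merged) == 1 else {"anyOf": merged}


def query_schema(records: Iterable[Any]) -> Optional[dict]:
    """Fold record schemas into one, like an aggregate; None for empty input."""
    schema = None
    for record in records:
        current = infer_schema(record)
        schema = current if schema is None else merge_schemas(schema, current)
    return schema


records = [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x"]}, {"id": "3"}]
schema = query_schema(records)
```

Because merge_schemas does not depend on the order or grouping of its inputs in this sketch, per-partition partial schemas could in principle be merged afterwards, which is the usual shape of a distributed aggregate; whether AsterixDB's aggregate framework is used that way for the APE is not stated in the thread.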
