Hi,

Wanted to share an update regarding the features in the APE. The two
queries:

1. query_schema()

2. collection_schema()

are now functional. The query_schema() implementation has been submitted
for review. Once that is approved, I will proceed to submit the
collection_schema() query, as it depends on the first query's code.
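
For anyone following along, query_schema() is described further down this
thread as an aggregate function that takes all records of the subquery and
generates a JSON Schema. The idea can be sketched in Python as below. This is
only an illustration and not the code under review: the names infer_schema,
infer_value and merge are hypothetical, and the merging rules (a field is
"required" only if every record has it, conflicting types become "anyOf") are
assumptions made for the sketch.

```python
def infer_value(value):
    """Map one Python value to a JSON Schema fragment (hypothetical helper)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # bool must be tested before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        items = None
        for element in value:
            items = merge(items, infer_value(element))
        return {"type": "array"} if items is None else {"type": "array", "items": items}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_value(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported value: {value!r}")


def _alternatives(schema):
    """Return the list of typed alternatives a schema stands for."""
    return schema["anyOf"] if "anyOf" in schema else [schema]


def _merge_same_type(a, b):
    """Merge two fragments that share the same "type"."""
    if a["type"] == "object":
        keys = list(a["properties"]) + [k for k in b["properties"] if k not in a["properties"]]
        return {
            "type": "object",
            "properties": {k: merge(a["properties"].get(k), b["properties"].get(k)) for k in keys},
            # Assumption: a field is required only if every record had it.
            "required": sorted(set(a["required"]) & set(b["required"])),
        }
    if a["type"] == "array":
        items = merge(a.get("items"), b.get("items"))
        return {"type": "array"} if items is None else {"type": "array", "items": items}
    return a


def merge(a, b):
    """Combine two schema fragments; None means "nothing seen yet"."""
    if a is None:
        return b
    if b is None:
        return a
    merged = []
    for candidate in _alternatives(a) + _alternatives(b):
        for i, existing in enumerate(merged):
            if existing["type"] == candidate["type"]:
                merged[i] = _merge_same_type(existing, candidate)
                break
        else:
            merged.append(candidate)
    return merged[0] if len(merged) == 1 else {"anyOf": merged}


def infer_schema(records):
    """Aggregate step: fold every record's schema into one running schema."""
    schema = None
    for record in records:
        schema = merge(schema, infer_value(record))
    return schema


records = [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x"]}, {"id": "3"}]
schema = infer_schema(records)
```

With these three sample records, "id" is the only required field, and its
schema lists both integer and string because the records disagree on its type.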

I would greatly appreciate your feedback, additional test cases, and any
thoughts you have on this APE. I’m eager to refine it further or, if it
seems like a solid starting point, to receive approval for this APE.

Thank you for your time and input!

Regards

Calvin Dani

On Wed, Nov 6, 2024 at 4:06 PM Calvin Dani <calvinthomas.d...@gmail.com>
wrote:

> Hi,
>
> The APE has been updated with those changes!
>
> Regards
> Calvin Dani
>
> On Fri, Nov 1, 2024 at 10:36 AM Mike Carey <dtab...@gmail.com> wrote:
>
>> Excellent!  +1
>>
>> On Fri, Nov 1, 2024 at 9:35 AM Calvin Dani <calvinthomas.d...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Thank you for the feedback. As per the last meeting, here are the changes
>> > incorporated into this APE.
>> > They are as follows:
>> > 1. Names of the schema inference functions
>> > 2. Schema inference functionality
>> >
>> > The summary of changes is as follows:
>> >
>> >    1. query_schema (aggregate function that takes all records of the
>> >    subquery and generates a JSON Schema)
>> >    2. collection_schema (JSON Schema translation of the datatypes
>> >    defined in the metadata node)
>> >    3. current_schema (for columnar stores; converts the schema inferred
>> >    for storage compaction to JSON Schema)
>> >
>> >
>> > Regards
>> > Calvin Dani
>> >
>> >
>> > On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <dtab...@gmail.com> wrote:
>> >
>> > > Great feature!  I wasn't able to understand the query example(s),
>> > > though...  Could those be cleaned up a little and clarified?
>> > >
>> > > Also, I think we might want two functions at the user level - one that
>> > > takes an expression as input and reports its schema, and another that
>> > > takes a dataset/collection name as input and reports its schema.  The
>> > > first one would scan the results and say what the schema is; the other
>> > > would use a more efficient approach (accessing and combining the
>> > > metadata from the collection's most recent LSM components in each of
>> > > its partitions).
>> > >
>> > > Cheers,
>> > >
>> > > Mike
>> > >
>> > > On 10/4/24 10:13 AM, Calvin Dani wrote:
>> > > > Initiating the discussion thread proposing a new aggregate function in
>> > > > AsterixDB.
>> > > > *Feature:* aggregate function to infer schema
>> > > > *Details:* This feature introduces schema inference as an SQL++ function
>> > > > directly integrated into AsterixDB. It is the first approach to offer
>> > > > schema inference as a native SQL++ function, allowing users to infer
>> > > > schemas for not only any dataset but also for queries and subqueries. Its
>> > > > output in JSON Schema, the industry standard, produces both human and
>> > > > machine-readable results, suitable for user interpretation or integration
>> > > > into other queries or programs.
>> > > >
>> > > > Using the template of array_avg() in the Built-in Function and Function
>> > > > collection file, I implemented array_schema(). During self-review, I
>> > > > noticed that many of the defined aggregate functions, for example
>> > > > SerializableAvgAggregateFunction and IntermediateAvgAggregateFunction,
>> > > > are not being used during an array_schema() query. Is this due to
>> > > > different use cases, or am I using it incorrectly?
>> > > >
>> > > > Are there any resources to understand the functionality of aggregate
>> > > > functions in the implementation?
>> > > >
>> > > > *APE*
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
>> > > >
>>
>
