Hi,

The APE has been updated with those changes!

Regards
Calvin Dani

On Fri, Nov 1, 2024 at 10:36 AM Mike Carey <dtab...@gmail.com> wrote:

> Excellent! +1
>
> On Fri, Nov 1, 2024 at 9:35 AM Calvin Dani <calvinthomas.d...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Thank you for the feedback. As per the last meeting, here are the
> > changes that have been incorporated into this APE:
> >
> > 1. Names of the schema inference functions
> > 2. Schema inference functionality
> >
> > The changes are summarized as follows:
> >
> > 1. query_schema (aggregate function that takes all records of the
> >    subquery and generates a JSON Schema)
> > 2. collection_schema (JSON Schema translation of the defined datatypes
> >    in the metadata node)
> > 3. current_schema (for columnar stores; converts the schema inferred
> >    for storage compaction to JSON Schema)
> >
> > Regards
> > Calvin Dani
> >
> > On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <dtab...@gmail.com> wrote:
> >
> > > Great feature! I wasn't able to understand the query example(s),
> > > though... Could those be cleaned up a little and clarified?
> > >
> > > Also, I think we might want two functions at the user level - one
> > > that takes an expression as input and reports its schema, and
> > > another that takes a dataset/collection name as input and reports
> > > its schema. The first one would scan the results and say what the
> > > schema is; the other would use a more efficient approach (accessing
> > > and combining the metadata from the collection's most recent LSM
> > > components in each of its partitions).
> > >
> > > Cheers,
> > >
> > > Mike
> > >
> > > On 10/4/24 10:13 AM, Calvin Dani wrote:
> > > > Initiating the discussion thread proposing a new aggregate
> > > > function in AsterixDB.
> > > >
> > > > *Feature:* aggregate function to infer schema
> > > >
> > > > *Details:* This feature introduces schema inference as an SQL++
> > > > function directly integrated into AsterixDB. It is the first
> > > > approach to offer schema inference as a native SQL++ function,
> > > > allowing users to infer schemas not only for any dataset but also
> > > > for queries and subqueries. Its output is in JSON Schema, the
> > > > industry standard, so the results are both human- and
> > > > machine-readable, suitable for user interpretation or integration
> > > > into other queries or programs.
> > > >
> > > > array_schema() was implemented using array_avg() as a template in
> > > > the Built-in Function and Function collection file. During
> > > > self-review, I noticed that many of the defined aggregate
> > > > functions, for example SerializableAvgAggregateFunction and
> > > > IntermediateAvgAggregateFunction, are not used during an
> > > > array_schema() query. Is this due to different use cases, or am I
> > > > using it incorrectly?
> > > >
> > > > Are there any resources to understand the functionality of
> > > > aggregate functions in the implementation?
> > > >
> > > > *APE*
> > > >
> > > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
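
For readers following the thread, the aggregate behaviour described for query_schema (fold every record of a result into one JSON Schema) can be sketched roughly as below. This is a standalone Python illustration, not AsterixDB code and not the APE's implementation. The names infer, merge and infer_query_schema are hypothetical, and the merge rule (union of observed types, union of observed properties) is an assumption about what such an aggregate might do.

```python
from functools import reduce


def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def _types(schema):
    """Return the schema's 'type' entry as a list."""
    t = schema["type"]
    return [t] if isinstance(t, str) else list(t)


def merge(a, b):
    """Combine two partial schemas (hypothetical rule: union everything)."""
    if a is None:
        return b
    if b is None:
        return a
    types = sorted(set(_types(a)) | set(_types(b)))
    merged = {"type": types[0] if len(types) == 1 else types}
    props_a, props_b = a.get("properties"), b.get("properties")
    if props_a is not None or props_b is not None:
        props_a, props_b = props_a or {}, props_b or {}
        merged["properties"] = {
            key: merge(props_a.get(key), props_b.get(key))
            for key in sorted(props_a.keys() | props_b.keys())
        }
    items = merge(a.get("items"), b.get("items"))
    if items is not None:
        merged["items"] = items
    return merged


def infer(value):
    """Infer a schema for a single value, recursing into objects and arrays."""
    t = json_type(value)
    schema = {"type": t}
    if t == "object":
        schema["properties"] = {k: infer(v) for k, v in value.items()}
    elif t == "array":
        items = reduce(merge, (infer(v) for v in value), None)
        if items is not None:
            schema["items"] = items
    return schema


def infer_query_schema(records):
    """Aggregate step: fold the per-record schemas into one schema."""
    return reduce(merge, (infer(r) for r in records), None)


# Usage: three records with differing fields and a type conflict on "id".
records = [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x"]}, {"id": "3"}]
schema = infer_query_schema(records)
```

Because merge is associative and accepts None as an empty partial result, the same fold could in principle be applied per partition and then combined, which is the usual shape of a distributed aggregate. Whether the APE's functions work this way is not stated in the thread.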