Excellent! +1

On Fri, Nov 1, 2024 at 9:35 AM Calvin Dani <calvinthomas.d...@gmail.com> wrote:
> Hi,
>
> Thank you for the feedback. As discussed in the last meeting, here are the
> changes that have been incorporated into this APE.
> They are as follows:
> 1. Name of the schema inference functions
> 2. Schema inference functionality
>
> The summary of changes is as follows:
>
> 1. query_schema (an aggregate function that takes all records of the
> subquery and generates a JSON Schema)
> 2. collection_schema (a JSON Schema translation of the datatypes defined
> in the metadata node)
> 3. current_schema (for columnar stores; converts the schema inferred for
> storage compaction into JSON Schema)
>
>
> Regards
> Calvin Dani
>
>
> On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <dtab...@gmail.com> wrote:
>
> > Great feature! I wasn't able to understand the query example(s),
> > though... Could those be cleaned up a little and clarified?
> >
> > Also, I think we might want two functions at the user level: one that
> > takes an expression as input and reports its schema, and another that
> > takes a dataset/collection name as input and reports its schema. The
> > first one would scan the results and say what the schema is; the other
> > would use a more efficient approach (accessing and combining the
> > metadata from the collection's most recent LSM components in each of its
> > partitions).
> >
> > Cheers,
> >
> > Mike
> >
> > On 10/4/24 10:13 AM, Calvin Dani wrote:
> > > Initiating the discussion thread proposing a new aggregate function in
> > > AsterixDB.
> > > *Feature:* aggregate function to infer schema
> > > *Details:* This feature introduces schema inference as an SQL++ function
> > > directly integrated into AsterixDB. It is the first approach to offer
> > > schema inference as a native SQL++ function, allowing users to infer
> > > schemas not only for any dataset but also for queries and subqueries.
> > > Its output is in JSON Schema, the industry standard, producing both
> > > human- and machine-readable results suitable for user interpretation or
> > > integration into other queries or programs.
> > >
> > > array_schema() was implemented using array_avg() as a template in the
> > > Built-in Function and Function Collection files. During self-review, I
> > > noticed that many of the defined aggregate functions, for example
> > > SerializableAvgAggregateFunction and IntermediateAvgAggregateFunction,
> > > are not being utilised during an array_schema() query. Is this due to
> > > different use cases, or am I utilising them incorrectly?
> > >
> > > Are there any resources for understanding how aggregate functions work
> > > in the implementation?
> > >
> > > *APE*
> > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
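To make the query_schema idea concrete, here is a minimal sketch in Python of schema inference written as a mergeable aggregate: `step` folds one record into a partial state, `merge` combines partial states (loosely in the spirit of Mike's suggestion to combine per-partition information, and of the local/intermediate split that class names like IntermediateAvgAggregateFunction hint at), and `finish` emits a JSON Schema document. Everything here is an assumption for illustration: `PartialSchema` and `infer_schema` are hypothetical names, type inference is a simple union of observed JSON types, and a field is marked `required` only if every object contained it. This is not AsterixDB's implementation and says nothing about how its aggregate classes are actually wired.

```python
import json


def _type_name(value):
    """Map a JSON-like Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


class PartialSchema:
    """Hypothetical mergeable aggregate state summarizing every value seen."""

    def __init__(self):
        self.types = set()       # JSON Schema type names observed
        self.objects = 0         # number of object values observed
        self.properties = {}     # field name -> PartialSchema of its values
        self.field_counts = {}   # field name -> number of objects containing it
        self.items = None        # PartialSchema of array elements, if any

    def step(self, value):
        """Fold one value (e.g. one record) into this state."""
        t = _type_name(value)
        self.types.add(t)
        if t == "object":
            self.objects += 1
            for name, child in value.items():
                self.properties.setdefault(name, PartialSchema()).step(child)
                self.field_counts[name] = self.field_counts.get(name, 0) + 1
        elif t == "array":
            if self.items is None:
                self.items = PartialSchema()
            for element in value:
                self.items.step(element)
        return self

    def merge(self, other):
        """Combine another partial state (e.g. from another partition) into this one."""
        self.types |= other.types
        self.objects += other.objects
        for name, child in other.properties.items():
            self.properties.setdefault(name, PartialSchema()).merge(child)
            self.field_counts[name] = self.field_counts.get(name, 0) + other.field_counts[name]
        if other.items is not None:
            if self.items is None:
                self.items = PartialSchema()
            self.items.merge(other.items)
        return self

    def finish(self):
        """Emit the accumulated state as a JSON Schema dictionary."""
        schema = {}
        types = sorted(self.types)
        if types:  # elements of only-empty arrays have no observed type
            schema["type"] = types[0] if len(types) == 1 else types
        if "object" in self.types:
            schema["properties"] = {n: c.finish() for n, c in sorted(self.properties.items())}
            required = sorted(n for n, c in self.field_counts.items() if c == self.objects)
            if required:
                schema["required"] = required
        if self.items is not None:
            schema["items"] = self.items.finish()
        return schema


def infer_schema(partitions):
    """Two-phase aggregation: one partial state per partition, then a global merge."""
    partials = [PartialSchema() for _ in partitions]
    for local, records in zip(partials, partitions):
        for record in records:
            local.step(record)
    final = PartialSchema()
    for partial in partials:
        final.merge(partial)
    return final.finish()


if __name__ == "__main__":
    partitions = [
        [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x"]}],
        [{"id": 3, "name": None}],
    ]
    print(json.dumps(infer_schema(partitions), indent=2))
```

Because `merge` is order-independent here, the same schema comes out whether the records are processed in one partition or split across several, which is the property a per-partition approach like the one Mike describes for collection_schema would rely on.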