Hi,

I wanted to share an update on the features in the APE. The two queries,
1. query_schema()
2. collection_schema()
are now functional.

The query_schema() implementation has been submitted for review. Once
that is approved, I will submit the collection_schema() query, as it
depends on the first query's code.

I would greatly appreciate your feedback, additional test cases, and any
thoughts you have on this APE. I’m eager to refine it further or, if it
seems like a solid starting point, to receive approval for this APE.

Thank you for your time and input!

Regards
Calvin Dani

On Wed, Nov 6, 2024 at 4:06 PM Calvin Dani <calvinthomas.d...@gmail.com> wrote:

> Hi,
>
> The APE has been updated with those changes!
>
> Regards
> Calvin Dani
>
> On Fri, Nov 1, 2024 at 10:36 AM Mike Carey <dtab...@gmail.com> wrote:
>
>> Excellent! +1
>>
>> On Fri, Nov 1, 2024 at 9:35 AM Calvin Dani <calvinthomas.d...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Thank you for the feedback. As per the last meeting, here are the
>> > changes that are incorporated into this APE.
>> > They are as follows:
>> > 1. Name of the schema inference functions
>> > 2. Schema inference functionality
>> >
>> > The summary of changes is as follows:
>> >
>> > 1. query_schema (Aggregate function that takes all records of the
>> > subquery and generates a JSON Schema),
>> > 2. collection_schema (JSON Schema translation of the defined datatypes
>> > in the metadata node)
>> > 3. current_schema (for columnar stores and converting the inferred
>> > schema for storage compaction to JSON Schema)
>> >
>> >
>> > Regards
>> > Calvin Dani
>> >
>> >
>> > On Fri, Oct 4, 2024 at 10:28 AM Mike Carey <dtab...@gmail.com> wrote:
>> >
>> > > Great feature! I wasn't able to understand the query example(s),
>> > > though... Could those be cleaned up a little and clarified?
>> > >
>> > > Also, I think we might want two functions at the user level - one that
>> > > takes an expression as input and reports its schema, and another that
>> > > takes a dataset/collection name as input and reports its schema. The
>> > > first one would scan the results and say what the schema is; the other
>> > > would use a more efficient approach (accessing and combining the
>> > > metadata from the collection's most recent LSM components in each of its
>> > > partitions).
>> > >
>> > > Cheers,
>> > >
>> > > Mike
>> > >
>> > > On 10/4/24 10:13 AM, Calvin Dani wrote:
>> > > > Initiating the discussion thread proposing a new aggregate function in
>> > > > AsterixDB.
>> > > > *Feature:* aggregate function to infer schema
>> > > > *Details:* This feature introduces schema inference as an SQL++ function
>> > > > directly integrated into AsterixDB. It is the first approach to offer
>> > > > schema inference as a native SQL++ function, allowing users to infer
>> > > > schemas for not only any dataset but also for queries and subqueries. Its
>> > > > output in JSON Schema, the industry standard, produces both human and
>> > > > machine-readable results, suitable for user interpretation or integration
>> > > > into other queries or programs.
>> > > >
>> > > > Utilizing the template of array_avg() in the Built-in Function and Function
>> > > > collection file the array_schema() was implemented. During self review, a
>> > > > lot of defined aggregate functions for
>> > > > example SerializableAvgAggregateFunction
>> > > > and IntermediateAvgAggregateFunction are not being utilised during
>> > > > array_schema() query. Is it due to different use cases or am I utilising it
>> > > > incorrectly?
>> > > >
>> > > > Are there any resources to understand the functionality of aggregate
>> > > > functions in the implementation?
>> > > >
>> > > > *APE*
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+8%3A+Schema+Inference+Aggregate+Functions
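
For readers following the thread: query_schema is described above as an aggregate that takes all records of a subquery and generates a JSON Schema. The Python below is only a rough sketch of that idea outside AsterixDB, not the APE's implementation. The names infer_schema, merge_schemas and infer_query_schema are hypothetical, and the merge rules (a key is "required" only if every record has it, conflicting types become "anyOf") are assumptions made for illustration.

```python
from functools import reduce


def infer_schema(value):
    """Return a JSON Schema fragment describing one JSON value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:  # describe the items only when there are some to look at
            schema["items"] = reduce(merge_schemas, map(infer_schema, value))
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


def merge_schemas(a, b):
    """Combine two fragments into one that accepts whatever either accepts."""
    if a == b:
        return a
    if a.get("type") == "object" and b.get("type") == "object":
        props = dict(a["properties"])
        for key, sub in b["properties"].items():
            props[key] = merge_schemas(props[key], sub) if key in props else sub
        # Assumption: a key stays required only if both sides require it.
        required = sorted(set(a["required"]) & set(b["required"]))
        return {"type": "object", "properties": props, "required": required}
    # Assumption: any other mismatch is reported as a flat union.
    alternatives = []
    for schema in (a, b):
        for alt in schema.get("anyOf", [schema]):
            if alt not in alternatives:
                alternatives.append(alt)
    return {"anyOf": alternatives}


def infer_query_schema(records):
    """Fold per-record schemas into one, the way an aggregate would."""
    schemas = [infer_schema(r) for r in records]
    return reduce(merge_schemas, schemas) if schemas else {}


if __name__ == "__main__":
    import json

    rows = [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x"]}]
    print(json.dumps(infer_query_schema(rows), indent=2))
```

Because merge_schemas folds two partial results into one, the same function could in principle combine per-partition results, which is the shape a local/intermediate/global aggregate needs. Whether the APE's functions are structured that way is not stated in this thread.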