Re: Review Request : UPDATE statement support in SQL++ for AsterixDB

Abhishek Jindal Fri, 25 Oct 2024 09:56:08 -0700

I'm dumping my thoughts here; trying to use process of elimination to reach a 
consensus on the final grammar:

* Initially thought about using semi-colon (;) for separating change 
productions.
However, semi-colon should only be used to indicate end-of-statement. This 
leaves us with
Using either comma (',') or closed parenthesis ')' to mark end of change 
production.

* We cannot enforce a rule stating - All change productions need to be enclosed 
within open-close
   parenthesis pair. This will make the UPDATE statement incompatible with 
traditional SQL.
  In traditional SQL a SetClause is not enclosed within parenthesis and we want 
our query parser to accept
  such a statement.

* However, assuming no parenthesis in any change production, the presence of 
comma (,) can lead to ambiguous nature.
Example:

UPDATE sampleAnalytics.Commerce.customers as c
SET c.name = "Abhishek", c.rating = c.rating + 500,
SET c.is_blacklisted = (c.rating < 200)

This statement is very confusing to read - The purpose of 2nd comma in the 
statement is unclear to the user.

* To eliminate this ambiguous nature of a comma, I believe this character 
should be left out for its sole use of separating SetElements
 in a SetClause.

* We also cannot simply start with a new change production without a delimiter. 
This will cause issues with the parser misidentifying
  recursive change productions as a new change. For example:

UPDATE sampleAnalytics.Commerce.orders as o
 UPDATE o.items as item
 SET item.total = item.qty * item.price
WHERE o.orderno = 1006;

If we are not using any delimiter to mark end-of-change, the parser will 
wrongly identify 2 change productions here:
    1. UPDATE o.items as item
    2.  SET item.total = item.qty * item.price

This obviously is incorrect.

* I believe this leaves us with only 1 solution (comments are welcome __ ):
   Always enclose a change-production within parenthesis '(' and ')'.
  This obviously is not valid if the first change in the production list is a 
SET clause - This is for backward compatibility with SQL.

----

+1 to the idea of 'AT INDEX' in INSERT INTO and DELETE FROM nested clauses.

Regards,

Abhishek

On 10/22/24, 3:40 PM, "Mike Carey" <dtab...@gmail.com 
<mailto:dtab...@gmail.com>> wrote:

Thanks! Agreed about the AT issue - we should add an optional INDEX 
qualifier there - probably even in the FROM clause's AT clause option 
(which already exists minus the keywork option).

We should look into adding commas - that seems like a good change if the 
grammar is amenable to that (which I think it would be). Making the 
parentheses optional might not work - not sure - we can explore that - 
I'm not 100% sure we could do that w/o introducing ambiguity about where 
things start and stop. (But if we can I'd love to ditch the parentheses 
- that was a conservative approach to the potential problem that will 
surely work.)

Cheers,

Mike

On 10/20/24 12:19 PM, Glenn Galvizo wrote:
> +1 Very much a needed feature!
>
> - For inserting / modifying items at arrays, i think it might help to have 
> another token after the ‘AT’ to denote that this is a position (it might just 
> be me, but ‘AT 1’ seems a little too vague). Maybe ‘AT INDEX 1’? (given that 
> INDEX is already a reserved word?)
> - The Change production seems like it should be separated with a comma (to 
> really hammer in the point that this is a sequence) or even a semicolon if we 
> want to make this more PL/SQL-like. It could also an opportunity to make the 
> parenthesis optional, if you want to go down that route.
>
> Other than those two minor things, I like it!
>
> Best,
> Glenn
>
>> On Oct 20, 2024, at 10:33, Mike Carey<dtab...@gmail.com 
>> <mailto:dtab...@gmail.com>> wrote:
>>
>> +1 for this (obviously, since I am on it). FYI, we have also run our UPDATE 
>> user model and syntax by Yannis P (father of SQL++) and Don C (father of 
>> SQL) for their input prior to posting this APE. :-) We've needed this 
>> feature for quite some time in order to conveniently express small(-ish) 
>> changes to arbitrary (possibly large) schema-less documents.
>>
>> Discussion welcome!
>>
>> Cheers,
>>
>> Mike
>>
>>> On 10/18/24 3:18 PM, Abhishek Jindal wrote:
>>> Hi All,
>>>
>>> I'm initiating a discussion thread proposing the SQL++ UPDATE statement in 
>>> AsterixDB.
>>> *Feature:* Adding support for SQL++ UPDATE statement.
>>> *Details:* AsterixDB currently does not support UPDATE operations without 
>>> having
>>> to pass an entire new object to replace an existing record in a collection.
>>> The following proposal discusses syntax and semantics of the UPDATE 
>>> statement as part of
>>> SQL++ for AsterixDB.
>>>
>>> We plan to implement this feature by rewriting the UPDATE statement into 
>>> its equivalent
>>> UPSERT form, allowing us to reuse the existing LSM-tree UPSERT machinery to 
>>> handle the transformed incoming record.
>>>
>>> To apply transformations to an incoming record, we employ the following 
>>> approach:
>>>
>>> 1. We recursively traverse the hierarchy of transformations as specified by 
>>> the user in the query.
>>> 2. At each hierarchical level, we rewrite the transformation to the 
>>> equivalent record-merge() built-in function.
>>> 3. These rewritten record-merge() transformations are then combined in a 
>>> bottom-up manner, finally producing the final transformation function.
>>>
>>> APE 
>>> :https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+9%3A+UPDATE+Statement
>>>  
>>> <https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+9%3A+UPDATE+Statement>

Re: Review Request : UPDATE statement support in SQL++ for AsterixDB

Reply via email to