[GitHub] [hudi] pengzhiwei2018 edited a comment on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

GitBox Thu, 22 Apr 2021 07:51:39 -0700


pengzhiwei2018 edited a comment on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-824759341



   > @pengzhiwei2018 can we file follow-ups from this review as subtasks under the same umbrella JIRA?
   > 
   > I spent some time looking at Snowflake and BigQuery, and at what kind of experience users have there when writing data out.
   > Here are my recommendations (mostly borrowing from ANSI SQL):
   > 
   > * [x]  We can support a `PRIMARY KEY(col1, col2, ...)` definition; if no PK is specified, we will generate a synthetic key or leave it null.
   > * [ ]  Multi-table inserts: `INSERT ALL WHEN condition1 INTO t1 WHEN condition2 INTO t2`
   > * [x]  `UPDATE` statement: `UPDATE t1 SET t1.a = t2.b + 1 FROM t2 WHERE condition`
   > * [x]  `MERGE INTO` statement with `WHEN MATCHED` and `WHEN NOT MATCHED` clauses.
   > * [x]  `DELETE FROM` statement.
   > * [ ]  `COPY INTO` statement that integrates with Hudi bootstrap functionality.
   > * [ ]  `CREATE TABLE` with support for unique-constraint checks.
   > * [ ]  `ALTER TABLE` statement to alter schema constraints.
   > * [ ]  `CREATE TABLE` with `CLUSTER BY(col1, col2)`.
   > * [ ]  `CREATE INDEX` for adding indexes (future, as we complete RFC-08 and RFC-27).
   > * [ ]  `CREATE TABLE` with `FOREIGN KEY`, plus `DATABASE` and `SCHEMA` support (future plans; needs multi-table transactions and our metaserver).
   > * [ ]  Expose all Hudi table services (cleaning, compaction, clustering, ...) using a `CALL cleaner <arg1, arg2, ...>` kind of syntax. Over time we can expose more standard functions there; for example, more advanced compaction and clustering strategies can be specified there. We may need a `SHOW services t1` command to show information about these scheduled calls.
   > 
   > I think the checked-off items are already covered in this PR. If not, please raise JIRA subtasks for these as well.
   
   That is great! I will file a JIRA for each of the items not covered in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
