zratkai commented on code in PR #5541:
URL: https://github.com/apache/hive/pull/5541#discussion_r1875869433


##########
parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:
##########
@@ -1840,6 +1841,14 @@ tableImplBuckets
     -> ^(TOK_ALTERTABLE_BUCKETS $num)
     ;
 
+tableWriteOrdered
+@init { pushMsg("table sorted specification", state); }
+@after { popMsg(state); }
+    :
+      KW_WRITE KW_ORDERED KW_BY sortCols=columnNameOrderList

Review Comment:
   @zhangbutao , @okumin , @deniskuzZ 
   First of all, thanks for your comments! This syntax comes from the official 
Iceberg recommendation, which is for alter table, and to be consistend I 
modified the ALTER to CREATE in the syntax, so at CREATE table it is the same 
as ALTER TABLE:
   
[https://iceberg.apache.org/docs/1.7.0/spark-ddl/#alter-table-write-ordered-by]
   
   My plan is to work as a next step on ALTER TABLE after this PR is done. I 
already created the ticket for that at the time this ticket was born: 
   [https://issues.apache.org/jira/browse/HIVE-28587]
   
   About table property idea and comparisms: 
   The iceberg syntax defines multiple keywords in the above page, not just the 
colum names:
   
   "ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id 
DESC NULLS FIRST"
   
   The mentioned examples do not cover all of that. E.g. I could not find in 
the linked Trino documentation how the ordering is defined (ASC/DESC) or the 
null ordering (NULLS FIRST/LAST). With that the table property solution would 
look like ugly and hard to remember and write correctly for the user. 
   
   Since it is passedas a table property deep in the code here, it works with 
table property as well with this commit :
   e.g.:
   create table ice_orc_sorted (id int, text string) stored by iceberg stored 
as orc 
TBLPROPERTIES('iceberg.write-order'='{"order-id":1,"fields":[{"transform":"identity","source-id":1,"direction":"asc","null-order":"nulls-last"}]}');
   
   The syntax comes from Iceberg built in format for sorted order, but I do not 
find it user friendly and I do not recommend to go that way.
   
   
   About SORTED BY/ SORT BY/LOCALSORT BY: as you already mentioned it would be 
confusing, and the "ALTER/CREATE TABLE prod.db.sample WRITE ORDERED BY category 
ASC NULLS LAST, id DESC NULLS FIRST" is recommended by Iceberg, and easy to 
understand and remember.
   
   So every DB engine has it's own syntax for this purpose and **neither is 
following Iceberg recommendation except this one.**
   
   As far as I found Spark does not even support yet sort order at alter table:
   
https://spark.apache.org/docs/3.5.3/sql-ref-syntax-ddl-alter-table.html#content
   https://spark.apache.org/docs/3.5.3/sql-ref-syntax-ddl-create-table.html
   
   
   At the linked iceberg discussion even Ryan Blue said this:
   "No, I don't think so. These aren't table properties... "
   And someone else said:
   " I don't think using TBLPROPERTIES is a good idea..."
   
   I hope I answered all the questions!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to