okumin commented on code in PR #5541: URL: https://github.com/apache/hive/pull/5541#discussion_r1881689498
##########
parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:
##########
@@ -1840,6 +1841,14 @@ tableImplBuckets
 -> ^(TOK_ALTERTABLE_BUCKETS $num)
 ;
 
+tableWriteOrdered
+@init { pushMsg("table sorted specification", state); }
+@after { popMsg(state); }
+ :
+ KW_WRITE KW_ORDERED KW_BY sortCols=columnNameOrderList

Review Comment:
I give +1 to `WRITE LOCALLY ORDERED BY`. I double-checked `WRITE ORDERED BY` vs `WRITE LOCALLY ORDERED BY` with Spark 3.5.1 and Iceberg 1.6.1.

```
spark-sql (default)> CREATE TABLE hadoop_prod.default.test2 (a int) USING iceberg;
Time taken: 0.089 seconds
spark-sql (default)> ALTER TABLE hadoop_prod.default.test2 WRITE ORDERED BY a;
Time taken: 0.182 seconds
spark-sql (default)> CREATE TABLE hadoop_prod.default.test3 (a int) USING iceberg;
Time taken: 0.086 seconds
spark-sql (default)> ALTER TABLE hadoop_prod.default.test3 WRITE LOCALLY ORDERED BY a;
```

This is the diff between the resulting metadata files.

```
zookage@client-node-0:~$ hdfs dfs -cat /user/hive/warehouse/catalog/default/test2/metadata/v2.metadata.json > /tmp/test2.json
zookage@client-node-0:~$ hdfs dfs -cat /user/hive/warehouse/catalog/default/test3/metadata/v2.metadata.json > /tmp/test3.json
zookage@client-node-0:~$ diff /tmp/test2.json /tmp/test3.json
3,4c3,4
< "table-uuid" : "821c3cc2-1320-45dc-bb2a-c805778caa91",
< "location" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test2",
---
> "table-uuid" : "094951f1-6229-43cd-bffe-cb2f086b8dda",
> "location" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test3",
6c6
< "last-updated-ms" : 1733994463428,
---
> "last-updated-ms" : 1733994491871,
40c40
< "write.distribution-mode" : "range",
---
> "write.distribution-mode" : "none",
50,51c50,51
< "timestamp-ms" : 1733994460005,
< "metadata-file" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test2/metadata/v1.metadata.json"
---
> "timestamp-ms" : 1733994472725,
> "metadata-file" : "hdfs://hdfs-namenode-0.hdfs-namenode:8020/user/hive/warehouse/catalog/default/test3/metadata/v1.metadata.json"
```

It looks like the only meaningful difference is `write.distribution-mode`: `WRITE ORDERED BY` sets it to `range`, while `WRITE LOCALLY ORDERED BY` sets it to `none`. So I think adding `LOCALLY` makes more sense unless we also set `range`.

I am not an Apache Spark specialist, so please feel free to correct me if I am wrong.
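
In case it helps anyone repeat the comparison, here is a rough sketch of the same check in Python. It flattens two metadata JSON documents and reports only the leaves that differ, skipping fields that always differ between two tables. The helper names (`flatten`, `meaningful_diff`) and the ignore list are my own, and the two inline documents are trimmed, hypothetical stand-ins for `/tmp/test2.json` and `/tmp/test3.json`. I am assuming that `write.distribution-mode` sits under a `properties` map.

```python
import json

# Fields that differ between any two tables, taken from the diff above.
IGNORED_KEYS = {"table-uuid", "location", "last-updated-ms", "timestamp-ms", "metadata-file"}


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in IGNORED_KEYS:
                continue  # skip identity and timestamp fields
            yield from flatten(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}/{index}")
    else:
        yield prefix, node


def meaningful_diff(left, right):
    """Return {path: (left_value, right_value)} for leaves that differ."""
    left_leaves = dict(flatten(left))
    right_leaves = dict(flatten(right))
    paths = sorted(set(left_leaves) | set(right_leaves))
    return {
        path: (left_leaves.get(path), right_leaves.get(path))
        for path in paths
        if left_leaves.get(path) != right_leaves.get(path)
    }


# Hypothetical, heavily trimmed stand-ins for the two metadata files.
# The "properties" nesting is an assumption about the file layout.
test2 = json.loads("""
{"table-uuid": "821c3cc2-1320-45dc-bb2a-c805778caa91",
 "last-updated-ms": 1733994463428,
 "properties": {"write.distribution-mode": "range"}}
""")
test3 = json.loads("""
{"table-uuid": "094951f1-6229-43cd-bffe-cb2f086b8dda",
 "last-updated-ms": 1733994491871,
 "properties": {"write.distribution-mode": "none"}}
""")

print(meaningful_diff(test2, test3))
# {'/properties/write.distribution-mode': ('range', 'none')}
```

To run it against the real files, replace the inline documents with `json.load(open("/tmp/test2.json"))` and `json.load(open("/tmp/test3.json"))`.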