Qiheng He created HIVE-28316:
--------------------------------
Summary: The documentation provides an ambiguous explanation
regarding the mutually exclusive nature of `STORED BY` and `STORED AS`
Key: HIVE-28316
URL: https://issues.apache.org/jira/browse/HIVE-28316
Project: Hive
Issue Type: Bug
Reporter: Qiheng He
- The documentation is ambiguous about whether {*}STORED BY{*} and
{*}STORED AS{*} are mutually exclusive.
- As mentioned on
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers , when the
{*}CREATE TABLE{*} statement specifies {*}STORED BY{*}, it should not also
specify {*}STORED AS{*}. The content in question is as follows.
{code:bash}
When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS
cannot be specified. Optional SERDEPROPERTIES can be specified as part of the
STORED BY clause and will be passed to the serde provided by the storage
handler.
See CREATE TABLE and Row Format, Storage Format, and SerDe for more information.
Example:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string",
"hbase.table.name" = "hbase_table_0"
);
{code}
- This is similarly reflected in the documentation at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL , where
{*}|{*} separates {*}STORED BY{*} from {*}STORED AS{*}, indicating their
distinct usage and mutual exclusivity.
{code:bash}
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
{code}
- However, this contradicts the Hive-Iceberg Integration documentation at
https://cwiki.apache.org/confluence/display/Hive/Hive-Iceberg+Integration ,
which explicitly gives examples in which {*}STORED BY{*} and {*}STORED AS{*}
appear in the same statement. The documentation as a whole therefore leaves it
unclear whether the two clauses may be combined.
{code:bash}
The iceberg table currently supports three file formats: PARQUET, ORC & AVRO.
The default file format is Parquet. The file format can be explicitily provided
by using STORED AS <Format> while creating the table
Example-1:
CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC;
{code}
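- To make the contradiction concrete, here is a minimal Python sketch that checks the two quoted examples against a strict reading of the DDL grammar (at most one of the two clauses). It is only an illustration: the helper names {{storage_clauses}} and {{allowed_by_strict_grammar}} are hypothetical, and the regex match is a crude stand-in that does not reflect how Hive's parser actually works.

```python
import re

# Hypothetical helpers for illustration only; this is NOT Hive's parser.
# A crude keyword match is assumed to be enough for the quoted examples.
STORED_BY = re.compile(r"\bSTORED\s+BY\b", re.IGNORECASE)
STORED_AS = re.compile(r"\bSTORED\s+AS\b", re.IGNORECASE)


def storage_clauses(ddl):
    """Return the set of storage clauses that appear in a CREATE TABLE statement."""
    found = set()
    if STORED_BY.search(ddl):
        found.add("STORED BY")
    if STORED_AS.search(ddl):
        found.add("STORED AS")
    return found


def allowed_by_strict_grammar(ddl):
    """Strict reading of the LanguageManual DDL grammar: at most one of the two clauses."""
    return len(storage_clauses(ddl)) <= 1


# Example quoted from the StorageHandlers page.
hbase_ddl = """CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:string",
"hbase.table.name" = "hbase_table_0"
)"""

# Example quoted from the Hive-Iceberg Integration page.
iceberg_ddl = "CREATE TABLE ORC_TABLE (ID INT) STORED BY ICEBERG STORED AS ORC"

print(allowed_by_strict_grammar(hbase_ddl))    # True
print(allowed_by_strict_grammar(iceberg_ddl))  # False
```

Under this strict reading the HBase example is accepted and the Iceberg example is rejected, even though both come from the official documentation.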
- Earlier discussion of this topic can be found at
https://github.com/apache/shardingsphere/pull/31526 .