[ https://issues.apache.org/jira/browse/HIVE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Gupta updated HIVE-11996:
--------------------------------
Description:
ERROR CODE and ERROR TEXT:
" LINES TERMINATED BY only supports newline '\n' right now. Error
encountered near token ''\u0001'' (state=42000,code=40000)"
ISSUE DESCRIPTION:
The Hive Language Manual states that changing the line delimiter is possible.
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
        [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value,
        property_name=property_value, ...)]
Ref:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable
However, specifying [LINES TERMINATED BY char] with any other character raises
an error stating that Hive currently supports only newline '\n'. This
essentially means that the choice of line terminator is static. It is unclear
why it appears as a configurable item in the DDL.
This limitation seems to be hardcoded here:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
IMPACT:
When storing free-form data such as email or comments, it is fairly common for
a '\n' character to crop up. Much free-form ETL on Linux, with the majority of
ETL tools, also adds a $ (newline character) to maintain formatting.
Because the Hive Language Manual shows this as a configurable property, it also
leads to misleading solution designs that fail when the CREATE statement is
run in the development phase.
The ability to choose the row delimiter is a very basic necessity, and it is
alarming that this is not supported up to Hive 14, to the best of my
knowledge.
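To illustrate the impact described above, here is a minimal sketch in plain
Python (not Hive code, and not how Hive's reader is implemented). The sample
records, the ',' field delimiter, and the helper names serialize and parse are
assumptions made for this example only. It shows how a newline embedded in a
free-form field splits one record into two rows when '\n' is the only allowed
row delimiter, while a delimiter that does not occur in the data, such as
'\x01', round-trips cleanly.

```python
# Hypothetical sample data: the second record has a newline inside its
# free-form comment field.
records = [
    ("1", "Shipped on time"),
    ("2", "Line one of the comment\nline two of the comment"),
]

FIELD_DELIM = ","  # assumed field delimiter for this sketch


def serialize(rows, row_delim):
    """Join fields with FIELD_DELIM and rows with row_delim."""
    return row_delim.join(FIELD_DELIM.join(fields) for fields in rows)


def parse(data, row_delim):
    """Split data back into rows and fields, as a delimiter-based reader would."""
    return [tuple(line.split(FIELD_DELIM)) for line in data.split(row_delim)]


# With '\n' as the row delimiter, the embedded newline breaks record 2 apart.
newline_rows = parse(serialize(records, "\n"), "\n")

# With a row delimiter that never appears in the data, the rows survive intact.
custom_rows = parse(serialize(records, "\x01"), "\x01")

print(len(records), len(newline_rows), len(custom_rows))  # prints: 2 3 2
```

In this sketch the '\n' variant yields three rows from two records, which is
the kind of corruption a configurable row delimiter would avoid.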
> Row Delimiter other than '\n' throws error in Hive.
> ---------------------------------------------------
>
> Key: HIVE-11996
> URL: https://issues.apache.org/jira/browse/HIVE-11996
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Database/Schema, Hive
> Affects Versions: 0.12.0
> Reporter: Sameer Gupta
> Assignee: Ashutosh Chauhan
> Priority: Critical
> Labels: DDL, Delimiter, Hive, Line,, SerDe