[ 
https://issues.apache.org/jira/browse/HIVE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Gupta updated HIVE-11996:
--------------------------------
    Description: 
ERROR CODE and ERROR TEXT:

        " LINES TERMINATED BY only supports newline '\n' right now. Error 
encountered near token ''\u0001'' (state=42000,code=40000)"

ISSUE DESCRIPTION:

The Hive Language Manual states that changing the line delimiter is possible.

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
        [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value,
        property_name=property_value, ...)]

Ref: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable

But when [LINES TERMINATED BY char] is defined with any other character, Hive 
raises an error stating that it only supports newline '\n' right now. This 
essentially means that the choice of line terminator is static. It is unclear 
why the DDL exposes it as a configurable item.
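
A minimal sketch of a statement that would hit this error, assuming the 
delimiter in question was '\u0001' (the token named in the error text above). 
The table name and columns are hypothetical and used only for illustration:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE email_comments (
  id   INT,
  body STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\u0001'   -- any value other than '\n' is rejected
STORED AS TEXTFILE;
-- Per this report, the statement fails with:
--   LINES TERMINATED BY only supports newline '\n' right now.
```

Replacing the last clause with LINES TERMINATED BY '\n' is the only form the 
error message says is accepted.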

This limitation seems to be hardcoded here:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171

IMPACT:

When storing free-form data such as email or comments, it is fairly common for 
a '\n' character to crop up inside a field. A lot of free-form ETL on Linux, 
with the majority of ETL tools, also adds a $ (newline character) to maintain 
formatting.

As the Hive Language Manual shows this as a configurable property, it also 
leads to flawed solution designs that fail when the CREATE statement is 
run in the development phase.

Having the ability to choose your row delimiter is a very basic necessity, and 
it is alarming that this is not supported as of Hive 0.14, to the best of my 
knowledge.



> Row Delimiter other than '\n' throws error in Hive.
> ---------------------------------------------------
>
>                 Key: HIVE-11996
>                 URL: https://issues.apache.org/jira/browse/HIVE-11996
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, Database/Schema, Hive
>    Affects Versions: 0.12.0
>            Reporter: Sameer Gupta
>            Assignee: Ashutosh Chauhan
>            Priority: Critical
>              Labels: DDL, Delimiter, Hive, Line, SerDe



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
