Marcus Truscello created SQOOP-2750:
---------------------------------------

             Summary: Support --fields-terminated-by value greater than 127 
when using --hive-import
                 Key: SQOOP-2750
                 URL: https://issues.apache.org/jira/browse/SQOOP-2750
             Project: Sqoop
          Issue Type: Improvement
          Components: hive-integration
    Affects Versions: 1.99.6
            Reporter: Marcus Truscello
            Priority: Minor


Using a {{fields-terminated-by}} value greater than 127 builds a file with the 
correct delimiter but causes an exception when included with {{hive-import}}.  
The relevant code is in {{src/java/org/apache/sqoop/hive/TableDefWriter.java}}:
https://github.com/apache/sqoop/blob/f19e2a523579db8c28a96febfd3cf35a5d58adc6/src/java/org/apache/sqoop/hive/TableDefWriter.java#L278-L300

The assumption in that code is only half true.  Hive only supports delimiters 
up to 127 in *octal* form, but it also supports delimiters up to 255 in signed 
character form (two's complement).  
For example, a {{fields-terminated-by}} value of {{'\0376'}} (decimal 254) is 
valid for Sqoop, but when used in a Hive table definition it should be 
converted to {{'-2'}} (with single quotes), since 254 - 256 = -2.

I suggest rejecting delimiters over 255, converting delimiters over 127 to 
two's complement signed characters, and leaving delimiters at or below 127 as 
octal.
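
A minimal sketch of the suggested mapping, for illustration only. The class 
name {{HiveDelimiterSketch}} and the method {{toHiveDelimiter}} are 
hypothetical and are not taken from {{TableDefWriter.java}}; the three-digit 
octal format for values at or below 127 is likewise an assumption about what 
the existing code emits. The returned string is unquoted, so the caller would 
still wrap it in single quotes in the Hive table definition.

```java
public final class HiveDelimiterSketch {
    private HiveDelimiterSketch() {}

    /**
     * Hypothetical helper (not part of Sqoop): maps a delimiter value to the
     * unquoted text that would be placed in a Hive table definition.
     */
    public static String toHiveDelimiter(int delim) {
        // Reject anything that does not fit in a single byte.
        if (delim < 0 || delim > 255) {
            throw new IllegalArgumentException(
                "Delimiter " + delim + " is outside the range 0-255");
        }
        // Above 127: emit the two's complement signed value, e.g. 254 -> -2.
        // Narrowing an int in 128..255 to byte yields delim - 256.
        if (delim > 127) {
            return Integer.toString((byte) delim);
        }
        // At or below 127: backslash followed by three octal digits
        // (assumed format).
        return String.format("\\%03o", delim);
    }

    public static void main(String[] args) {
        System.out.println(toHiveDelimiter(254)); // value given as '\0376'
        System.out.println(toHiveDelimiter(1));   // Ctrl-A
    }
}
```

With this mapping, the delimiter 254 from the example above would be written 
as {{-2}}, and a value of 256 or more would be rejected before any Hive 
statement is generated.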

(Work estimate inflated to account for the number of tests that may need to be 
modified.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
