[ https://issues.apache.org/jira/browse/IMPALA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-3478:
----------------------------------
    Labels: newbie ramp-up  (was: ramp-up)

> Support for UTF-8 BOM on text backed tables.
> --------------------------------------------
>
>                 Key: IMPALA-3478
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3478
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Clients
>    Affects Versions: Impala 2.3.0
>            Reporter: Thomas Scott
>            Priority: Minor
>              Labels: newbie, ramp-up
>
> Data stored as UTF-8 can begin with a Byte Order Mark (BOM), the hex bytes
> "ef bb bf", at the start of the file. Hive ignores the BOM, but Impala does
> not, which can cause the first field of the file to be misread. For example,
> if the first column is of type timestamp, Impala shows the value as NULL even
> though Hive reads the same data correctly.
> Steps to reproduce:
> In Hive, create the table:
>   CREATE EXTERNAL TABLE IF NOT EXISTS test_table (col1 timestamp)
>   LOCATION '/tmp/test_table';
> Then write a file that starts with a BOM into the /tmp/test_table directory.
> I use vim to add the BOM:
>   echo '2010-01-01 00:00:00.000' > foo
>   vim -e -s -c ':set bomb' -c ':wq' foo
> and then copy foo into /tmp/test_table. Finally, run:
>   SELECT * FROM test_table;
> Hive displays the timestamp; Impala displays NULL.
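For reproducing this without vim, here is a minimal sketch assuming a POSIX shell with `printf`, `od`, `tail`, and `tr`. It writes the same test file with `printf`, prints its first three bytes, and illustrates at the byte level what ignoring the BOM means: the three bytes EF BB BF are dropped before the first field is parsed. `strip_utf8_bom` and the file name `foo.nobom` are hypothetical, for illustration only; they are not part of Hive or Impala.

```shell
#!/bin/sh
# Sketch only: byte-level illustration of a file that starts with a UTF-8 BOM.

# Create the test file without vim. The octal escapes \357 \273 \277 are the
# bytes EF BB BF (the UTF-8 BOM), followed by the data row and a newline.
printf '\357\273\277%s\n' '2010-01-01 00:00:00.000' > foo

# Print the first three bytes of foo in hex; for this file they are ef bb bf.
od -A n -t x1 -N 3 foo

# Hypothetical helper (not a Hive or Impala function): copy FILE to stdout,
# dropping a leading UTF-8 BOM if one is present.
strip_utf8_bom() {
  # First three bytes as a compact hex string, e.g. "efbbbf".
  first3=$(od -A n -t x1 -N 3 "$1" | tr -d ' \n')
  if [ "$first3" = "efbbbf" ]; then
    tail -c +4 "$1"   # output starts at byte 4, just after the BOM
  else
    cat "$1"          # no BOM: pass the file through unchanged
  fi
}

# Write a BOM-free copy of the test file.
strip_utf8_bom foo > foo.nobom
```

If /tmp/test_table is on HDFS (an assumption; the report does not say), the file can then be uploaded with `hdfs dfs -put foo /tmp/test_table/`. Stripping the BOM from existing files like this is only a workaround; the issue asks for Impala itself to skip the BOM, as Hive does.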



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
