[
https://issues.apache.org/jira/browse/IMPALA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-3478:
----------------------------------
Labels: newbie ramp-up (was: ramp-up)
> Support for UTF-8 BOM on text backed tables.
> --------------------------------------------
>
> Key: IMPALA-3478
> URL: https://issues.apache.org/jira/browse/IMPALA-3478
> Project: IMPALA
> Issue Type: New Feature
> Components: Clients
> Affects Versions: Impala 2.3.0
> Reporter: Thomas Scott
> Priority: Minor
> Labels: newbie, ramp-up
>
> Text files encoded in UTF-8 can begin with a Byte Order Mark (BOM), the hex
> bytes "ef bb bf". Hive ignores the BOM, but in Impala it can cause the first
> field of the file to be misread. A good example is a first column of type
> timestamp: Impala shows the value as NULL even though Hive reads the same
> data as valid.
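
To illustrate the failure mode described above, here is a minimal Python sketch (not Impala code). It assumes the reader hands the raw leading bytes of the file to a strict timestamp parser. The helper names strip_utf8_bom and parse_first_field are hypothetical and exist only for this illustration.

```python
import codecs
from datetime import datetime


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (ef bb bf), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_first_field(raw: bytes):
    """Parse a timestamp field; return None (think NULL) if it is not valid."""
    try:
        return datetime.strptime(raw.decode("utf-8"), "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None


# Hypothetical first line of a file that starts with a BOM.
line = codecs.BOM_UTF8 + b"2010-01-01 00:00:00.000"

with_bom = parse_first_field(line)                     # BOM is part of the field
without_bom = parse_first_field(strip_utf8_bom(line))  # BOM skipped first
```

In this sketch the field that still carries the BOM does not parse, while the stripped field parses as a normal timestamp. Skipping the three bytes once at the start of the file, before any field splitting, is one plausible way to handle it.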
> Steps to reproduce:
> In Hive:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_table (col1 timestamp) LOCATION
> '/tmp/test_table';
> Then write a file with a BOM into the /tmp/test_table directory. I use vim
> for this, as below:
> echo '2010-01-01 00:00:00.000' > foo
> vim -e -s -c ':set bomb' -c ':wq' foo
> Finally, run the same query in Hive and in Impala:
> SELECT * FROM test_table;
> Hive displays the timestamp; Impala displays NULL.
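
For anyone without vim at hand, the test file from the steps above could also be produced with a short Python sketch. It relies on Python's "utf-8-sig" codec, which writes the BOM at the start of the output. The function name write_bom_file and the temporary directory are assumptions made for this example, and the resulting file would still need to be placed in the table's location.

```python
import codecs
import os
import tempfile


def write_bom_file(path: str, line: str) -> None:
    """Write one line of text preceded by a UTF-8 BOM."""
    # "utf-8-sig" emits the bytes ef bb bf before the first write.
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        f.write(line + "\n")


# Write to a scratch directory; "foo" matches the file name used in the report.
demo_path = os.path.join(tempfile.mkdtemp(), "foo")
write_bom_file(demo_path, "2010-01-01 00:00:00.000")

# Read the file back as bytes to confirm that the BOM is present.
with open(demo_path, "rb") as f:
    first_bytes = f.read(3)
    rest = f.read()

print(first_bytes.hex(" "))  # prints "ef bb bf"
```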
--
This message was sent by Atlassian Jira
(v8.3.4#803005)