[jira] [Commented] (HIVE-16351) Hive confused by CR/LFs

Daniel Doubrovkine (JIRA) Sun, 02 Apr 2017 06:53:58 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952685#comment-15952685 ]


Daniel Doubrovkine commented on HIVE-16351:
-------------------------------------------

Those NULLs actually come from an empty record at the end of the file. Is that 
the correct behavior? Since these are text files, should the last empty line be 
ignored, and if so, which component's responsibility is that? 

{code}
~$ cat /tmp/test.json 
{"text":"foo\nbar","number":123}
{"text":"bar\nfoo","number":345}
~$ perl -pe 'chomp if eof' /tmp/test.json > /tmp/test2.json
~$ cat /tmp/test2.json 
{"text":"foo\nbar","number":123}
{"text":"bar\nfoo","number":345}~$ 
$ hadoop fs -put -f /tmp/test2.json /user/data/test.json
$ hive
hive> SELECT * FROM test;
OK
foo
bar     123
bar
foo     345
{code}
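
For anyone wanting to check an input file for empty lines before uploading it, something like the following may help. It is only a sketch and not part of the original report: the scratch files created with mktemp and the variable names (src, clean, total, blank, kept) are made up for illustration, and it assumes the empty record comes from an empty physical line in the file.

```shell
#!/bin/sh
# Scratch files (hypothetical; the report itself uses /tmp/test.json).
src=$(mktemp)
clean=$(mktemp)

# Two JSON records followed by one empty line. %s leaves the backslash-n
# inside the JSON untouched, so each record stays on one physical line.
printf '%s\n' '{"text":"foo\nbar","number":123}' '{"text":"bar\nfoo","number":345}' '' > "$src"

# Count physical lines and empty lines (tr strips the padding some wc builds add).
total=$(wc -l < "$src" | tr -d ' ')
blank=$(grep -c '^$' "$src")

# Drop empty lines so that no empty record reaches the table.
grep -v '^$' "$src" > "$clean"
kept=$(wc -l < "$clean" | tr -d ' ')

echo "lines=$total blank=$blank kept=$kept"

rm -f "$src" "$clean"
```

This filters every empty line, not just the last one, which is broader than the perl one-liner above (that one only removes the final newline). Which of the two is appropriate depends on how the file was produced.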

> Hive confused by CR/LFs
> -----------------------
>
>                 Key: HIVE-16351
>                 URL: https://issues.apache.org/jira/browse/HIVE-16351
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Serializers/Deserializers
>    Affects Versions: 1.2.1
>         Environment: Hadoop 2.7.3
>            Reporter: Daniel Doubrovkine
>
> From https://github.com/rcongiu/Hive-JSON-Serde/issues/65
> This happens with both JSON and MongoDB connector Serde, so I don't believe 
> this is a Serde bug.
> Using 
> http://www.congiu.net/hive-json-serde/1.3.6/cdh4/json-serde-1.3.6-jar-with-dependencies.jar
>  placed into /usr/local/Cellar/apache-hive-1.2.1/lib
> A dummy test.json with a CR/LF
> {code}
> $ cat /tmp/test.json
> {"text":"foo\nbar","number":123}
> $ hadoop fs -mkdir /user/data
> $ hadoop fs -put -f /tmp/test.json /user/data/test.json
> $ hive
> hive> CREATE DATABASE test;
> hive> CREATE EXTERNAL TABLE test ( text string )
>     > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
>     > LOCATION '/user/data';
> hive> SELECT * FROM test;
> foo
> bar   123
> NULL  NULL
> {code}
> You can see how that's totally wrong: there's only one row of data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
