[ https://issues.apache.org/jira/browse/HIVE-16351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952685#comment-15952685 ]
Daniel Doubrovkine commented on HIVE-16351:
-------------------------------------------
Those NULLs actually come from an empty record. Is that right? Since these are
text files, should the trailing empty line be ignored, and if so, which
component's responsibility is that?
{code}
~$ cat /tmp/test.json
{"text":"foo\nbar","number":123}
{"text":"bar\nfoo","number":345}
~$ perl -pe 'chomp if eof' /tmp/test.json > /tmp/test2.json
~$ cat /tmp/test2.json
{"text":"foo\nbar","number":123}
{"text":"bar\nfoo","number":345}~$
$ hadoop fs -put -f /tmp/test2.json /user/data/test.json
$ hive
hive> SELECT * FROM test;
OK
foo
bar 123
bar
foo 345
{code}
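To make the question concrete, here is a minimal Python sketch of the hypothesis. It is an illustration only, not Hive's or Hadoop's actual record reader. The function name `parse_records` and the `drop_empty` flag are made up for this example. It assumes records are split on LF and that an empty record deserializes to NULL (None here).

```python
import json

def parse_records(raw, drop_empty):
    """Split raw text on LF and JSON-decode each record.

    Hypothetical stand-in for a line-oriented reader plus a JSON SerDe:
    an empty record is kept as None (NULL) unless drop_empty is set.
    """
    rows = []
    for record in raw.split("\n"):
        if record == "":
            if drop_empty:
                continue
            rows.append(None)
        else:
            rows.append(json.loads(record))
    return rows

# Same content as /tmp/test.json: the \n inside each value is a JSON escape,
# and each record is terminated by a real LF.
with_newline = (
    '{"text":"foo\\nbar","number":123}\n'
    '{"text":"bar\\nfoo","number":345}\n'
)
# Same content as /tmp/test2.json: the final LF is removed.
without_newline = with_newline.rstrip("\n")

rows_naive = parse_records(with_newline, drop_empty=False)
rows_trimmed = parse_records(without_newline, drop_empty=False)
rows_skipping = parse_records(with_newline, drop_empty=True)
```

In this sketch, the naive split of the newline-terminated text yields a third, empty record, while both trimming the file and skipping empty records yield exactly two rows. The decoded "text" values contain a real line break, which would also explain why each row prints across two lines.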
> Hive confused by CR/LFs
> -----------------------
>
> Key: HIVE-16351
> URL: https://issues.apache.org/jira/browse/HIVE-16351
> Project: Hive
> Issue Type: Bug
> Components: Hive, Serializers/Deserializers
> Affects Versions: 1.2.1
> Environment: Hadoop 2.7.3
> Reporter: Daniel Doubrovkine
>
> From https://github.com/rcongiu/Hive-JSON-Serde/issues/65
> This happens with both JSON and MongoDB connector Serde, so I don't believe
> this is a Serde bug.
> Using
> http://www.congiu.net/hive-json-serde/1.3.6/cdh4/json-serde-1.3.6-jar-with-dependencies.jar
> placed into /usr/local/Cellar/apache-hive-1.2.1/lib
> A dummy test.json with a CR/LF
> {code}
> $ cat /tmp/test.json
> {"text":"foo\nbar","number":123}
> $ hadoop fs -mkdir /user/data
> $ hadoop fs -put -f /tmp/test.json /user/data/test.json
> $ hive
> hive> CREATE DATABASE test;
> hive> CREATE EXTERNAL TABLE test ( text string )
> > ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
> > LOCATION '/user/data';
> hive> SELECT * FROM test;
> foo
> bar 123
> NULL NULL
> {code}
> That output is clearly wrong: there is only one row of data.