[ 
https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776298#comment-16776298
 ] 

BELUGA BEHR commented on HIVE-15475:
------------------------------------

Nope. OK.  Figured it out.

This issue was inadvertently fixed as part of [HIVE-18545] (Jul 10, 2018).  
Before that change, JSON handling was done by 
{{org.apache.hive.hcatalog.data.JsonSerDe}}.

The issue was that this class was not handling the provided {{Text}} object 
correctly.  A {{Text}} object has two components: an internal array of 
bytes *and* a length that indicates how many of those bytes are valid.  
{{JsonSerDe}} did not take the length into account, so when a zero-length 
{{Text}} object was submitted (an empty line), it still read the entire 
internal byte array, which presumably still held the bytes of the previous 
record, and produced a duplicate row where there should have been no text.

https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java#L168
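
To illustrate the buffer/length mismatch described above, here is a minimal, 
self-contained Java sketch.  {{ReusableText}} is a hypothetical stand-in for 
Hadoop's {{Text}} that models only the buffer-reuse behavior, and the two 
decode methods are illustrative names, not the actual {{JsonSerDe}} code.  It 
assumes the reader reuses one object per line and never shrinks the backing 
array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for Hadoop's Text: a reusable byte buffer plus a valid length. */
class ReusableText {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Copies src into the buffer; the buffer grows when needed but never shrinks. */
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = Arrays.copyOf(src, src.length);
        } else {
            System.arraycopy(src, 0, bytes, 0, src.length);
        }
        length = src.length; // only the first `length` bytes are valid
    }

    byte[] getBytes() { return bytes; }   // may contain stale bytes past `length`
    int getLength() { return length; }
}

class TextLengthDemo {
    /** Buggy pattern: decodes the whole backing array and ignores the valid length. */
    static String decodeIgnoringLength(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Correct pattern: decodes only the first getLength() bytes. */
    static String decodeRespectingLength(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText line = new ReusableText();
        // First line of the file: a real JSON record.
        line.set("{\"a\":\"a_1\", \"b\" : 1}".getBytes(StandardCharsets.UTF_8));
        // Next line is empty; the same object is reused with length 0.
        line.set(new byte[0]);

        System.out.println("ignoring length:   [" + decodeIgnoringLength(line) + "]");
        System.out.println("respecting length: [" + decodeRespectingLength(line) + "]");
    }
}
```

In this sketch the length-ignoring decode returns the previous record again 
for the empty line, which mirrors the duplicate rows in the report, while the 
length-aware decode returns an empty string.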

> JsonSerDe cannot handle json file with empty lines
> --------------------------------------------------
>
>                 Key: HIVE-15475
>                 URL: https://issues.apache.org/jira/browse/HIVE-15475
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.1
>            Reporter: pin_zhang
>            Priority: Major
>
> 1. Start HiveServer2 in apache-hive-1.2.1.
> 2. Start beeline, connect to HiveServer2, and run:
>    ADD JAR /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
>    CREATE EXTERNAL TABLE my_table(a string, b bigint)
>    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>    STORED AS TEXTFILE
>    LOCATION 'file:///home/hive/json';
> 3. Put a file with more than one newline at the end of the file:
>    {"a":"a_1", "b" : 1}
> 4. Run the query:
>    select * from my_table;
> +-------------+-------------+--+
> | my_table.a  | my_table.b  |
> +-------------+-------------+--+
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> +-------------+-------------+--+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
