[
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777102#comment-16777102
]
BELUGA BEHR commented on HIVE-21240:
------------------------------------
[~kgyrtkirk] Thanks!
#1 I'm not sure I understand the first request. Are you talking specifically
about the HCat code? Are there missing unit tests here? Is that why it passes
even though the data types have been changed? As I see it, the native arrays
are all transformed into Java Collections:
{code:java|title=HCat JsonSerDe}
List fatRow = fatLand((Object[]) row);
return new DefaultHCatRecord(fatRow);
...
return Arrays.asList(ArrayUtils.toObject((int[]) arr));
{code}
So, the JSON SerDe should just create Java Collections from the get-go instead
of having to transform it later.
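To illustrate the difference, here is a minimal sketch. It is not the actual HCat or Hive code: the class and method names are hypothetical, and a plain loop stands in for ArrayUtils.toObject so the example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hive.
final class ListFromTheStart {

  // Convert-later shape: the deserializer hands back a primitive array,
  // and the caller boxes and copies every element into a List afterwards.
  static List<Integer> convertAfterwards(int[] arr) {
    List<Integer> out = new ArrayList<>(arr.length);
    for (int v : arr) {
      out.add(v); // autoboxing int -> Integer
    }
    return out;
  }

  // Build-directly shape: values go into a List while they are parsed,
  // so no intermediate array and no second pass are needed.
  static List<Integer> buildDirectly(String[] tokens) {
    List<Integer> out = new ArrayList<>(tokens.length);
    for (String token : tokens) {
      out.add(Integer.valueOf(token.trim()));
    }
    return out;
  }
}
```

Both methods return an equal List; the second simply skips the intermediate array and the extra copy.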
#2 I noted that the Kafka_Handler Q-Test fails locally on trunk as well. I
searched JIRA and see this test failing in many other places. I can keep
looking at it though.
#3 I don't think there's much value in going back and changing the code and
testing it. These proposed changes are not about making the SerDe faster; I
just want to put it out there that there isn't a huge regression. If it's a bit
quicker, then that's an added bonus.
> JSON SerDe Re-Write
> -------------------
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
> Issue Type: Improvement
> Components: Serializers/Deserializers
> Affects Versions: 4.0.0, 3.1.1
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch,
> HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch,
> HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch,
> HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The JSON SerDe has a few issues, I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for
> each row processed, for each column in the row