Rui Li updated HIVE-5871:
-------------------------
Attachment: HIVE-5871.patch
This implementation relies mainly on LazySimpleSerDe for serialization and
deserialization. I added some methods to LazyStruct to parse a row whose fields
are delimited by a multiple-character string. Another difference from
LazySimpleSerDe is that MultiDelimitSerDe doesn't Base64-encode binary fields
during serialization, because the encoded string may interfere with the
delimiter. I also modified LazyBinary so that, when it deserializes a binary
field that it cannot Base64-decode, it keeps the data unchanged. A simple use
case is as follows:
create table test (id string, hivearray array<binary>, hivemap map<string,int>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES
("field.delimited"="[,]", "collection.delimited"=":", "mapkey.delimited"="@");
Here field.delimited is the multiple-character field delimiter,
collection.delimited is the delimiter between collection items, and
mapkey.delimited is the delimiter between keys and values in maps. Multiple-
character delimiters are currently not supported for these last two.
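To illustrate the parsing and binary-handling behaviour described above, here is a small standalone sketch. It is not the code in HIVE-5871.patch and does not use Hive's LazyStruct or LazyBinary classes; the class MultiDelimitSketch and its methods are hypothetical. It assumes the delimiters from the example table ("[,]", ":", "@"), treats every delimiter as a literal string, follows the LazySimpleSerDe convention of separating map entries with the collection delimiter, and uses java.util.Base64 as a stand-in for whatever Base64 codec Hive actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; not the HIVE-5871 patch and not Hive's LazyStruct/LazyBinary API.
public class MultiDelimitSketch {

    // One parsed row of the example table: test(id string, hivearray array<binary>, hivemap map<string,int>).
    static final class Row {
        final String id;
        final List<byte[]> hivearray;
        final Map<String, Integer> hivemap;

        Row(String id, List<byte[]> hivearray, Map<String, Integer> hivemap) {
            this.id = id;
            this.hivearray = hivearray;
            this.hivemap = hivemap;
        }
    }

    // Splits on a literal delimiter of any length. Pattern.quote stops "[,]" from being
    // read as a regex character class; limit -1 keeps trailing empty fields.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter), -1);
    }

    // Tries to Base64-decode a binary field; if the bytes are not valid Base64
    // (e.g. raw bytes written without encoding), returns them unchanged.
    static byte[] decodeBinaryLeniently(String field) {
        byte[] raw = field.getBytes(StandardCharsets.UTF_8);
        try {
            return Base64.getDecoder().decode(raw);
        } catch (IllegalArgumentException notBase64) {
            return raw;
        }
    }

    static Row parseRow(String line, String fieldDelim, String collectionDelim, String mapKeyDelim) {
        // Top level: split on the (possibly multi-character) field delimiter.
        String[] fields = splitLiteral(line, fieldDelim);
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields but got " + fields.length);
        }

        // array<binary>: items separated by the collection delimiter, each decoded leniently.
        List<byte[]> array = new ArrayList<>();
        for (String item : splitLiteral(fields[1], collectionDelim)) {
            array.add(decodeBinaryLeniently(item));
        }

        // map<string,int>: entries separated by the collection delimiter,
        // key and value separated by the map-key delimiter.
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String entry : splitLiteral(fields[2], collectionDelim)) {
            String[] kv = splitLiteral(entry, mapKeyDelim);
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad map entry: " + entry);
            }
            map.put(kv[0], Integer.parseInt(kv[1]));
        }
        return new Row(fields[0], array, map);
    }

    public static void main(String[] args) {
        // "aGk=" is valid Base64 for "hi"; "raw!bytes" is not Base64 and is kept as-is.
        Row row = parseRow("1[,]aGk=:raw!bytes[,]a@1:b@2", "[,]", ":", "@");
        System.out.println("id = " + row.id);
        for (byte[] item : row.hivearray) {
            System.out.println("item = " + new String(item, StandardCharsets.UTF_8));
        }
        System.out.println("map = " + row.hivemap);
    }
}
```

The lenient decode is a heuristic: a raw value that happens to be valid Base64 (for example the string "abcd") would still be decoded rather than kept unchanged.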
> Use multiple-characters as field delimiter
> ------------------------------------------
>
> Key: HIVE-5871
> URL: https://issues.apache.org/jira/browse/HIVE-5871
> Project: Hive
> Issue Type: Improvement
> Components: Contrib
> Affects Versions: 0.12.0
> Reporter: Rui Li
> Attachments: HIVE-5871.patch
>
>
> Add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can
> specify a multiple-character field delimiter when creating tables.