[
https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aihua Xu updated HIVE-11785:
----------------------------
Release Note: This change, together with HIVE-12820, adds support for carriage
return and newline characters inside field values. Previously, users had to
preprocess the text, replacing those characters with something other than
carriage return and newline, for the files to be processed correctly. With
this change, the characters are escaped automatically when the
{{serialization.escape.crlf}} SerDe property is set to true. One incompatible
change: the characters 'r' and 'n' can no longer be used as the separator or
field delimiter. (was: This change with HIVE-12820 in addition adds the
support of carriage return and new line characters in the fields. Before this
change, the user needs to preprocess the text by replacing them with some
characters other than carriage return and new line in order for the files to
be properly processed. With this change, it will automatically escape them if
{{serialization.escape.crlf}} serde property is set to true. One incompatible
change is: characters 'r' and 'n' cannot be used as separator or field
delimiter )
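A minimal sketch of enabling the behavior described above; the table name
crlf_demo and its columns are hypothetical, while the
{{serialization.escape.crlf}} property and LazySimpleSerDe come from this
issue:
{noformat}
-- Hypothetical table; only the serialization.escape.crlf property name is
-- taken from this issue.
CREATE TABLE crlf_demo (id INT, body STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('serialization.escape.crlf' = 'true')
STORED AS TEXTFILE;

-- Or turn escaping on for an existing text table:
ALTER TABLE crlf_demo SET SERDEPROPERTIES ('serialization.escape.crlf' = 'true');
{noformat}
With the property set, embedded \r and \n in string fields no longer need to be
replaced before loading; per the incompatible change above, 'r' and 'n' can then
no longer serve as the separator or field delimiter characters.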
> Support escaping carriage return and new line for LazySimpleSerDe
> -----------------------------------------------------------------
>
> Key: HIVE-11785
> URL: https://issues.apache.org/jira/browse/HIVE-11785
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 2.0.0
> Reporter: Aihua Xu
> Assignee: Aihua Xu
> Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch,
> HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different
> results depending on the hive.fetch.task.conversion setting.
> The expected result should be:
> {noformat}
> 1 newline
> here
> 2 carriage return
> 3 both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of
> 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not
> belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610,
> rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1 newline
> here
> here carriage return
> 3 both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL =
> http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command =
> /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job
> -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers:
> 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%, reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98
> sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 4251 HDFS
> Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1 newline
> NULL NULL
> 2 carriage return
> NULL NULL
> 3 both
> NULL NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}
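> A sketch of how the new escaping could be applied to this repro once the fix
> is available; the table name repo_text is hypothetical, while repo, the
> {{serialization.escape.crlf}} property, and hive.fetch.task.conversion come
> from this issue:
> {noformat}
> -- Hypothetical text-format copy of repo with CRLF escaping enabled.
> CREATE TABLE repo_text (lvalue int, charstring string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('serialization.escape.crlf' = 'true')
> STORED AS TEXTFILE;
> INSERT OVERWRITE TABLE repo_text SELECT * FROM repo;
> -- With escaping enabled, both fetch paths should return the same three rows.
> SET hive.fetch.task.conversion=none;
> SELECT * FROM repo_text;
> SET hive.fetch.task.conversion=more;
> SELECT * FROM repo_text;
> {noformat}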
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)