[ 
https://issues.apache.org/jira/browse/HIVE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227096#comment-14227096
 ] 

Ferdinand Xu commented on HIVE-8763:
------------------------------------

Hi [~rstokes], can you please create a review board entry for your patch?

> Support for use of enclosed quotes in LazySimpleSerde
> -----------------------------------------------------
>
>                 Key: HIVE-8763
>                 URL: https://issues.apache.org/jira/browse/HIVE-8763
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
>         Environment: many - verified on Centos / Redhat with CDH
>            Reporter: ronan stokes
>         Attachments: HIVE-8763.1.patch
>
>
> Currently the LazySimpleSerde does not support quoting of delimited fields, 
> so a separator cannot appear inside a quoted field. This means having to use 
> alternatives for many common use cases involving CSV-style data. 
> Key scenarios that do not work include:
> (3 column row for int, string, float delimited by ',')
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 100,  "3.5 "" hard drive, quantity 10",  2650.30
> 100,"3.5 "" hard drive, quantity 10",2650.30
> To address this, I have implemented a number of fixes in the deserialization 
> stage of a copy of the LazySimpleSerde. For serialization, the code is 
> unchanged, with the relevant embedded characters being escaped.
> Assuming a row with 3 fields - SKU ID, description, price, delimited by ','
> 1) allow use of enclosed quotes around a string field 
> For example 
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 2) support escaping of quotes within field to allow use of embedded quote
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 3) support for old style CSV embedded quotes 
> For example 
> 100,"3.5 "" hard drive, quantity 10",2650.30
> 4) support for skipping of leading spaces in field
> For example (note space between first ',' and opening quote)
> 100,  "3.5 "" hard drive, quantity 10",  2650.30
> In each case, with the changes applied, these rows are evaluated as though 
> the delimiters and embedded quotes had been escaped, e.g.:
> 100, 3.5 \" hard drive\, quantity 10,  2650.30
> All of these are enabled or disabled using serde properties: the quote 
> character (quotechar), whether enclosing quotes are supported, and whether 
> doubled embedded quotes are treated as a single quote (of the same char type).
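
For reference while reviewing, the four scenarios in the description can be illustrated with a small standalone sketch. This is not the code from HIVE-8763.1.patch and does not use any Hive API; the class name QuotedFieldSplitter and its split method are hypothetical. It assumes that leading spaces are skipped only in front of an opening quote, that unquoted fields are kept verbatim, and that an escape character inside quotes simply yields the character that follows it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hive or of the attached patch.
public class QuotedFieldSplitter {

    /**
     * Splits one row into fields, honoring enclosing quotes.
     * Inside a quoted field, separators are literal, an escape character
     * yields the next character, and a doubled quote yields a single quote.
     */
    public static List<String> split(String row, char sep, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        int n = row.length();
        int i = 0;
        while (true) {
            // Look past leading spaces to see whether the field opens with a quote.
            int j = i;
            while (j < n && row.charAt(j) == ' ') {
                j++;
            }
            StringBuilder field = new StringBuilder();
            if (j < n && row.charAt(j) == quote) {
                i = j + 1; // skip the leading spaces and the opening quote
                while (i < n) {
                    char c = row.charAt(i);
                    if (c == escape && i + 1 < n) {
                        field.append(row.charAt(i + 1)); // escaped character
                        i += 2;
                    } else if (c == quote && i + 1 < n && row.charAt(i + 1) == quote) {
                        field.append(quote); // old-style CSV doubled quote
                        i += 2;
                    } else if (c == quote) {
                        i++; // closing quote
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                // Ignore anything between the closing quote and the next separator.
                while (i < n && row.charAt(i) != sep) {
                    i++;
                }
            } else {
                // Unquoted field: copy verbatim up to the next separator.
                while (i < n && row.charAt(i) != sep) {
                    field.append(row.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                break;
            }
            i++; // consume the separator
        }
        return fields;
    }
}
```

Under these assumptions, calling split(row, ',', '"', '\\') on any of the four sample rows gives three fields, with the description field containing the embedded comma (and, where present, a single embedded quote).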



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
