[ 
https://issues.apache.org/jira/browse/HIVE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-14404:
---------------------------------
    Release Note: Introduced the new "dsv2" outputformat, which supports 
multi-character delimiters.
          Status: Patch Available  (was: Open)

Introduced a new outputformat (dsv2) which supports multi-character 
delimiters.
The dsv, csv2 and tsv2 outputformats are generated with the Super CSV library, 
which doesn’t support multi-character delimiters. Since csv2, tsv2 and dsv all 
share the same generation logic, I decided not to change that logic and 
instead introduced a new outputformat (dsv2) which supports multi-character 
delimiters. 
Unless quoting is disabled, the new dsv2 outputformat uses the same escaping 
logic as the dsv outputformat.
Extended the TestBeeLineWithArgs tests with new test steps which use 
multi-character delimiters.
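
To illustrate the idea only (this is not the code from the patch): a minimal 
Java sketch of joining row values with a multi-character delimiter. The class 
and method names are hypothetical, and the quoting rule shown (wrap a value in 
quotes when it contains the delimiter, the quote character or a line break, 
and double any embedded quote characters) is an assumption based on common 
CSV-style escaping, not a confirmed description of the dsv logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from the patch.
public class MultiCharJoinSketch {

    // Assumed CSV-style rule: quote a value only if it contains the
    // delimiter, the quote character, or a line break. Values are non-null.
    static String escape(String value, String delimiter, char quote) {
        boolean needsQuoting = value.contains(delimiter)
                || value.indexOf(quote) >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return value;
        }
        String q = String.valueOf(quote);
        // Double embedded quote characters, then wrap the value in quotes.
        return q + value.replace(q, q + q) + q;
    }

    // Concatenate the escaped values, separated by the full delimiter string.
    static String joinRow(List<String> values, String delimiter, char quote) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(escape(values.get(i), delimiter, quote));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("111201081253106275", "31-Oct-2011 00:00:00", "Text");
        System.out.println(joinRow(row, "^-^", '"'));
    }
}
```

With the delimiter "^-^", the sample row above is joined with the full 
three-character string between each pair of values.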

Main changes in the code:
- Changed the SeparatedValuesOutputFormat class to be an abstract class and 
created two new child classes to separate the logic for single-character and 
multi-character delimiters:
SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
- Kept the methods which are used by both children in the 
SeparatedValuesOutputFormat and moved the methods specific to the 
single-character case to the SingleCharSeparatedValuesOutputFormat class. 
- Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
moved some parts to the child class.
- Implemented the value escaping and concatenation with the delimiter string in 
the MultiCharSeparatedValuesOutputFormat.
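
The class split described above could be pictured roughly as follows. This is 
a simplified sketch with hypothetical names, not the actual Hive classes: the 
real single-character class delegates to Super CSV, while this stand-in keeps 
only the first character of the configured delimiter, which matches the 
output shown in the issue description. Escaping is left out here to keep the 
structure visible.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the parent/child split; not the classes in the patch.
public class OutputFormatHierarchySketch {

    // Stand-in for the abstract parent: keeps what both children share.
    abstract static class SeparatedValuesSketch {
        // Each child decides how one row becomes one output line.
        abstract String formatRow(List<String> values);

        // Shared helper (hypothetical): format several rows, one per line.
        String formatRows(List<List<String>> rows) {
            StringBuilder sb = new StringBuilder();
            for (List<String> row : rows) {
                sb.append(formatRow(row)).append('\n');
            }
            return sb.toString();
        }
    }

    // Single-character case: only the first character of the setting is used.
    static class SingleCharSketch extends SeparatedValuesSketch {
        private final char delimiter;

        SingleCharSketch(String configuredDelimiter) {
            this.delimiter = configuredDelimiter.charAt(0);
        }

        @Override
        String formatRow(List<String> values) {
            return String.join(String.valueOf(delimiter), values);
        }
    }

    // Multi-character case: the whole delimiter string separates the values.
    static class MultiCharSketch extends SeparatedValuesSketch {
        private final String delimiter;

        MultiCharSketch(String configuredDelimiter) {
            this.delimiter = configuredDelimiter;
        }

        @Override
        String formatRow(List<String> values) {
            return String.join(delimiter, values);
        }
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c");
        System.out.println(new SingleCharSketch("^-^").formatRow(row));
        System.out.println(new MultiCharSketch("^-^").formatRow(row));
    }
}
```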

> Allow delimiterfordsv to use multiple-character delimiters
> ----------------------------------------------------------
>
>                 Key: HIVE-14404
>                 URL: https://issues.apache.org/jira/browse/HIVE-14404
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stephen Measmer
>            Assignee: Marta Kuczora
>         Attachments: HIVE-14404.patch
>
>
> HIVE-5871 allows for reading multiple character delimiters.  Would like the 
> ability to use outputformat=dsv and define multiple character delimiters.  
> Today delimiterfordsv only uses one character even if multiple are passed.
> For example:
> when I use:
> beeline>!set outputformat dsv
> beeline>!set delimiterfordsv "^-^"
>  I get:
> 111201081253106275^31-Oct-2011 
> 00:00:00^Text^201605232823^2016051968232151^201605232823_2016051968232151_00000_0_1
>  
> Would like it to be:
> 111201081253106275^-^31-Oct-2011 
> 00:00:00^-^Text^-^201605232823^-^2016051968232151^-^201605232823_2016051968232151_00000_0_1
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
