[ 
https://issues.apache.org/jira/browse/ORC-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446744#comment-17446744
 ] 

Dongjoon Hyun edited comment on ORC-1031 at 11/20/21, 2:41 AM:
---------------------------------------------------------------

In that case, I'd recommend using Apache Spark to read those CSV files 
directly and save them to Spark ORC tables. BTW, I'm an Apache Spark PMC member too.
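
A minimal PySpark sketch of that route, under some assumptions: the CSV 
producer wraps any field containing the delimiter in double quotes (doubling 
embedded quotes), and a SparkSession is already available. The helper name 
csv_to_orc_table, the CSV_OPTIONS dict, and the paths and table name in the 
usage note are hypothetical; the reader and writer calls are standard Spark 
DataFrame APIs. Without an explicit schema, every column is read as a string.

```python
# Hypothetical reader options: they assume RFC 4180 style quoting in the input.
CSV_OPTIONS = {
    "header": "true",     # first line holds the column names
    "quote": '"',         # fields containing the delimiter are quoted
    "escape": '"',        # an embedded quote is written as ""
    "multiLine": "true",  # quoted fields may contain line breaks
}


def csv_to_orc_table(spark, csv_path, table_name, delimiter=","):
    """Read a delimited file with Spark and save it as an ORC table.

    `spark` is an existing SparkSession; `csv_path` and `table_name`
    are supplied by the caller.
    """
    options = dict(CSV_OPTIONS, sep=delimiter)
    # Spark's CSV reader honors quoting, so a delimiter inside a quoted
    # field stays part of the value instead of splitting the column.
    df = spark.read.options(**options).csv(csv_path)
    # Write the result as an ORC-backed table, replacing any existing one.
    df.write.format("orc").mode("overwrite").saveAsTable(table_name)
    return df
```

With a session created through SparkSession.builder.getOrCreate(), a call 
could look like csv_to_orc_table(spark, "/data/input.csv", "my_orc_table"). 
Whether raw binary values survive this path depends on how the CSV file was 
written, so treat this as a starting point rather than a verified fix.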


was (Author: dongjoon):
In that case, I'd recommend using Apache Spark to read those CSV files 
directly and save them to Spark tables. BTW, I'm an Apache Spark PMC member too.

> No way to escape delimiter in column values
> -------------------------------------------
>
>                 Key: ORC-1031
>                 URL: https://issues.apache.org/jira/browse/ORC-1031
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>            Reporter: Varun Raval
>            Priority: Major
>
> I am using the C++ CSV-to-ORC tool to convert a CSV file to an ORC file, and I 
> could not find a way to escape delimiters that appear inside column values. 
> If a delimiter appears as part of a column value in the CSV file, the tool 
> treats that character as a column separator, which corrupts the data in the 
> ORC file.
>  
> In my scenario, every character that could serve as a delimiter can also 
> appear in one of the columns of the CSV file.
> To give more detail about my use case: I have a Hive table with a 
> binary column, and a CSV file in which that column holds binary data. I am 
> converting the CSV file to an ORC file using this tool. There are no 
> restrictions on the data the binary column can hold, so whatever delimiter we 
> choose for the CSV-to-ORC conversion can end up inside that column.
> A sample value of the binary column is shown below:
> {code:java}
> 9Tl���������������~sjc_\[[\^`a`]WPF:."�������������������+Gaw���������������xnf`][Z[\_`a_[TK@4
> {code}
>  
> If there is a way to escape the delimiter characters in the column values, 
> that would be really useful!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
