[ https://issues.apache.org/jira/browse/DRILL-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Curtis Lambert updated DRILL-7869:
----------------------------------
Description:
Querying CSV files that use \x0d (\r) as the line delimiter results in
"DATA_READ ERROR: Column exceeds maximum length of 1024" with the default
configuration.
The \x0d character isn't treated as a line break, so the entire file is
read in as a single record. This is configurable as "delimiter" in the format,
but it is problematic if you have mixed CSV files with different line endings.
I have files with both \x0d (\r) and \x0d\x0a (\r\n) line endings and need to
be able to read both without having to change the configuration between queries.
was:
Querying CSV files with linux new line delimiters results in "DATA_READ ERROR:
Column exceeds maximum length of 1024".
The \x0d new line isn't used to break lines resulting in the entire file being
read in as a single record. This is configurable as "delimiter" in the format
but if you have mixed csv files with different line endings it's problematic.
If I have files with both \x0d and \x0d\x0a new lines (\r\n) and need to be
able to read both without having to change the configuration between queries.
> CSV files can't mix line breaks \x0d Vs. \x0d\x0a
> -------------------------------------------------
>
> Key: DRILL-7869
> URL: https://issues.apache.org/jira/browse/DRILL-7869
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.19.0
> Reporter: Curtis Lambert
> Priority: Minor
>
> Querying CSV files that use \x0d (\r) as the line delimiter results in
> "DATA_READ ERROR: Column exceeds maximum length of 1024" with the default
> configuration.
> The \x0d character isn't treated as a line break, so the entire file is
> read in as a single record. This is configurable as "delimiter" in the
> format, but it is problematic if you have mixed CSV files with different
> line endings. I have files with both \x0d (\r) and \x0d\x0a (\r\n) line
> endings and need to be able to read both without having to change the
> configuration between queries.
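
For anyone trying to reproduce or work around this outside of Drill, here is
a minimal Python sketch of the behaviour described above. It is not Drill
code: split_on_lf_only and normalize_line_endings are hypothetical helper
names, and the sketch assumes the reader breaks records only on \x0a. The
first helper shows why a file that uses only \x0d collapses into a single
record; the second rewrites both \x0d\x0a and bare \x0d to \x0a so that mixed
files could be read with one configuration.

```python
import re
from typing import List


def split_on_lf_only(data: bytes) -> List[bytes]:
    """Mimic a reader that recognizes only \\x0a as a record separator.

    This is an assumption for illustration, not Drill's actual reader.
    """
    return data.split(b"\n")


def normalize_line_endings(data: bytes) -> bytes:
    """Rewrite \\x0d\\x0a and bare \\x0d as \\x0a.

    The alternation tries \\r\\n first, so a CRLF pair becomes one \\n
    rather than two. Caveat: this is a plain byte substitution, so a
    \\x0d inside a quoted CSV field is rewritten as well.
    """
    return re.sub(rb"\r\n|\r", b"\n", data)


if __name__ == "__main__":
    cr_only = b"a,b\r1,2\r3,4\r"
    crlf = b"a,b\r\n1,2\r\n3,4\r\n"

    # With \x0d-only data, no \x0a is found: the whole file is one record.
    print(len(split_on_lf_only(cr_only)))

    # After normalization both variants have identical contents.
    print(normalize_line_endings(cr_only) == normalize_line_endings(crlf))
```

Preprocessing the files this way is only a stopgap; the request in this issue
is still for the reader itself to accept both line endings.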