[jira] [Updated] (DRILL-5549) SELECT * against a CSV file with empty headers produces error

Paul Rogers (JIRA) Mon, 29 May 2017 14:34:16 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paul Rogers updated DRILL-5549:
-------------------------------
    Description: 
See DRILL-5548 for background. This test is very similar, except that the input 
file contains a single blank line. Because the CSV plugin is configured to read 
headers, Drill sees a non-empty file that has no header names and no data.

Again, use a star query:

{code}
SELECT * FROM dfs.data.`emptyHeader.csv`
{code}

The result this time is somewhat different:

{code}
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: HeaderError: The file must define at least one header.
{code}

If we fix the issue in DRILL-5548 (empty file), we should use the same solution 
for a file with an empty header.

Suppose instead that the file is not entirely empty, but contains:

{code}
1:
2: fred
3: barney
{code}

(Note: the line numbers are shown only to make the blank first line visible.)
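
For reproduction, the two inputs described above could be generated with a 
small Java helper such as the sketch below. The class name, the second file 
name {{emptyHeaderWithData.csv}}, and the default target directory are 
assumptions made for illustration; only {{emptyHeader.csv}} appears in the 
query above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that writes the two input files discussed in this issue. */
final class EmptyHeaderFixtures {

    /** A single blank line: an empty header and no data. */
    static final String HEADER_ONLY = "\n";

    /** A blank header line followed by two data lines. */
    static final String HEADER_AND_DATA = "\nfred\nbarney\n";

    /** Writes content to dir/name as UTF-8, creating dir if needed. */
    static Path write(Path dir, String name, String content) {
        try {
            Files.createDirectories(dir);
            return Files.write(dir.resolve(name),
                    content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the target directory backs the workspace used in the query.
        Path dir = Paths.get(args.length > 0 ? args[0] : "data");
        System.out.println(write(dir, "emptyHeader.csv", HEADER_ONLY));
        // The second file name is invented for this sketch.
        System.out.println(write(dir, "emptyHeaderWithData.csv", HEADER_AND_DATA));
    }
}
```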

Here, the header is empty, but the file has data. While reading the header, we 
cannot know whether data follows. Although this is a pathological case and an 
invalid CSV file, it suggests that the right solution for both empty cases (the 
empty file and the empty header) is to use the special {{columns}} array when 
the header is empty. Drill can then gracefully handle a file that has no header 
but does have data.
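
The proposed fallback can be sketched in a few lines of Java. This is a 
hypothetical illustration, not Drill's text reader: the class and method names 
are invented, and fields are split naively on commas with no quote handling. 
It only shows the decision being suggested: a blank header line switches each 
row to a single {{columns}} array instead of raising an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of falling back to `columns` when the header is empty. */
final class EmptyHeaderFallback {

    /** Returns the header names, or an empty list if the header line is blank. */
    static List<String> parseHeader(String headerLine) {
        if (headerLine == null || headerLine.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (String name : headerLine.split(",", -1)) {
            names.add(name.trim());
        }
        return names;
    }

    /** Maps one data line to a row, using `columns` when there is no header. */
    static Map<String, Object> readRow(List<String> header, String dataLine) {
        String[] fields = dataLine.split(",", -1);
        Map<String, Object> row = new LinkedHashMap<>();
        if (header.isEmpty()) {
            // Fallback: expose every field through the single `columns` array.
            row.put("columns", Arrays.asList(fields));
            return row;
        }
        for (int i = 0; i < header.size(); i++) {
            // Missing trailing fields become empty strings in this sketch.
            row.put(header.get(i), i < fields.length ? fields[i] : "");
        }
        return row;
    }

    /** Treats the first line as the header and the remaining lines as data. */
    static List<Map<String, Object>> readAll(List<String> lines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows; // entirely empty file: no header, no rows
        }
        List<String> header = parseHeader(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            rows.add(readRow(header, line));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Blank header followed by two data lines, as in the example above.
        System.out.println(readAll(Arrays.asList("", "fred", "barney")));
        // prints [{columns=[fred]}, {columns=[barney]}]
    }
}
```

With this shape, a file holding only the blank header line yields zero rows, 
and a file with a normal header keeps its named columns.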

  was:
See DRILL-5548 for background. This test is very similar except that the input 
file has a single blank line. Since the CSV plugin is set up to read headers, 
this represents a non-empty file, with no headers and no data.

The result this time is somewhat different:

{code}
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: HeaderError: The file must define at least one header.
{code}

If we fix the issue in DRILL-5548 (empty file), we should use the same solution 
for a file with an empty header.

Suppose the file was not entirely empty, suppose it was:

{code}
1:
2: fred
3: barney
{code}

(Note: line numbers shown to force display of blank line...)

Here, we have an empty header, but we have data. We can't know that we have 
data while reading the header. While this is a pathological case, and an 
invalid CSV file, this second case does suggest that the right solution to the 
two empty cases is to use the special {{columns}} array when the header is 
empty. This will allow Drill to gracefully handle the case above for a file 
with no header but with data.


> SELECT * against a CSV file with empty headers produces error
> -------------------------------------------------------------
>
>                 Key: DRILL-5549
>                 URL: https://issues.apache.org/jira/browse/DRILL-5549
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>



