[jira] [Commented] (DRILL-951) CSV header row should be parsed

ASF GitHub Bot (JIRA) Mon, 02 Nov 2015 21:06:53 -0800

    [ https://issues.apache.org/jira/browse/DRILL-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986695#comment-14986695 ]


ASF GitHub Bot commented on DRILL-951:
--------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/232#discussion_r43716276
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java ---
    @@ -101,16 +108,23 @@ public void setup(OperatorContext context, OutputMutator outputMutator) throws E
           InputStream stream = dfs.openPossiblyCompressedStream(split.getPath());
           TextInput input = new TextInput(settings,  stream, readBuffer, split.getStart(), split.getStart() + split.getLength());
     
    -      TextOutput output = null;
           if(settings.isUseRepeatedVarChar()){
    -        output = new RepeatedVarCharOutput(outputMutator, getColumns(), isStarQuery());
    -      }else{
    -        //TODO: Add field output.
    -        throw new UnsupportedOperationException();
    +        TextOutput output = new RepeatedVarCharOutput(outputMutator, getColumns(), isStarQuery());
    +        this.reader = new TextReader(settings, input, output, whitespaceBuffer);
    +        reader.start();
    +      } else {
    +        // two-phase read approach.
    +        // phase-1: read the header from the file
    +        String [] fieldNames = extractHeader(input);
    +
    +        // phase-2: now read the data
    +        stream = dfs.openPossiblyCompressedStream(split.getPath());
    +        input = new TextInput(settings,  stream, readBuffer, split.getStart(), split.getStart() + split.getLength());
    +        TextOutput output = new FieldVarCharOutput(outputMutator, fieldNames, getColumns(), isStarQuery());
    +        this.reader = new TextReader(settings, input, output, whitespaceBuffer);
    --- End diff --
    
    These last two lines can probably be pulled out of the if/else
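
    A rough sketch of what hoisting those lines could look like, using small stand-in types rather than the real Drill classes. Everything named here (ReaderSetupSketch, RepeatedOutput, FieldOutput, and this simplified TextReader) is hypothetical and only mirrors the shape of the diff; in the actual patch the else branch also reopens the stream and reassigns input before the shared construction.

```java
// Hypothetical stand-ins; none of these are the real Drill classes.
class ReaderSetupSketch {

    interface TextOutput {
        String describe();
    }

    // Stand-in for the "columns[]" style output used by the first branch.
    static final class RepeatedOutput implements TextOutput {
        public String describe() {
            return "repeated";
        }
    }

    // Stand-in for the output that uses names taken from the header row.
    static final class FieldOutput implements TextOutput {
        private final String[] fieldNames;

        FieldOutput(String[] fieldNames) {
            this.fieldNames = fieldNames;
        }

        public String describe() {
            return "fields:" + String.join(",", fieldNames);
        }
    }

    // Simplified reader: only records its output and whether start() ran.
    static final class TextReader {
        final TextOutput output;
        boolean started;

        TextReader(TextOutput output) {
            this.output = output;
        }

        void start() {
            started = true;
        }
    }

    static TextReader setup(boolean useRepeatedVarChar, String[] headerFields) {
        // Each branch only decides which output to build.
        final TextOutput output;
        if (useRepeatedVarChar) {
            output = new RepeatedOutput();
        } else {
            output = new FieldOutput(headerFields);
        }

        // The two shared lines live once, after the if/else.
        TextReader reader = new TextReader(output);
        reader.start();
        return reader;
    }

    public static void main(String[] args) {
        TextReader r = setup(false, new String[] {"id", "name"});
        System.out.println(r.output.describe() + " started=" + r.started);
    }
}
```

    Declaring output as a final local that both branches assign lets the compiler check that neither branch forgets to set it.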


> CSV header row should be parsed
> -------------------------------
>
>                 Key: DRILL-951
>                 URL: https://issues.apache.org/jira/browse/DRILL-951
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Text & CSV
>            Reporter: Tomer Shiran
>            Assignee: Abhijit Pol
>             Fix For: Future
>
>
> The CSV reader currently treats the header row as a regular data row. There
> should be a way (possibly optional) to treat the header row as the column names.
> I exported this dataset to a CSV: 
> https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry
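
For illustration only, the requested behaviour can be sketched outside Drill in plain Java. The class HeaderRowSketch, its hasHeader flag, and the fallback names col0, col1, ... are all assumptions made for this example, and the sample rows in main are made up rather than taken from the linked dataset. The split on commas is deliberately naive and does not handle quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not how Drill implements the feature.
class HeaderRowSketch {

    // When hasHeader is true, the first line supplies the column names and
    // is not returned as data. Otherwise every line is data and columns get
    // positional names (col0, col1, ...).
    static List<Map<String, String>> parse(List<String> lines, boolean hasHeader) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }

        String[] names = null;
        int firstDataLine = 0;
        if (hasHeader) {
            names = lines.get(0).split(",", -1);
            firstDataLine = 1;
        }

        for (int i = firstDataLine; i < lines.size(); i++) {
            // Naive split: no support for quoted or escaped commas.
            String[] values = lines.get(i).split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < values.length; c++) {
                String name = (names != null && c < names.length) ? names[c] : "col" + c;
                row.put(name, values[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not from the linked dataset.
        List<String> lines = Arrays.asList("name,city", "ann,oslo");
        System.out.println(parse(lines, true));
        System.out.println(parse(lines, false));
    }
}
```

Making the header handling a flag keeps the existing behaviour available, which matches the "optional?" note in the description.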



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
