[ https://issues.apache.org/jira/browse/DRILL-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986695#comment-14986695 ]
ASF GitHub Bot commented on DRILL-951:
--------------------------------------
Github user jacques-n commented on a diff in the pull request:
https://github.com/apache/drill/pull/232#discussion_r43716276
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/CompliantTextRecordReader.java ---
@@ -101,16 +108,23 @@ public void setup(OperatorContext context, OutputMutator outputMutator) throws E
InputStream stream = dfs.openPossiblyCompressedStream(split.getPath());
TextInput input = new TextInput(settings, stream, readBuffer, split.getStart(), split.getStart() + split.getLength());
- TextOutput output = null;
if(settings.isUseRepeatedVarChar()){
- output = new RepeatedVarCharOutput(outputMutator, getColumns(), isStarQuery());
- }else{
- //TODO: Add field output.
- throw new UnsupportedOperationException();
+ TextOutput output = new RepeatedVarCharOutput(outputMutator, getColumns(), isStarQuery());
+ this.reader = new TextReader(settings, input, output, whitespaceBuffer);
+ reader.start();
+ } else {
+ // two-phase read approach.
+ // phase-1: read the header from the file
+ String [] fieldNames = extractHeader(input);
+
+ // phase-2: now read the data
+ stream = dfs.openPossiblyCompressedStream(split.getPath());
+ input = new TextInput(settings, stream, readBuffer, split.getStart(), split.getStart() + split.getLength());
+ TextOutput output = new FieldVarCharOutput(outputMutator, fieldNames, getColumns(), isStarQuery());
+ this.reader = new TextReader(settings, input, output, whitespaceBuffer);
--- End diff --
These last two lines can probably be pulled out of the if/else, since both branches construct the TextReader the same way.
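A minimal sketch of what that hoisting might look like, written as a standalone Java class so it compiles on its own. The types `TextOutput`, `RepeatedOutput`, `FieldOutput`, and `Reader` below are hypothetical stand-ins, not Drill's actual classes, and the stream and input handling from the diff is left out on purpose.

```java
// Hypothetical stand-ins for the classes in the diff; not Drill's real API.
final class HoistSketch {

    public interface TextOutput {
        String describe();
    }

    // Stand-in for the repeated-VarChar output used when no header is read.
    public static final class RepeatedOutput implements TextOutput {
        public String describe() {
            return "repeated";
        }
    }

    // Stand-in for the per-field output built from the extracted header names.
    public static final class FieldOutput implements TextOutput {
        private final String[] fieldNames;

        public FieldOutput(String[] fieldNames) {
            this.fieldNames = fieldNames;
        }

        public String describe() {
            return "fields:" + String.join(",", fieldNames);
        }
    }

    // Stand-in for the reader that both branches construct and start.
    public static final class Reader {
        public final TextOutput output;
        public boolean started;

        public Reader(TextOutput output) {
            this.output = output;
        }

        public void start() {
            started = true;
        }
    }

    public static Reader setup(boolean useRepeatedVarChar, String[] headerFields) {
        // Only the choice of output differs between the branches.
        final TextOutput output;
        if (useRepeatedVarChar) {
            output = new RepeatedOutput();
        } else {
            output = new FieldOutput(headerFields);
        }

        // The shared lines now appear once, after the if/else.
        Reader reader = new Reader(output);
        reader.start();
        return reader;
    }
}
```

In the real method the else branch also reopens the stream and reassigns `input`, so the hoisted constructor call would need to run after that reassignment to pick up the fresh input.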
> CSV header row should be parsed
> -------------------------------
>
> Key: DRILL-951
> URL: https://issues.apache.org/jira/browse/DRILL-951
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Text & CSV
> Reporter: Tomer Shiran
> Assignee: Abhijit Pol
> Fix For: Future
>
>
> The CSV reader currently treats the header row like a regular data row. There should
> be an (optional?) way to treat the header row as the column names.
> I exported this dataset to a CSV:
> https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry
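
As a rough illustration of the behavior requested above (not Drill's implementation), the sketch below reads the first line of a CSV as the column names and keys every later row by those names. The class and method names are hypothetical, and the naive split on commas is an assumption that ignores quoted fields and escaped delimiters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class HeaderRowSketch {

    public static List<Map<String, String>> readWithHeader(String csv) {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(csv))) {
            // Phase 1: the first line supplies the column names.
            String headerLine = in.readLine();
            if (headerLine == null) {
                return rows; // empty input: no header, no rows
            }
            // Assumption: plain comma-separated values without quoting.
            String[] names = headerLine.split(",", -1);

            // Phase 2: every remaining line is a data row keyed by column name.
            String line;
            while ((line = in.readLine()) != null) {
                String[] values = line.split(",", -1);
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < names.length; i++) {
                    // Short rows get null for the missing trailing columns.
                    row.put(names[i], i < values.length ? values[i] : null);
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

With this shape the header line never shows up as a data row, and a column can be looked up by name, for example `rows.get(0).get("name")`.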