[ https://issues.apache.org/jira/browse/FLINK-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885778#comment-15885778 ]
ASF GitHub Bot commented on FLINK-5907:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3417#discussion_r103203173

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
    @@ -358,24 +358,27 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     		for (int field = 0, output = 0; field < fieldIncluded.length; field++) {
     			// check valid start position
    -			if (startPos >= limit) {
    +			if (startPos > limit || (startPos == limit && field != fieldIncluded.length - 1)) {
     				if (lenient) {
     					return false;
     				} else {
     					throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
     				}
     			}
    -
    +
     			if (fieldIncluded[field]) {
     				// parse field
     				@SuppressWarnings("unchecked")
     				FieldParser<Object> parser = (FieldParser<Object>) this.fieldParsers[output];
     				Object reuse = holders[output];
     				startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.fieldDelim, reuse);
     				holders[output] = parser.getLastResult();
    -
    +
     				// check parse result
    -				if (startPos < 0) {
    +				if (startPos < 0 ||
    +						(startPos == limit
    --- End diff --

    Move this condition into an `else if` branch and give a more detailed error message (row too short).
    Also add a comment that we read the whole record but that there are fields missing.

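To make the boundary condition in the first hunk concrete, here is a minimal, standalone sketch of the start-position check. It is not Flink code: the class `RowBoundarySketch`, the method `parse`, the use of `String` instead of a byte array, and returning `null` in lenient mode (where `parseRecord` returns `false`) are all assumptions made for illustration. The only part taken from the diff is the condition itself: `startPos == limit` is accepted for the last field, because the record then ends right after a delimiter and the last field is empty.

```java
import java.util.Arrays;

public class RowBoundarySketch {

    /**
     * Hypothetical, simplified stand-in for the field loop in parseRecord.
     * Splits one record into exactly numFields string fields.
     * Returns null for a short row when lenient, otherwise throws.
     */
    static String[] parse(String record, char delim, int numFields, boolean lenient) {
        String[] out = new String[numFields];
        int limit = record.length();
        int startPos = 0;

        for (int field = 0; field < numFields; field++) {
            boolean lastField = field == numFields - 1;

            // Same shape as the patched check: reaching the end of the record
            // is only an error if more than an empty last field is still missing.
            if (startPos > limit || (startPos == limit && !lastField)) {
                if (lenient) {
                    return null;
                }
                throw new IllegalArgumentException("Row too short: " + record);
            }

            int end = record.indexOf(delim, startPos);
            if (end < 0) {
                end = limit; // no further delimiter: the field runs to the end
            }
            out[field] = record.substring(startPos, end);
            startPos = end + 1; // continue after the delimiter
        }
        return out;
    }

    public static void main(String[] args) {
        // Trailing delimiter: the third field is empty but present.
        System.out.println(Arrays.toString(parse("a\tb\t", '\t', 3, false)));
        // Only two fields for three expected columns: short row, lenient mode.
        System.out.println(Arrays.toString(parse("a\tb", '\t', 3, true)));
    }
}
```

With the old `startPos >= limit` check, the first call in `main` would be treated as a short row, which is the behaviour this sketch assumes the patch is meant to fix.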
> RowCsvInputFormat bug on parsing tsv
> ------------------------------------
>
>                 Key: FLINK-5907
>                 URL: https://issues.apache.org/jira/browse/FLINK-5907
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.2.0
>            Reporter: Flavio Pompermaier
>            Assignee: Kurt Young
>              Labels: csv, parsing
>         Attachments: test.tsv
>
>
> The following snippet reproduces the problem (using the attached file as
> input):
> {code:language=java}
> char fieldDelim = '\t';
> TypeInformation<?>[] fieldTypes = new TypeInformation<?>[51];
> for (int i = 0; i < fieldTypes.length; i++) {
>     fieldTypes[i] = BasicTypeInfo.STRING_TYPE_INFO;
> }
> int[] fieldMask = new int[fieldTypes.length];
> for (int i = 0; i < fieldMask.length; i++) {
>     fieldMask[i] = i;
> }
> RowCsvInputFormat csvIF = new RowCsvInputFormat(new Path(testCsv),
>     fieldTypes, "\n", fieldDelim + "",
>     fieldMask, true);
> csvIF.setNestedFileEnumeration(true);
> DataSet<Row> csv = env.createInput(csvIF);
> csv.print();
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)