[
https://issues.apache.org/jira/browse/FLINK-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885778#comment-15885778
]
ASF GitHub Bot commented on FLINK-5907:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/3417#discussion_r103203173
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java ---
@@ -358,24 +358,27 @@ protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int nu
     for (int field = 0, output = 0; field < fieldIncluded.length; field++) {
         // check valid start position
-        if (startPos >= limit) {
+        if (startPos > limit || (startPos == limit && field != fieldIncluded.length - 1)) {
             if (lenient) {
                 return false;
             } else {
                 throw new ParseException("Row too short: " + new String(bytes, offset, numBytes));
             }
         }
-
+
         if (fieldIncluded[field]) {
             // parse field
             @SuppressWarnings("unchecked")
             FieldParser<Object> parser = (FieldParser<Object>) this.fieldParsers[output];
             Object reuse = holders[output];
             startPos = parser.resetErrorStateAndParse(bytes, startPos, limit, this.fieldDelim, reuse);
             holders[output] = parser.getLastResult();
-
+
             // check parse result
-            if (startPos < 0) {
+            if (startPos < 0 ||
+                    (startPos == limit
--- End diff --
Move this condition into an `else if` branch and give a more detailed error
message (row too short).
Also add a comment that we read the whole record but that fields are
missing.
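
A minimal, self-contained sketch of what that structure could look like. It is not the Flink code: `RowTooShortSketch`, `parseField` and `rowTooShort` are hypothetical stand-ins, `IllegalArgumentException` replaces Flink's `ParseException`, and the trailing-delimiter test is an assumption about how an empty last field might be told apart from a truncated row.

```java
import java.nio.charset.StandardCharsets;

public class RowTooShortSketch {

    // Hypothetical stand-in for a field parser: reads up to the next delimiter and
    // returns the position after it, or `limit` if the record ended first.
    static int parseField(byte[] bytes, int startPos, int limit, byte delim,
                          String[] out, int idx) {
        int end = startPos;
        while (end < limit && bytes[end] != delim) {
            end++;
        }
        out[idx] = new String(bytes, startPos, end - startPos, StandardCharsets.UTF_8);
        return end < limit ? end + 1 : limit;
    }

    // Builds the more detailed "row too short" error.
    static IllegalArgumentException rowTooShort(byte[] bytes, int offset, int numBytes,
                                                int expected, int found) {
        String row = new String(bytes, offset, numBytes, StandardCharsets.UTF_8);
        return new IllegalArgumentException("Row too short: expected " + expected
                + " fields but found " + found + " in '" + row + "'");
    }

    static boolean parseRecord(String[] holders, byte[] bytes, int offset, int numBytes,
                               byte delim, boolean lenient) {
        final int limit = offset + numBytes;
        final int last = holders.length - 1;
        int startPos = offset;

        for (int field = 0; field < holders.length; field++) {
            // check valid start position; only the last field may start at the limit
            if (startPos > limit || (startPos == limit && field != last)) {
                if (lenient) {
                    return false;
                }
                throw rowTooShort(bytes, offset, numBytes, holders.length, field);
            }

            startPos = parseField(bytes, startPos, limit, delim, holders, field);

            // check parse result
            if (startPos < 0) {
                // The toy parser above never reports an error; the branch is kept
                // only to show where the `else if` attaches.
                if (lenient) {
                    return false;
                }
                throw new IllegalArgumentException("Field " + field + " could not be parsed");
            } else if (startPos == limit && field != last && bytes[limit - 1] != delim) {
                // We read the whole record, but there are fields missing
                // (assumption: a trailing delimiter means an empty last field).
                if (lenient) {
                    return false;
                }
                throw rowTooShort(bytes, offset, numBytes, holders.length, field + 1);
            }
        }
        return true;
    }
}
```

Under these assumptions, a three-field record `a<TAB>b<TAB>` parses with an empty last field, while `a<TAB>b` is rejected as too short: skipped in lenient mode, an exception otherwise.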
> RowCsvInputFormat bug on parsing tsv
> ------------------------------------
>
> Key: FLINK-5907
> URL: https://issues.apache.org/jira/browse/FLINK-5907
> Project: Flink
> Issue Type: Bug
> Components: Java API
> Affects Versions: 1.2.0
> Reporter: Flavio Pompermaier
> Assignee: Kurt Young
> Labels: csv, parsing
> Attachments: test.tsv
>
>
> The following snippet reproduces the problem (using the attached file as
> input):
> {code:language=java}
> char fieldDelim = '\t';
> TypeInformation<?>[] fieldTypes = new TypeInformation<?>[51];
> for (int i = 0; i < fieldTypes.length; i++) {
>     fieldTypes[i] = BasicTypeInfo.STRING_TYPE_INFO;
> }
> int[] fieldMask = new int[fieldTypes.length];
> for (int i = 0; i < fieldMask.length; i++) {
>     fieldMask[i] = i;
> }
> RowCsvInputFormat csvIF = new RowCsvInputFormat(new Path(testCsv),
>         fieldTypes, "\n", fieldDelim + "", fieldMask, true);
> csvIF.setNestedFileEnumeration(true);
> DataSet<Row> csv = env.createInput(csvIF);
> csv.print();
> {code}