Github user FelixNeutatz commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/201#discussion_r20760662
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -130,6 +216,21 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) {
         numBytes--;
     }
+    if (commentPrefix != null && commentPrefix.length <= numBytes) {
+        //check record for comments
+        Boolean isComment = true;
+        for (int i = 0; i < commentPrefix.length; i++) {
+            if (commentPrefix[i] != bytes[offset + i]) {
+                isComment = false;
+                break;
+            }
+        }
+        if (isComment) {
+            this.commentCount++;
+            return nextRecord(reuse);
--- End diff --
Fabian told me not to return null: "That's what I meant by letting the
DelimitedInputFormat handle invalid lines. I would not give the null value
back to the DataSourceTask, but instead let the DelimitedInputFormat catch this
and try to call readRecord() until a valid record is returned and hand that to
the DataSourceTask.
I am actually surprised that giving a null value to the data source does
not cause a NPE." (quoted from Fabian)
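
To make the suggestion concrete, here is a minimal, self-contained sketch of the loop described in the quote. It is not Flink's actual DelimitedInputFormat: the class CommentSkippingSketch, its constructor, and getCommentCount() are hypothetical stand-ins, records are plain Strings, and in this sketch a null from nextRecord() means only "end of input". The idea is that readRecord() returns null for a comment line instead of calling nextRecord() recursively, and the outer nextRecord() keeps reading until it gets a valid record. This would presumably also avoid one stack frame per consecutive comment line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the reader loop; not Flink's DelimitedInputFormat. */
class CommentSkippingSketch {

    private final Iterator<byte[]> lines;
    private final byte[] commentPrefix; // null when no comment prefix is configured
    private long commentCount = 0;

    CommentSkippingSketch(List<String> rawLines, String prefix) {
        List<byte[]> encoded = new ArrayList<>();
        for (String line : rawLines) {
            encoded.add(line.getBytes(StandardCharsets.UTF_8));
        }
        this.lines = encoded.iterator();
        this.commentPrefix =
                prefix == null ? null : prefix.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses one line; returns null for a comment line instead of recursing. */
    String readRecord(byte[] bytes, int offset, int numBytes) {
        if (commentPrefix != null && commentPrefix.length <= numBytes) {
            // Same prefix comparison as in the diff, with a primitive boolean.
            boolean isComment = true;
            for (int i = 0; i < commentPrefix.length; i++) {
                if (commentPrefix[i] != bytes[offset + i]) {
                    isComment = false;
                    break;
                }
            }
            if (isComment) {
                commentCount++;
                return null; // signal "no valid record on this line"
            }
        }
        return new String(bytes, offset, numBytes, StandardCharsets.UTF_8);
    }

    /** Calls readRecord() until a valid record appears; null means end of input. */
    String nextRecord() {
        while (lines.hasNext()) {
            byte[] line = lines.next();
            String record = readRecord(line, 0, line.length);
            if (record != null) {
                return record;
            }
        }
        return null;
    }

    long getCommentCount() {
        return commentCount;
    }
}
```

With the input lines "#header", "1,a", "#note", "2,b" and the prefix "#", two calls to nextRecord() yield "1,a" and "2,b", and the caller never sees a null for a skipped line.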