[GitHub] incubator-flink pull request: enable CSV Reader to ignore invalid ...

FelixNeutatz Sat, 22 Nov 2014 15:24:41 -0800

Github user FelixNeutatz commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/201#discussion_r20760662
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
    @@ -130,6 +216,21 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) {
                        numBytes--;
                }
                
    +           if (commentPrefix != null && commentPrefix.length <= numBytes) {
    +                   //check record for comments
    +                   Boolean isComment = true;
    +                   for (int i = 0; i < commentPrefix.length; i++) {
    +                           if (commentPrefix[i] != bytes[offset + i]) {
    +                                   isComment = false;
    +                                   break;
    +                           }
    +                   }
    +                   if (isComment) {
    +                           this.commentCount++;
    +                           return nextRecord(reuse);
    --- End diff --
    
    Fabian told me not to return null. Quoting him: "That's what I meant by
    letting the DelimitedInputFormat handle invalid lines. I would not give the
    null value back to the DataSourceTask, but instead let the
    DelimitedInputFormat catch this and call readRecord() until a valid record
    is returned, and hand that to the DataSourceTask.
    I am actually surprised that giving a null value to the data source does
    not cause an NPE."
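
    The approach Fabian describes could be sketched roughly as below. This is
    only an illustration, not the code in the pull request or Flink's actual
    DelimitedInputFormat: the class CommentSkippingReader, its constructor, and
    the String return type are hypothetical stand-ins, and the input is assumed
    to be already split into lines. readRecord() signals a skipped line with
    null, and nextRecord() keeps calling it until a valid record shows up, so
    the caller never sees null except at end of input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the reading loop; not Flink's DelimitedInputFormat. */
final class CommentSkippingReader {
    private final Iterator<byte[]> lines;  // raw lines, assumed already split on the delimiter
    private final byte[] commentPrefix;    // null disables comment skipping
    private int commentCount = 0;

    CommentSkippingReader(List<String> rawLines, String prefix) {
        this.lines = rawLines.stream()
                .map(s -> s.getBytes(StandardCharsets.UTF_8))
                .iterator();
        this.commentPrefix = prefix == null ? null : prefix.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns null for a comment line; this null stays inside the reader. */
    String readRecord(byte[] bytes, int offset, int numBytes) {
        if (commentPrefix != null && commentPrefix.length <= numBytes) {
            boolean isComment = true;
            for (int i = 0; i < commentPrefix.length; i++) {
                if (commentPrefix[i] != bytes[offset + i]) {
                    isComment = false;
                    break;
                }
            }
            if (isComment) {
                commentCount++;
                return null;
            }
        }
        return new String(bytes, offset, numBytes, StandardCharsets.UTF_8);
    }

    /** Calls readRecord() until it yields a valid record; null only at end of input. */
    String nextRecord() {
        while (lines.hasNext()) {
            byte[] line = lines.next();
            String record = readRecord(line, 0, line.length);
            if (record != null) {
                return record;
            }
        }
        return null;
    }

    int getCommentCount() {
        return commentCount;
    }

    public static void main(String[] args) {
        CommentSkippingReader reader =
                new CommentSkippingReader(List.of("#header", "1,2", "#note", "3,4"), "#");
        String record;
        while ((record = reader.nextRecord()) != null) {
            System.out.println(record);
        }
        System.out.println("skipped comments: " + reader.getCommentCount());
    }
}
```

    A loop is used here instead of the recursive `return nextRecord(reuse)`
    from the diff, since a long run of consecutive comment lines would
    otherwise add one stack frame per skipped line.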



