[ 
https://issues.apache.org/jira/browse/FLINK-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212761#comment-14212761
 ] 

ASF GitHub Bot commented on FLINK-1208:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/201#issuecomment-63122212
  
    Thanks for your PR!
    
    I think there are a few issues with your approach. For example, in a CSV 
file whose first field is a String, a line that starts with a comment marker 
such as '#' or '//' will not be skipped. Also, your changes to the DataSourceTask 
affect all InputFormats, which is definitely not desired.
    
    IMO, it is necessary to explicitly specify a comment string and check for 
it at the beginning of each line.
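
    A minimal sketch of that check, assuming the comment string is configured 
by the user and that a null or empty value disables skipping. The class 
CommentLineFilter and its methods are hypothetical names used for illustration 
only; they are not part of Flink's CsvInputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an existing Flink class.
final class CommentLineFilter {

    // User-supplied comment string, e.g. "#" or "//". Null or empty disables skipping.
    private final String commentPrefix;

    CommentLineFilter(String commentPrefix) {
        this.commentPrefix = commentPrefix;
    }

    /** Returns true if the line begins with the configured comment string. */
    boolean isComment(String line) {
        return commentPrefix != null
                && !commentPrefix.isEmpty()
                && line.startsWith(commentPrefix);
    }

    /** Returns the non-comment lines in their original order. */
    List<String> dropComments(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isComment(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

    Because the check only looks at the start of the line, a '#' inside a 
later field (as in "a,#b") does not cause the line to be dropped, and the 
decision does not depend on the type of the first field.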
    
    Skipping invalid lines is also a good feature in my opinion. It would be 
good to inform the user about invalid lines, for example by counting the 
invalid lines in each split and emitting a log statement.
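
    The counting idea could look roughly like the sketch below. It assumes 
that "invalid" means a line whose field count differs from the expected one, 
and it uses java.util.logging as a stand-in for whatever logger the input 
format uses. InvalidLineCounter and countInvalid are hypothetical names.

```java
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an existing Flink class.
final class InvalidLineCounter {

    private static final Logger LOG =
            Logger.getLogger(InvalidLineCounter.class.getName());

    /**
     * Counts the lines of one split whose number of fields differs from
     * expectedFields and logs a single summary message if any were found.
     */
    static long countInvalid(List<String> splitLines, String delimiter, int expectedFields) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String regex = Pattern.quote(delimiter);
        long invalid = 0;
        for (String line : splitLines) {
            // A limit of -1 keeps trailing empty fields, so "a,b," has three fields.
            if (line.split(regex, -1).length != expectedFields) {
                invalid++;
            }
        }
        if (invalid > 0) {
            LOG.warning("Skipped " + invalid + " invalid line(s) in this split.");
        }
        return invalid;
    }
}
```

    Logging one summary per split, rather than one message per line, keeps 
the log readable when a file contains many malformed lines.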


> Skip comment lines in CSV input format. Allow user to specify comment 
> character.
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-1208
>                 URL: https://issues.apache.org/jira/browse/FLINK-1208
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.8-incubating
>            Reporter: Aljoscha Krettek
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: starter
>
> The current skipFirstLine is limited. Skipping arbitrary lines that start 
> with a certain character would be much more flexible while still easy to 
> implement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
