[ https://issues.apache.org/jira/browse/FLINK-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212761#comment-14212761 ]
ASF GitHub Bot commented on FLINK-1208:
---------------------------------------

Github user fhueske commented on the pull request:

    https://github.com/apache/incubator-flink/pull/201#issuecomment-63122212

    Thanks for your PR!
    I think there are a few issues with your approach. For example, in a CSV file whose first field is a String, a line will not be skipped even if it starts with a comment prefix such as '#' or '//'. Also, your changes to the DataSourceTask affect all InputFormats, which is definitely not desired.

    IMO, it is necessary to explicitly specify a comment string and check for it at the beginning of each line.

    Skipping invalid lines is also a good feature in my opinion. It would be good to inform the user about invalid lines, perhaps by counting the number of invalid lines for each split and emitting a log statement.


> Skip comment lines in CSV input format. Allow user to specify comment character.
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-1208
>                 URL: https://issues.apache.org/jira/browse/FLINK-1208
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>    Affects Versions: 0.8-incubating
>            Reporter: Aljoscha Krettek
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: starter
>
> The current skipFirstLine is limited. Skipping arbitrary lines that start with a certain character would be much more flexible while still easy to implement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
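
The approach suggested in the review comment (an explicitly configured comment string, checked at the beginning of each line, plus a per-split count of invalid lines that is reported in a log statement) could be sketched roughly as follows. This is only an illustration and not the code from the pull request or Flink's CsvInputFormat: the class `CommentAwareCsvSketch`, the method `readSplit`, and the `SplitResult` holder are hypothetical names, a plain list of strings stands in for an input split, a line counts as "invalid" here simply when its field count is wrong, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from Flink.
final class CommentAwareCsvSketch {

    private static final Logger LOG =
            Logger.getLogger(CommentAwareCsvSketch.class.getName());

    /** Parsed records of one split plus counters for skipped lines. */
    static final class SplitResult {
        final List<String[]> records = new ArrayList<>();
        int commentLines;
        int invalidLines;
    }

    /**
     * Reads the lines of one split. A line is skipped as a comment only if it
     * starts with the explicitly configured prefix, independent of field types.
     * A line with the wrong number of fields is counted as invalid and skipped.
     */
    static SplitResult readSplit(List<String> lines, String commentPrefix,
                                 String delimiter, int expectedFields) {
        SplitResult result = new SplitResult();
        // An empty or null prefix disables comment skipping; otherwise
        // startsWith("") would match every line.
        boolean skipComments = commentPrefix != null && !commentPrefix.isEmpty();

        for (String line : lines) {
            if (skipComments && line.startsWith(commentPrefix)) {
                result.commentLines++;
                continue;
            }
            // Quote the delimiter so it is not interpreted as a regex;
            // limit -1 keeps trailing empty fields.
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            if (fields.length != expectedFields) {
                result.invalidLines++;
                continue;
            }
            result.records.add(fields);
        }

        // One log statement per split instead of one per invalid line.
        if (result.invalidLines > 0) {
            LOG.warning("Skipped " + result.invalidLines
                    + " invalid line(s) in this split.");
        }
        return result;
    }
}
```

Because the check is a plain prefix comparison on the raw line, it behaves the same whether the first field is a String or a numeric type, and it lives entirely in the (hypothetical) CSV reader rather than in code shared by all InputFormats.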