[ https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031016#comment-15031016 ]
Timo Walther commented on FLINK-2988:
-------------------------------------

Yes, that is a good example of a {{TableSource}} for the {{TableEnvironment}}. But maybe it would also make sense to move {{Row}} to flink-core and provide an easy way of reading nullable, variable-length CSV files in the DataSet API as well. Tuples and POJOs are sometimes simply too static. I also had a use case with more than 25 columns, and defining a POJO for that many columns is quite cumbersome. (A rough workaround sketch for the current DataSet API follows at the end of this message.)

> Cannot load DataSet[Row] from CSV file
> --------------------------------------
>
>                 Key: FLINK-2988
>                 URL: https://issues.apache.org/jira/browse/FLINK-2988
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataSet API, Table API
>    Affects Versions: 0.10.0
>            Reporter: Johann Kovacs
>            Priority: Minor
>
> Tuple classes (both Java and Scala) only support arities up to 25, so I cannot load a CSV file with more than 25 columns directly as a DataSet\[TupleX\[...\]\].
> An alternative to Tuples is the Table API's Row class, which allows arbitrary-length, arbitrary-type, runtime-supplied schemata (via RowTypeInfo) and index-based access.
> However, trying to load a CSV file as a DataSet\[Row\] throws an exception:
> {code}
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(
>   Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
>   Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())
> {code}
> With someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> this yields:
> {code}
> Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
> 	at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
> 	at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user, I would like to be able to load a CSV file into a DataSet\[Row\], preferably with a convenience method for specifying the schema (RowTypeInfo), without having to use the "explicit implicit parameters" syntax and specify the ClassTag.
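A rough workaround sketch for the current DataSet API (untested, and assuming the 0.10-era {{org.apache.flink.api.table.Row}} with an arity constructor and a {{setField(Int, Any)}} method): read the file as plain text and convert each line into a Row manually. Because the Scala type analyzer would derive a generic type for Row rather than use the RowTypeInfo, the type information is passed with the same "explicit implicit parameters" style the issue complains about, which itself illustrates why a convenience method would help:

{code}
import scala.reflect.ClassTag

import org.apache.flink.api.common.typeinfo.BasicTypeInfo
import org.apache.flink.api.scala._
import org.apache.flink.api.table.Row
import org.apache.flink.api.table.typeinfo.RowTypeInfo

val env = ExecutionEnvironment.createLocalEnvironment()

// Runtime-supplied schema: two named, nullable columns.
val typeInfo = new RowTypeInfo(
  Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
  Seq("word", "number"))

val source: DataSet[Row] = env.readTextFile("../someCsv.csv").map { line =>
  val fields = line.split(",", -1) // limit -1 keeps trailing empty fields
  val row = new Row(2)
  row.setField(0, fields(0))
  // Empty cells become null, which Tuples cannot represent.
  row.setField(1, if (fields(1).isEmpty) null else Integer.valueOf(fields(1)))
  row
}(typeInfo, ClassTag(classOf[Row]))

println(source.collect())
{code}

This of course loses everything readCsvFile offers (quote handling, configurable delimiters, lenient parsing), which is exactly why a proper DataSet\[Row\] CSV source with a schema parameter would be useful.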