[ https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Hueske closed FLINK-2988.
--------------------------------
Resolution: Duplicate
FLINK-3901 is about adding a CsvInputFormat for the Row data type. Since there
is a PR for FLINK-3901, I am closing this issue as a duplicate.
> Cannot load DataSet[Row] from CSV file
> --------------------------------------
>
> Key: FLINK-2988
> URL: https://issues.apache.org/jira/browse/FLINK-2988
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API, Table API & SQL
> Affects Versions: 0.10.0
> Reporter: Johann Kovacs
> Priority: Minor
>
> Tuple classes (both Java and Scala) only support arities up to 25, meaning I
> cannot load a CSV file with more than 25 columns directly as a
> DataSet[TupleX[...]].
> An alternative to using Tuples is using the Table API's Row class, which
> allows for arbitrary-length, arbitrary-type, runtime-supplied schemata (using
> RowTypeInfo) and index-based access.
> However, trying to load a CSV file as a DataSet[Row] yields an exception:
> {code}
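> // Imports below are assumed for Flink 0.10 (flink-scala and flink-table)
> import org.apache.flink.api.common.typeinfo.BasicTypeInfo
> import org.apache.flink.api.scala._
> import org.apache.flink.api.table.Row
> import org.apache.flink.api.table.typeinfo.RowTypeInfo
> import scala.reflect.ClassTag
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(Seq(BasicTypeInfo.STRING_TYPE_INFO,
>   BasicTypeInfo.INT_TYPE_INFO), Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())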
> {code}
> with someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> yields
> {code}
> Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
>     at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
>     at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user, I would like to be able to load a CSV file into a DataSet[Row],
> preferably with a convenience method for specifying the schema (RowTypeInfo),
> without having to use the "explicit implicit parameters" syntax or
> specify the ClassTag.
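
To illustrate what the request amounts to (a schema supplied at runtime, arbitrary arity, index-based access), here is a minimal plain-Scala sketch. It deliberately does not use Flink: `FieldType`, `StringField`, `IntField`, `Schema`, `SimpleRow` and `CsvRows` are hypothetical stand-ins invented for this example, not Flink's `Row`, `RowTypeInfo` or `CsvInputFormat`, and the parser ignores quoting and escaping.

```scala
// Hypothetical stand-ins for illustration only; none of these are Flink classes.
sealed trait FieldType { def parse(raw: String): Any }
case object StringField extends FieldType { def parse(raw: String): Any = raw }
case object IntField extends FieldType { def parse(raw: String): Any = raw.trim.toInt }

// A schema supplied at runtime: parallel sequences of field names and types.
final case class Schema(names: Seq[String], types: Seq[FieldType]) {
  require(names.length == types.length, "names and types must have the same length")
  def arity: Int = names.length
}

// A record of arbitrary arity with index-based access.
final class SimpleRow(values: Array[Any]) {
  def arity: Int = values.length
  def field(i: Int): Any = values(i)
  override def toString: String = values.mkString("(", ",", ")")
}

object CsvRows {
  // Parses one comma-separated line against the schema.
  // Quoting and escaping are not handled; this is a sketch, not a CSV parser.
  def parseLine(line: String, schema: Schema): SimpleRow = {
    val cells = line.split(",", -1)
    require(cells.length == schema.arity,
      s"expected ${schema.arity} columns, got ${cells.length}")
    // Convert each cell with the parser of the field type at the same index.
    new SimpleRow(cells.zip(schema.types).map { case (cell, tpe) => tpe.parse(cell) })
  }
}

object CsvRowsDemo {
  def main(args: Array[String]): Unit = {
    // Same two-column layout as someCsv.csv in the report.
    val schema = Schema(Seq("word", "number"), Seq(StringField, IntField))
    val rows = Seq("one,1", "two,2").map(CsvRows.parseLine(_, schema))
    rows.foreach(println)
  }
}
```

Because the schema is an ordinary value rather than a type parameter, the column count is not bound by any tuple arity limit, and the caller needs neither a ClassTag nor implicit parameters.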