[ 
https://issues.apache.org/jira/browse/FLINK-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023998#comment-15023998
 ] 

GaoLun edited comment on FLINK-2988 at 11/25/15 1:45 AM:
---------------------------------------------------------

Hi [~jkovacs], has this problem been solved?
IMO, readCsvFile can only support the *tuple* and *POJO* classes, which don't 
support nullable fields. While reading the CSV file, env creates a 
TupleSerializer. A RowSerializer can create the appropriate serializer for each 
field type, and Row also supports null values, which tuples can't.
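
A possible workaround until CSV reading supports Row directly (a minimal sketch, 
assuming the 0.10 Table API's Row and RowTypeInfo, that Row has a 
setField(index, value) method, and that a locally defined implicit 
TypeInformation is picked up by map): read the file as plain text and build the 
rows by hand, so no ClassTag or explicit-implicit-parameter syntax is needed at 
the call site.
{code}
// Workaround sketch (class names and signatures below are assumptions about
// the Flink 0.10 Table API): read the CSV as text lines and build Rows manually.
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.scala._
import org.apache.flink.api.table.Row
import org.apache.flink.api.table.typeinfo.RowTypeInfo

val env = ExecutionEnvironment.createLocalEnvironment()

// Providing the RowTypeInfo implicitly lets map() pick it up,
// so no explicit ClassTag or implicit-parameter list is required.
implicit val rowInfo: TypeInformation[Row] =
  new RowTypeInfo(
    Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
    Seq("word", "number"))

val source: DataSet[Row] = env.readTextFile("../someCsv.csv").map { line =>
  val fields = line.split(",")   // naive split; no quoting/escaping handled
  val row = new Row(2)           // arity 2: (word, number)
  row.setField(0, fields(0))
  row.setField(1, fields(1).toInt)
  row
}

println(source.collect())
{code}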



> Cannot load DataSet[Row] from CSV file
> --------------------------------------
>
>                 Key: FLINK-2988
>                 URL: https://issues.apache.org/jira/browse/FLINK-2988
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataSet API, Table API
>    Affects Versions: 0.10.0
>            Reporter: Johann Kovacs
>            Priority: Minor
>
> Tuple classes (both Java and Scala) only have arity up to 25, meaning I cannot 
> load a CSV file with more than 25 columns directly as a 
> DataSet\[TupleX\[...\]\].
> An alternative to Tuples is the Table API's Row class, which allows for 
> arbitrary-length, arbitrary-type, runtime-supplied schemata (via RowTypeInfo) 
> and index-based access.
> However, trying to load a CSV file as a DataSet\[Row\] yields an exception:
> {code}
> val env = ExecutionEnvironment.createLocalEnvironment()
> val filePath = "../someCsv.csv"
> val typeInfo = new RowTypeInfo(
>   Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
>   Seq("word", "number"))
> val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
> println(source.collect())
> {code}
> with someCsv.csv containing:
> {code}
> one,1
> two,2
> {code}
> yields
> {code}
> Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
>     at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
>     at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
> {code}
> As a user I would like to be able to load a CSV file into a DataSet\[Row\], 
> preferably with a convenience method for specifying the schema (RowTypeInfo), 
> without having to use the "explicit implicit parameters" syntax or specify 
> the ClassTag manually.
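
For illustration, a hypothetical convenience helper along the lines the 
reporter describes could look roughly like the sketch below. Note that 
readCsvAsRows and the caller-supplied parse function are made up for this 
sketch and are not part of Flink 0.10; it simply wraps the text-file 
workaround from the comment above rather than a real Row-aware CSV input 
format.
{code}
// Hypothetical convenience helper (NOT an existing Flink API) sketching the
// shape of the requested call site: the schema is given once as a RowTypeInfo,
// and the ClassTag / explicit implicit-parameter plumbing stays inside.
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._
import org.apache.flink.api.table.Row
import org.apache.flink.api.table.typeinfo.RowTypeInfo

def readCsvAsRows(env: ExecutionEnvironment, path: String, rowType: RowTypeInfo)
                 (parse: String => Row): DataSet[Row] = {
  // The caller's RowTypeInfo becomes the implicit TypeInformation that map()
  // needs; a real implementation would use a Row-aware CSV input format
  // instead of readTextFile plus a caller-supplied line parser.
  implicit val ti: TypeInformation[Row] = rowType
  env.readTextFile(path).map(parse)
}
{code}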



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
