[
https://issues.apache.org/jira/browse/SPARK-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-29009.
----------------------------------
Resolution: Invalid
Resolving due to no feedback from its author.
> Returning pojo from udf not working
> -----------------------------------
>
> Key: SPARK-29009
> URL: https://issues.apache.org/jira/browse/SPARK-29009
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: Tomasz Belina
> Priority: Major
>
> It looks like Spark is unable to construct a Row from a POJO returned from a UDF.
> Given the POJO:
> {code:java}
> public class SegmentStub {
>     private int id;
>     private Date statusDateTime;
>     private int healthPointRatio;
> }
> {code}
> Registration of the UDF:
> {code:java}
> public class ParseResultsUdf {
>     public String registerUdf(SparkSession sparkSession) {
>         Encoder<SegmentStub> encoder = Encoders.bean(SegmentStub.class);
>         final StructType schema = encoder.schema();
>         sparkSession.udf().register(UDF_NAME,
>             (UDF2<String, String, SegmentStub>) (s, s2) ->
>                 new SegmentStub(1, Date.valueOf(LocalDate.now()), 2),
>             schema
>         );
>         return UDF_NAME;
>     }
> }
> {code}
> Test code:
> {code:java}
> List<String[]> strings = Arrays.asList(new String[]{"one", "two"}, new String[]{"3", "4"});
> JavaRDD<Row> rowJavaRDD = sparkContext.parallelize(strings).map(RowFactory::create);
> StructType schema = DataTypes.createStructType(new StructField[] {
>     DataTypes.createStructField("foe1", DataTypes.StringType, false),
>     DataTypes.createStructField("foe2", DataTypes.StringType, false) });
> Dataset<Row> dataFrame = sparkSession.sqlContext().createDataFrame(rowJavaRDD, schema);
> Seq<Column> columnSeq = new Set.Set2<>(col("foe1"), col("foe2")).toSeq();
> dataFrame.select(callUDF(udfName, columnSeq)).show();
> {code}
> This throws the following exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: The value (SegmentStub(id=1, statusDateTime=2019-09-06, healthPointRatio=2)) of the type (udf.SegmentStub) cannot be converted to struct<healthPointRatio:int,id:int,statusDateTime:date>
>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:262)
>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
>     at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:396)
>     ... 21 more
> }
> {code}
>
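
For readers who hit the same exception: the error message shows that the declared return type is struct<healthPointRatio:int,id:int,statusDateTime:date>, with the bean fields in alphabetical order. A likely workaround, assuming the struct converter expects a Row rather than an arbitrary Java bean, is to have the UDF return a Row built from the bean's values in that order. The sketch below is plain Java and does not call Spark: the SegmentStub constructor and getters and the toRowValues helper are hypothetical stand-ins, not code from the report. In a real UDF the resulting array would be wrapped with RowFactory.create(...).

```java
import java.sql.Date;
import java.util.Arrays;

public class SegmentStubRowValues {

    // Stand-in for the reporter's POJO. The constructor and getters are
    // assumptions; the report only shows the three private fields.
    static final class SegmentStub {
        private final int id;
        private final Date statusDateTime;
        private final int healthPointRatio;

        SegmentStub(int id, Date statusDateTime, int healthPointRatio) {
            this.id = id;
            this.statusDateTime = statusDateTime;
            this.healthPointRatio = healthPointRatio;
        }

        int getId() { return id; }
        Date getStatusDateTime() { return statusDateTime; }
        int getHealthPointRatio() { return healthPointRatio; }
    }

    // Hypothetical helper: lists the bean's values in the field order shown
    // in the error message (healthPointRatio, id, statusDateTime).
    static Object[] toRowValues(SegmentStub stub) {
        return new Object[] {
            stub.getHealthPointRatio(),
            stub.getId(),
            stub.getStatusDateTime()
        };
    }

    public static void main(String[] args) {
        SegmentStub stub = new SegmentStub(1, Date.valueOf("2019-09-06"), 2);
        // Inside the UDF body, this array would be passed to
        // RowFactory.create(...) and the UDF would be typed to return Row
        // (Spark call, intentionally not executed in this sketch).
        System.out.println(Arrays.toString(toRowValues(stub)));
    }
}
```

With this approach the UDF would be declared as UDF2<String, String, Row> while still being registered with the schema from Encoders.bean(SegmentStub.class), so the values and the declared struct stay aligned.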