Hi everyone,
I'm currently trying to create a generic transformation mechanism on a
DataFrame to modify an arbitrary column regardless of the underlying schema.
For nested structs like struct<struct<…>>, it's "relatively" straightforward to
apply an arbitrary UDF on the column and replace the data "inside" the struct.
However, I'm struggling to make it work for complex types that contain arrays
along the way, like struct<array<struct<…>>>.
Michael Armbrust seemed to allude on the mailing list/forum to a way of using
Encoders to do that. I'd be interested in any pointers, especially considering
that it does not seem possible to output a Row or GenericRowWithSchema from a
UDF (apparently because of
https://github.com/apache/spark/blob/v2.0.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L657).
To sum up, I'd like to find a way to apply a transformation to complex nested
datatypes (arrays and structs) in a DataFrame, updating the values themselves.
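
To make the goal concrete, here is a minimal sketch of the behaviour I'm after,
written in plain Python rather than against the Spark API. It models structs as
dicts and arrays as lists; transform_at and the sample row are hypothetical
names used for illustration only, and none of this is an existing Spark
function.

```python
from typing import Any, Callable, Sequence


def transform_at(value: Any, path: Sequence[str], fn: Callable[[Any], Any]) -> Any:
    """Return a copy of `value` with `fn` applied to the field at `path`.

    Assumption: structs are dicts and arrays are lists. Arrays are traversed
    implicitly, so the path names struct fields only, never array indices.
    """
    if value is None:
        return None  # leave nulls untouched
    if isinstance(value, list):
        # array<...>: apply the same remaining path to every element
        return [transform_at(item, path, fn) for item in value]
    if not path:
        return fn(value)  # reached the target value
    if isinstance(value, dict):
        head, rest = path[0], path[1:]
        if head not in value:
            raise KeyError(f"no field {head!r} in struct")
        updated = dict(value)  # shallow copy so the input is not mutated
        updated[head] = transform_at(value[head], rest, fn)
        return updated
    raise TypeError(
        f"cannot descend into {type(value).__name__} with path {list(path)}"
    )


# Shaped like struct<order: struct<items: array<struct<name, qty>>>>
row = {"order": {"items": [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}]}}
result = transform_at(row, ["order", "items", "qty"], lambda q: q * 10)
print(result)
```

The part I can't reproduce on a DataFrame is the list branch: descending through
an array while keeping the rest of each element intact.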
Regards,
Olivier Girardot
