I'm using the UDT API to work with a custom Money datatype in DataFrames.
Here's how I have it set up:
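from pyspark.sql.types import StringType, UserDefinedType

class StringUDT(UserDefinedType):
    # The value is stored as a plain string column
    @classmethod
    def sqlType(cls):
        return StringType()

    @classmethod
    def module(cls):
        return cls.__module__

    @classmethod
    def scalaUDT(cls):
        # No paired Scala UDT
        return ''

    def serialize(self, obj):
        return str(obj)

    def deserialize(self, datum):
        return Money(datum)

class MoneyUDT(StringUDT):
    pass

# Register MoneyUDT as the UDT for Money
Money.__UDT__ = MoneyUDT()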
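The Money class itself is not shown above. As a rough sketch only, assuming it does nothing more than wrap a decimal amount parsed from a string, a stand-in that satisfies the serialize/deserialize round trip could look like the following. The `amount` attribute and the module-level `serialize`/`deserialize` helpers are hypothetical and only mirror the two UDT methods so the round trip can be checked without Spark.

```python
from decimal import Decimal


class Money:
    """Hypothetical stand-in: wraps a decimal amount parsed from a string."""

    def __init__(self, amount):
        # Going through str() keeps inputs like "25.0" exact
        self.amount = Decimal(str(amount))

    def __str__(self):
        # This is the string that serialize() would store in the column
        return str(self.amount)

    def __eq__(self, other):
        return isinstance(other, Money) and self.amount == other.amount

    def __hash__(self):
        return hash(self.amount)

    def __repr__(self):
        return "Money(%r)" % str(self.amount)


def serialize(obj):
    # Mirrors StringUDT.serialize
    return str(obj)


def deserialize(datum):
    # Mirrors StringUDT.deserialize
    return Money(datum)


original = Money("25.0")
restored = deserialize(serialize(original))
```

With a stand-in like this, `restored == original` holds, so the string representation alone is enough to rebuild the value on the Python side.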
I then create a DataFrame like so:
df = sc.sql.createDataFrame([[Money("25.0")], [Money("100.0")]], spark_schema)
However, I've run into a few snags with this. DataFrames created using this
UDT cannot be sorted with orderBy on the UDT column, and I can't union two
DataFrames that have this UDT on one of their columns.
Is this expected behaviour, or is my UDT setup wrong?