I'm using the UDT API to work with a custom Money data type in DataFrames. Here's how I have it set up:

    from pyspark.sql.types import UserDefinedType, StringType

    class StringUDT(UserDefinedType):

        @classmethod
        def sqlType(cls):
            # Underlying Spark SQL type used to store the value
            return StringType()

        @classmethod
        def module(cls):
            return cls.__module__

        @classmethod
        def scalaUDT(cls):
            # No Scala counterpart; this is a Python-only UDT
            return ''

        def serialize(self, obj):
            return str(obj)

        def deserialize(self, datum):
            return Money(datum)

    class MoneyUDT(StringUDT):
        pass

    Money.__UDT__ = MoneyUDT()

I then create a DataFrame like so:

    df = sc.sql.createDataFrame([[Money("25.0")], [Money("100.0")]], spark_schema)

However, I've run into a few snags with this. DataFrames created using this UDT cannot be ordered by the UDT column (`orderBy`), and I can't union two DataFrames that have this UDT on one of their columns.

Is this expected behaviour, or is my UDT setup wrong?
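
For anyone trying to reproduce this: the `Money` class itself is not shown above, so the sketch below uses a hypothetical stand-in (a thin wrapper around `Decimal`, which is an assumption, not the real class). It runs without Spark and only illustrates the string round trip that `serialize` and `deserialize` are meant to perform.

```python
from decimal import Decimal


class Money:
    """Hypothetical stand-in for the Money class; the real one is not shown."""

    def __init__(self, amount):
        # Accept anything Decimal can parse, e.g. "25.0".
        self.amount = Decimal(str(amount))

    def __str__(self):
        return str(self.amount)

    def __eq__(self, other):
        return isinstance(other, Money) and self.amount == other.amount

    def __hash__(self):
        return hash(self.amount)


def serialize(obj):
    # Mirrors StringUDT.serialize: the value is stored as a plain string.
    return str(obj)


def deserialize(datum):
    # Mirrors StringUDT.deserialize: rebuild Money from the stored string.
    return Money(datum)


# Round trip: Money -> stored string -> Money
stored = [serialize(Money("25.0")), serialize(Money("100.0"))]
restored = [deserialize(s) for s in stored]

# Plain Python string ordering of the stored values is lexicographic.
sorted_stored = sorted(stored)
```

One thing this makes visible: because the values are stored as strings, sorting the stored form in plain Python is lexicographic rather than numeric, so `"100.0"` comes before `"25.0"`. Whether Spark would order the column the same way is a separate question from whether it allows ordering on the UDT column at all.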