> > I haven't looked at Encoders or Datasets since we're bound to 1.6 for
> > now, but I'll look at Encoders to see if that covers it. Datasets seems
> > like it would solve this problem for sure.
There is an experimental preview of Datasets in Spark 1.6 (see the sketch after this exchange).

> I avoided returning a case object because, even if we use reflection to
> build bytecode and do it efficiently, I still need to convert my Row to a
> case object manually within my UDF, just to have it converted to a Row
> again. Even if it's fast, it's still fairly unnecessary.

Even if you give us a Row, there's still a conversion into the binary format of InternalRow.

> The thing I guess that threw me off was that UDF1/2/3 were in a
> "java"-prefixed package, although there was nothing that made them Java
> specific, and in fact they were the only way to do what I wanted in Scala.
> For things like JavaRDD, etc. it makes sense, but for generic things like
> UDF, is there a reason they get put into a package with "java" in the name?

This was before we decided to unify the APIs for Scala and Java, so it's mostly historical.
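For reference, here is a minimal sketch of the experimental Dataset preview as it appeared in Spark 1.6. The `Person` case class and the app setup are illustrative assumptions, not from the thread; the point is that an implicitly derived Encoder handles the conversion to Spark's binary format, so there is no manual Row round-trip:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical case class, used only for illustration.
case class Person(name: String, age: Int)

object DatasetPreview {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ds-preview").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Datasets in 1.6: an Encoder (derived implicitly for case classes)
    // converts objects to the binary format and back.
    val ds = Seq(Person("a", 1), Person("b", 2)).toDS()

    // Typed transformations operate on Person directly, no Row handling.
    val adults = ds.filter(_.age > 1)
    adults.show()

    sc.stop()
  }
}
```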
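And a sketch of the UDF1-from-Scala pattern the thread is about: implementing `org.apache.spark.sql.api.java.UDF1` and registering it with an explicit return `DataType` is what lets a UDF return a `Row`, since the Scala `udf` helper needs a `TypeTag` for the return type and `Row` has none. The names `makePoint`, `pointType`, and the schema fields are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types._

object RowUdfExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("row-udf").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Schema of the struct the UDF returns; "x"/"y" are made-up names.
    val pointType = StructType(Seq(
      StructField("x", IntegerType),
      StructField("y", IntegerType)))

    // UDF1 lives in a "java"-prefixed package, but nothing about it is
    // Java specific: Scala code can implement it directly.
    val makePoint = new UDF1[Int, Row] {
      override def call(x: Int): Row = Row(x, x * 2)
    }

    // Registering with an explicit DataType sidesteps the TypeTag
    // requirement of the Scala udf() helper, so the UDF can return a Row.
    sqlContext.udf.register("make_point", makePoint, pointType)

    sqlContext.sql("SELECT make_point(3) AS p").show()

    sc.stop()
  }
}
```

Note that, as mentioned above, even the returned `Row` is still converted into the binary `InternalRow` format internally; the pattern avoids the extra case-object round-trip, not the Catalyst conversion itself.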