Can you give some simple examples to demonstrate the problem? I can see that the inconsistency could cause problems, but I don't understand how.
On Fri, May 8, 2020 at 3:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> (bump to expose the discussion to more readers)
>
> On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Hi devs,
>>
>> There are a couple of issues reported on the user@ mailing list that
>> are caused by the inconsistent schema produced by Encoders.bean.
>>
>> 1. Typed datataset from Avro generated classes? [1]
>> 2. spark structured streaming GroupState returns weird values from sate
>> [2]
>>
>> Below is the part of JavaTypeInference.inferDataType() that handles beans:
>>
>> https://github.com/apache/spark/blob/f72220b8ab256e8e6532205a4ce51d50b69c26e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L139-L157
>>
>> It collects properties based only on the availability of a getter.
>>
>> (The same approach is also used by `SQLContext.beansToRows`.)
>>
>> JavaTypeInference.serializerFor() and JavaTypeInference.deserializerFor()
>> don't do this. They collect properties based on the availability of both
>> a getter and a setter.
>> (They also call JavaTypeInference.inferDataType() internally, so the
>> result is inconsistent even when only these methods are called.)
>>
>> This inconsistency produces runtime issues when a Java bean has only a
>> getter for some properties. This happens even when no field backs the
>> getter, because getter/setter methods are identified purely by naming
>> convention.
>>
>> I feel this is something we should fix, but I'd like to hear opinions on
>> how to fix it. If a user query uses such problematic beans but hasn't
>> hit the issue, fixing it would drop some columns, which would be
>> backward incompatible. I think fixing it is still the way to go, but if
>> we are more concerned about not breaking existing queries, we may want to
>> at least document the ideal form of bean that Spark expects.
>>
>> Would like to hear opinions on this.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1.
>> https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E
>> 2.
>> http://mail-archives.apache.org/mod_mbox/spark-user/202003.mbox/%3ccafx8l21dzbyv5m1qozs3y+pcmycwbtjko6ytwvkydztq7u4...@mail.gmail.com%3e
>>
>
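To make the getter vs. getter+setter mismatch described above concrete without needing a Spark cluster, here is a minimal sketch using only the JDK's `java.beans.Introspector`. It assumes (my understanding, not verified against every Spark version) that JavaTypeInference discovers bean properties through the same JDK introspection; it does not call Spark at all. The class `BeanPropertyDemo`, the bean `Person`, and its `nameLength` property are hypothetical, chosen only to illustrate a getter-only property with no backing field. The two calls mimic the two strategies: "readable" (like inferDataType) and "readable and writable" (like serializerFor/deserializerFor).

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BeanPropertyDemo {

    // Hypothetical bean: "name" and "age" have getters and setters,
    // "nameLength" has only a getter and no backing field.
    public static class Person {
        private String name;
        private int age;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }

        // Introspector still reports this as a property, because bean
        // properties are identified purely by the getX/setX naming convention.
        public int getNameLength() { return name == null ? 0 : name.length(); }
    }

    // Collects sorted property names. With requireSetter == false this mimics
    // "getter only" collection; with true it mimics "getter and setter".
    public static List<String> propertyNames(Class<?> beanClass, boolean requireSetter) {
        try {
            return Arrays.stream(Introspector.getBeanInfo(beanClass).getPropertyDescriptors())
                    .filter(pd -> !pd.getName().equals("class")) // drop Object.getClass()
                    .filter(pd -> pd.getReadMethod() != null)
                    .filter(pd -> !requireSetter || pd.getWriteMethod() != null)
                    .map(PropertyDescriptor::getName)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints "readable: [age, name, nameLength]"
        System.out.println("readable: " + propertyNames(Person.class, false));
        // prints "readable and writable: [age, name]"
        System.out.println("readable and writable: " + propertyNames(Person.class, true));
    }
}
```

If the assumption above holds, a schema built with the first strategy would contain three columns while a serializer/deserializer built with the second would only handle two, so the two sides of the encoder disagree about the bean's shape. That disagreement is the kind of inconsistency the quoted message describes.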