Can you give some simple examples that demonstrate the problem? I can see that
the inconsistency could cause problems, but I don't see how.

On Fri, May 8, 2020 at 3:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> (bump to expose the discussion to more readers)
>
> On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Hi devs,
>>
>> There are a couple of issues reported on the user@ mailing list that are
>> caused by the inconsistent schema produced by Encoders.bean.
>>
>> 1. Typed dataset from Avro generated classes? [1]
>> 2. spark structured streaming GroupState returns weird values from state
>> [2]
>>
>> Below is a part of JavaTypeInference.inferDataType() which handles beans:
>>
>>
>> https://github.com/apache/spark/blob/f72220b8ab256e8e6532205a4ce51d50b69c26e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L139-L157
>>
>> It collects properties based on the availability of a getter.
>>
>> (The same logic is also applied in `SQLContext.beansToRows`.)
>>
>> JavaTypeInference.serializerFor() and JavaTypeInference.deserializerFor()
>> don't follow the same rule: they collect properties based on the
>> availability of both a getter and a setter.
>> (These methods also call JavaTypeInference.inferDataType() internally, so
>> the result is inconsistent even when only these methods are called.)
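To make the two rules concrete, here is a minimal sketch using the JDK's java.beans.Introspector. It only models the getter-only rule and the getter-and-setter rule described above; it is not copied from JavaTypeInference, and the `Person` bean and the `properties` helper are hypothetical.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyRules {

    // Hypothetical bean: "name" has a getter and a setter, "id" only a getter.
    public static class Person {
        private String name;
        private long id;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public long getId() { return id; } // no setId(), so "id" is read-only
    }

    // Collects property names, skipping the "class" property from Object.getClass().
    // requireSetter = false models the getter-only rule (inferDataType);
    // requireSetter = true models the getter-and-setter rule (serializerFor/deserializerFor).
    public static List<String> properties(Class<?> beanClass, boolean requireSetter) {
        PropertyDescriptor[] descriptors;
        try {
            descriptors = Introspector.getBeanInfo(beanClass).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : descriptors) {
            boolean readable = pd.getReadMethod() != null;
            boolean writable = pd.getWriteMethod() != null;
            if (!pd.getName().equals("class") && readable && (!requireSetter || writable)) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names); // deterministic order for comparison
        return names;
    }

    public static void main(String[] args) {
        System.out.println(properties(Person.class, false)); // [id, name]
        System.out.println(properties(Person.class, true));  // [name]
    }
}
```

The same bean yields two columns under one rule and one column under the other, which is the inconsistency in a nutshell.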
>>
>> This inconsistency produces runtime issues when a Java bean has only a
>> getter for some properties, even when no field actually backs the getter
>> method, since getter/setter methods are identified by naming convention.
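A toy model of how this can turn into wrong values at runtime: the schema is built with the getter-only rule, the row is written with the getter-and-setter rule, and a reader pairs columns with values by position. This is a simplified illustration under that positional-read assumption, not Spark's actual encoder code path; `User`, `schema`, `serialize` and `readByPosition` are hypothetical.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaMismatchDemo {

    // Hypothetical bean: "fullName" has no backing field; the getter alone
    // makes the Introspector report it as a (read-only) property.
    public static class User {
        private String first;
        private String last;

        public String getFirst() { return first; }
        public void setFirst(String first) { this.first = first; }
        public String getLast() { return last; }
        public void setLast(String last) { this.last = last; }
        public String getFullName() { return first + " " + last; }
    }

    // Readable properties (minus "class"), optionally requiring a setter, sorted by name.
    static List<PropertyDescriptor> props(Class<?> beanClass, boolean requireSetter) {
        try {
            List<PropertyDescriptor> out = new ArrayList<>();
            for (PropertyDescriptor pd : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
                if (!pd.getName().equals("class") && pd.getReadMethod() != null
                        && (!requireSetter || pd.getWriteMethod() != null)) {
                    out.add(pd);
                }
            }
            out.sort(Comparator.comparing(PropertyDescriptor::getName));
            return out;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Column names under the getter-only rule.
    public static List<String> schema(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : props(beanClass, false)) names.add(pd.getName());
        return names;
    }

    // Row values under the getter-and-setter rule.
    public static List<Object> serialize(Object bean) {
        List<Object> row = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : props(bean.getClass(), true)) {
                row.add(pd.getReadMethod().invoke(bean));
            }
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }

    // Pairs columns with values by position; missing values become null.
    public static Map<String, Object> readByPosition(List<String> schema, List<Object> row) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            result.put(schema.get(i), i < row.size() ? row.get(i) : null);
        }
        return result;
    }

    public static void main(String[] args) {
        User u = new User();
        u.setFirst("Ada");
        u.setLast("Lovelace");
        // prints {first=Ada, fullName=Lovelace, last=null}
        System.out.println(readByPosition(schema(User.class), serialize(u)));
    }
}
```

The schema has three columns but the row only carries two values, so "Lovelace" lands in `fullName` and `last` comes back as null.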
>>
>> I feel this is something we should fix, but I would like to hear opinions
>> on how to fix it. If a user query uses such problematic beans but hasn't
>> run into the issue, fixing it would drop some columns, which would be
>> backward incompatible. I think fixing it is still the way to go, but if we
>> are more concerned about not breaking existing queries, we may want to at
>> least document the ideal form of the bean that Spark expects.
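If the ideal form is documented, users could check their beans against it. A minimal sketch, assuming the ideal form means "every property has both a getter and a setter": list the properties that have a getter but no setter, which are the columns a getter-and-setter rule would drop. `BeanFormCheck`, `Event` and `getterOnlyProperties` are hypothetical names, not a Spark API.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanFormCheck {

    // Hypothetical bean: "id" and "ts" are read/write properties, but the
    // boolean accessor isEmpty() has no setter, so "empty" is getter-only.
    public static class Event {
        private String id;
        private long ts;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public long getTs() { return ts; }
        public void setTs(long ts) { this.ts = ts; }
        public boolean isEmpty() { return id == null || id.isEmpty(); }
    }

    // Properties with a read method but no write method, excluding "class".
    public static List<String> getterOnlyProperties(Class<?> beanClass) {
        PropertyDescriptor[] descriptors;
        try {
            descriptors = Introspector.getBeanInfo(beanClass).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : descriptors) {
            if (!pd.getName().equals("class")
                    && pd.getReadMethod() != null && pd.getWriteMethod() == null) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        List<String> offending = getterOnlyProperties(Event.class);
        if (offending.isEmpty()) {
            System.out.println("Event is in the expected bean form");
        } else {
            System.out.println("Getter-only properties: " + offending); // [empty]
        }
    }
}
```

Note that the JavaBeans `is` prefix for boolean getters also counts, so a convenience method like `isEmpty()` silently becomes a column.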
>>
>> Would like to hear opinions on this.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1.
>> https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E
>> 2.
>> http://mail-archives.apache.org/mod_mbox/spark-user/202003.mbox/%3ccafx8l21dzbyv5m1qozs3y+pcmycwbtjko6ytwvkydztq7u4...@mail.gmail.com%3e
>>
>
