Hi David,

Thanks for the feedback!

1. I think both approaches can express the same semantics. I am just
following the API design of `RowData`, which has a method to check for
null and getters that return primitive types.
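
A minimal sketch of the two styles discussed in this point. `SketchValue`
and all of its methods are hypothetical stand-ins for illustration only;
they are not the API proposed in the FLIP, and only the general
"null check plus primitive getter" shape is borrowed from `RowData`.

```java
// Hypothetical stand-in for a Variant-like value; NOT Flink's actual API.
final class SketchValue {
    private final Float value; // null models a SQL NULL

    SketchValue(Float value) {
        this.value = value;
    }

    // Explicit null check, similar in spirit to RowData#isNullAt(int).
    boolean isNull() {
        return value == null;
    }

    // Primitive getter: callers are expected to call isNull() first.
    float getFloat() {
        if (value == null) {
            throw new IllegalStateException("value is null; call isNull() first");
        }
        return value;
    }

    // Boxed alternative: returns null instead of requiring a prior check.
    Float getFloatOrNull() {
        return value;
    }
}

final class NullCheckSketch {
    // RowData-like style: check for null, then read the primitive.
    static String describe(SketchValue v) {
        return v.isNull() ? "NULL" : Float.toString(v.getFloat());
    }

    public static void main(String[] args) {
        System.out.println(describe(new SketchValue(1.5f)));
        System.out.println(describe(new SketchValue(null)));
    }
}
```

Both getters carry the same information; the primitive form avoids boxing
but relies on the caller performing the null check.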

2. It is indeed confusing, as the words Object and Map are used
interchangeably in the FLIP. An object-typed Variant is the same as a
Map from key to Variant. Because SQL types have no notion of an object,
MAP makes more sense for describing the Variant type.
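
To make the "object is a map from key to variant" point concrete, here is
a small sketch. `SketchVariant` and its methods are hypothetical names
chosen for illustration, and String keys are an assumption; the sketch
says nothing about the binary encoding or the real interface in the FLIP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a variant is either a scalar or an "object",
// and an object is simply a map from String keys to nested variants.
final class SketchVariant {
    private final Object scalar;                     // set for scalar variants
    private final Map<String, SketchVariant> fields; // set for object variants

    private SketchVariant(Object scalar, Map<String, SketchVariant> fields) {
        this.scalar = scalar;
        this.fields = fields;
    }

    static SketchVariant of(Object scalar) {
        return new SketchVariant(scalar, null);
    }

    static SketchVariant object(Map<String, SketchVariant> fields) {
        // Defensive copy that keeps insertion order.
        return new SketchVariant(null, new LinkedHashMap<>(fields));
    }

    boolean isObject() {
        return fields != null;
    }

    // Field lookup behaves like Map#get: null when the key is absent
    // or when this variant is not an object.
    SketchVariant getField(String key) {
        return isObject() ? fields.get(key) : null;
    }

    Object scalarValue() {
        return scalar;
    }
}
```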

3, 4. I am not sure I understand your question. What do you mean by
"json_object would be returned"? I don't think we have a json_object
type. If I understand correctly, JSON_OBJECT just returns a JSON
string. It would not make much sense for PARSE_JSON to accept a JSON
string and return the same JSON string.

Best,
Xuannan


On Fri, Apr 25, 2025 at 10:14 PM David Radley <david_rad...@uk.ibm.com> wrote:
>
> Hi Xuannan,
> This looks like a good addition.
>
>
>   1.  I was wondering whether it is possible to have a type but a null
> value, for example a null value in a Float type, and to tolerate nulls
> being returned from float getFloat(). If so, maybe we should return an
> object Float instead.
>   2.  You mention maps in the FLIP text but do not have it as a type. I
> wondered what your thinking is.
>   3.  In the new functions PARSE_JSON and TRY_PARSE_JSON, the text says they
> parse to a variant. As we support JSON_OBJECT as well, there could be an
> expectation that json_object would be the return type. Maybe we
> could allow the user to choose what gets returned?
>   4.  Can variants be turned into json_objects and vice versa?
>
> Kind regards, David.
>
> From: Xuannan Su <suxuanna...@gmail.com>
> Date: Friday, 25 April 2025 at 12:47
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-521: Integrating Variant Type into 
> Flink: Enabling Efficient Semi-Structured Data Processing
> Hi everyone,
>
> Thank you for all the comments! If there are no further comments, I'd
> like to close the discussion and start the voting next Monday.
>
> Best,
> Xuannan
>
> On Fri, Apr 25, 2025 at 7:41 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
> >
> > +1 for this FLIP. VARIANT type support will be a great addition to SQL.
> > Looking forward to the detailed design of the subsequent shredding
> > optimizations.
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Timo Walther <twal...@apache.org> wrote on Tue, Apr 22, 2025 at 16:51:
> >
> > > +1 for this feature. Having a VARIANT type makes a lot of sense and
> > > together with an OBJECT type will make semi-structured data processing
> > > in Flink easier.
> > >
> > > Currently, I'm catching up with notifications after the Easter holidays,
> > > but happy to give some feedback by tomorrow or Thursday as well.
> > >
> > > Thanks,
> > > Timo
> > >
> > > On 22.04.25 10:40, Jingsong Li wrote:
> > > > Thanks Xuannan for driving this discussion.
> > > >
> > > > At present, communities such as Apache Iceberg, Delta, Spark, Parquet,
> > > > etc. are all designing and developing around Variant, and our Flink
> > > > support for Variant is very valuable.
> > > >
> > > > After a rough look at the design, I see no overall problem. It is
> > > > designed around Parquet's Variant standard, which is similar to the
> > > > overall design of Spark SQL.
> > > >
> > > > +1 for this.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Apr 14, 2025 at 6:12 PM Xuannan Su <suxuanna...@gmail.com>
> > > wrote:
> > > >>
> > > >> Hi devs,
> > > >>
> > > >> I’d like to start a discussion around FLIP-521: Integrating Variant
> > > >> Type into Flink: Enabling Efficient Semi-Structured Data
> > > >> Processing[1]. Working with semi-structured data has long been a
> > > >> foundational scenario of the Lakehouse. While JSON has traditionally
> > > >> served as the primary storage format for such data, its implementation
> > > >> as serialized strings introduces significant inefficiencies.
> > > >>
> > > >> In this FLIP, we integrate the Variant encoding, which is a compact
> > > >> binary representation of semi-structured data[2], to improve the
> > > >> performance of processing semi-structured data. As Paimon has
> > > >> supported the Variant type recently[3], this FLIP would allow Flink to
> > > >> further leverage Paimon's storage-layer optimizations, improving
> > > >> performance and resource utilization for semi-structured data
> > > >> pipelines.
> > > >>
> > > >> Best,
> > > >> Xuannan
> > > >>
> > > >> [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing
> > > >> [2]
> > > https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> > > >> [3] https://github.com/apache/paimon/issues/4471
> > > >
> > >
> > >
>
