Fair enough, thanks for the explanation!

Best,
Xuwei Fu

On Tue, May 13, 2025 at 23:58, Ryan Blue <rdb...@gmail.com> wrote:

> Keys are not concatenated to produce different field names for nested
> objects. In the original example, each “a” should be encoded in the value
> object using the same ID from the metadata/dictionary.
>
> Strings in the dictionary may be duplicated only if the dictionary is not
> sorted. From the spec
> <
> https://github.com/apache/parquet-format/blob/a084f844f8475e5ff190fa367815e5ef00dbe08f/VariantEncoding.md#:~:text=If%20sorted_strings%20is%20set%20to%201%2C%20strings%20in%20the%20dictionary%20must%20be%20unique%20and%20sorted%20in%20lexicographic%20order.%20If%20the%20value%20is%20set%20to%200%2C%20readers%20may%20not%20make%20any%20assumptions%20about%20string%20order%20or%20uniqueness
> .>
> :
>
> If sorted_strings is set to 1, strings in the dictionary must be unique and
> sorted in lexicographic order. If the value is set to 0, readers may not
> make any assumptions about string order or uniqueness.
>
>
> On Tue, May 13, 2025 at 2:59 AM Andrew Lamb <andrewlam...@gmail.com>
> wrote:
>
> > I think you can potentially use the example binary data here [1] to answer
> > these questions, specifically [2] and [3].
> >
> > I don't think the keys are concatenated with parent key names.
> >
> > Andrew
> >
> > [1]: https://github.com/apache/parquet-testing/tree/master/variant
> > [2]: https://github.com/apache/parquet-testing/blob/master/variant/object_nested.metadata
> > [3]: https://github.com/apache/parquet-testing/blob/master/variant/object_nested.value
> >
> >
> > https://github.com/apache/parquet-testing/issues/75
> >
> > On Tue, May 13, 2025 at 4:37 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > Quick question: how should keys in nested objects be serialized? Do we
> > > need to concatenate each key with its parent key, like a JSON path?
> > >
> > > On Tue, May 13, 2025 at 3:19 PM wish maple <maplewish...@gmail.com> wrote:
> > >
> > > > I just want to confirm whether this is OK or should be forbidden, since
> > > > it affects how readers and writers handle this.
> > > >
> > > > Best,
> > > > Xuwei Fu
> > > >
> > > > On Tue, May 13, 2025 at 14:32, Aihua Xu <aihu...@gmail.com> wrote:
> > > >
> > > > > It should be just a single ‘a’, which reduces storage by reusing the
> > > > > same key. Is there any reason to keep both ‘a’ entries?
> > > > >
> > > > >
> > > > >
> > > > > > On May 12, 2025, at 7:43 PM, wish maple <maplewish...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks! So, in the nested object scenario, would the metadata be
> > > > > > field 0: "a", field 1: "a", or just field 0: "a"?
> > > > > > Are both forms OK for readers and writers, or do we need to restrict
> > > > > > the metadata implementation?
> > > > > >
> > > > > > Best,
> > > > > > Xuwei Fu
> > > > > >
> > > > > > On Tue, May 13, 2025 at 04:05, Ryan Blue <rdb...@gmail.com> wrote:
> > > > > >
> > > > > >> A key may be repeated across nested objects, but it cannot appear more
> > > > > >> than once in the same object. So the first example, {"a": {"a": 1}}, is
> > > > > >> allowed. The second example, {"a": 1, "a": 2}, is not allowed.
> > > > > >>
> > > > > >> Ryan
> > > > > >>
> > > > > >>> On Sun, May 11, 2025 at 11:47 PM wish maple <maplewish...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> In the Parquet Variant spec, the metadata part says:
> > > > > >>>
> > > > > >>>> Object: An unordered collection of string/Variant pairs (i.e. key/value
> > > > > >>>> pairs). An object may not contain duplicate keys. [1]
> > > > > >>>
> > > > > >>> Considering a nested JSON object like {"a": {"a": 1}}, would the
> > > > > >>> metadata be field 0: "a", field 1: "a", or just field 0: "a"? Or are
> > > > > >>> both acceptable for readers and writers?
> > > > > >>>
> > > > > >>> Also, would duplicate keys be allowed in the same object, like
> > > > > >>> {"a": 1, "a": 2}?
> > > > > >>>
> > > > > >>> Best, Xuwei Fu
> > > > > >>>
> > > > > >>> [1] https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>
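
To make the conclusion of the thread concrete, here is a minimal Python sketch of the two rules discussed above: a key string is stored once in the metadata dictionary and reused by nested objects, and a key may not repeat within a single object. It does not produce the Variant binary encoding, only the list of dictionary strings and their IDs. The function names (build_dictionary, reject_duplicate_keys, may_set_sorted_strings) are hypothetical and not part of any Parquet library, and comparing UTF-8 bytes for "lexicographic order" is an assumption of this sketch.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook for json.loads: fail if one object repeats a key."""
    seen = {}
    for key, child in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in the same object: {key!r}")
        seen[key] = child
    return seen


def collect_keys(value, dictionary, ids):
    """Walk a parsed JSON value, assigning one ID per distinct key string."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key not in ids:
                # First time this string is seen: append it to the dictionary.
                ids[key] = len(dictionary)
                dictionary.append(key)
            collect_keys(child, dictionary, ids)
    elif isinstance(value, list):
        for child in value:
            collect_keys(child, dictionary, ids)


def build_dictionary(text):
    """Return (dictionary strings, key -> ID) for a JSON document."""
    value = json.loads(text, object_pairs_hook=reject_duplicate_keys)
    dictionary, ids = [], {}
    collect_keys(value, dictionary, ids)
    return dictionary, ids


def may_set_sorted_strings(dictionary):
    """True if the strings are unique and sorted, so sorted_strings could be 1.

    Assumption: order is checked on UTF-8 bytes; strictly increasing order
    implies both sortedness and uniqueness.
    """
    encoded = [s.encode("utf-8") for s in dictionary]
    return all(a < b for a, b in zip(encoded, encoded[1:]))
```

With this sketch, build_dictionary('{"a": {"a": 1}}') yields a dictionary holding a single "a" with ID 0, which both the outer and the inner object would reference, while build_dictionary('{"a": 1, "a": 2}') raises ValueError. A writer that emits "a" twice would still be readable per the quoted spec text, but only with sorted_strings set to 0.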
