Fair enough, thanks for the explanation!

Best,
Xuwei Fu

Ryan Blue <rdb...@gmail.com> wrote on Tue, May 13, 2025 at 23:58:

> Keys are not concatenated to produce different field names for nested
> objects. In the original example, each “a” should be encoded in the value
> object using the same ID from the metadata/dictionary.
>
> Strings in the dictionary may be duplicated only if the dictionary is not
> sorted. From the spec
> <https://github.com/apache/parquet-format/blob/a084f844f8475e5ff190fa367815e5ef00dbe08f/VariantEncoding.md#:~:text=If%20sorted_strings%20is%20set%20to%201%2C%20strings%20in%20the%20dictionary%20must%20be%20unique%20and%20sorted%20in%20lexicographic%20order.%20If%20the%20value%20is%20set%20to%200%2C%20readers%20may%20not%20make%20any%20assumptions%20about%20string%20order%20or%20uniqueness.>:
>
> > If sorted_strings is set to 1, strings in the dictionary must be unique and
> > sorted in lexicographic order. If the value is set to 0, readers may not
> > make any assumptions about string order or uniqueness.
>
> On Tue, May 13, 2025 at 2:59 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > I think you can potentially use the example binary data here[1] to answer
> > these questions, specifically [2] and [3].
> >
> > I don't think the keys are concatenated with parent key names.
> >
> > Andrew
> >
> > [1]: https://github.com/apache/parquet-testing/tree/master/variant
> > [2]: https://github.com/apache/parquet-testing/blob/master/variant/object_nested.metadata
> > [3]: https://github.com/apache/parquet-testing/blob/master/variant/object_nested.value
> >
> > https://github.com/apache/parquet-testing/issues/75
> >
> > On Tue, May 13, 2025 at 4:37 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > Quick question: how should keys in nested objects be serialized? Do we
> > > need to concatenate the parent key, like a JSON path?
> > >
> > > On Tue, May 13, 2025 at 3:19 PM wish maple <maplewish...@gmail.com> wrote:
> > >
> > > > Just to make sure whether this is OK or should be forbidden.
> > > > It affects how readers and writers handle this.
> > > >
> > > > Best,
> > > > Xuwei Fu
> > > >
> > > > Aihua Xu <aihu...@gmail.com> wrote on Tue, May 13, 2025 at 14:32:
> > > >
> > > > > It should be just a single ‘a’, to reduce storage by reusing the same
> > > > > key. Is there any reason we would want to keep both ‘a’ entries?
> > > > >
> > > > > On May 12, 2025, at 7:43 PM, wish maple <maplewish...@gmail.com> wrote:
> > > > >
> > > > > > Thanks! So, in the nested object scenario, would the metadata be
> > > > > > field 0: "a", field 1: "a", or just field 0: "a"? Is either way OK
> > > > > > for readers/writers, or do we need to restrict the metadata
> > > > > > implementation?
> > > > > >
> > > > > > Best,
> > > > > > Xuwei Fu
> > > > > >
> > > > > > Ryan Blue <rdb...@gmail.com> wrote on Tue, May 13, 2025 at 04:05:
> > > > > >
> > > > > > > Keys may appear in nested objects, but cannot appear in the same
> > > > > > > object. So the first example, {"a": {"a": 1}}, is allowed. The
> > > > > > > second example, {"a": 1, "a": 2}, is not allowed.
> > > > > > >
> > > > > > > Ryan
> > > > > > >
> > > > > > > On Sun, May 11, 2025 at 11:47 PM wish maple <maplewish...@gmail.com> wrote:
> > > > > > >
> > > > > > > > In the Parquet variant spec, the metadata part says:
> > > > > > > >
> > > > > > > > > Object: An unordered collection of string/Variant pairs (i.e.
> > > > > > > > > key/value pairs). An object may not contain duplicate keys. [1]
> > > > > > > >
> > > > > > > > Considering a nested JSON object like {"a": {"a": 1}}, would the
> > > > > > > > metadata be field 0: "a", field 1: "a", or just field 0: "a"? Or
> > > > > > > > is either OK for readers/writers?
> > > > > > > >
> > > > > > > > Also, would duplicate keys be allowed in the same object, like
> > > > > > > > {"a": 1, "a": 2}?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Xuwei Fu
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
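
The two rules settled in this thread (a key is stored verbatim in the metadata dictionary, with no parent-path concatenation, so nested objects reuse the same ID; and a single object may not repeat a key) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the Variant binary encoding: the helper names `reject_duplicate_keys` and `build_metadata_dictionary` are hypothetical, and only the dictionary bookkeeping is modeled (no header byte, offsets, or value encoding). Only `json.loads` and its `object_pairs_hook` parameter come from the Python standard library.

```python
import json
from typing import Any


def reject_duplicate_keys(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """json.loads object_pairs_hook: fail if one object repeats a key.

    The same key in *different* (e.g. nested) objects is fine; only a repeat
    inside a single object, such as {"a": 1, "a": 2}, is rejected.
    """
    seen: set[str] = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in one object: {key!r}")
        seen.add(key)
    return dict(pairs)


def build_metadata_dictionary(
    value: Any, sort: bool = False
) -> tuple[list[str], dict[str, int]]:
    """Hypothetical helper: collect all object keys, at any depth, into one dictionary.

    Keys are stored verbatim (no parent-path concatenation), so the outer and
    inner "a" in {"a": {"a": 1}} resolve to the same dictionary ID.
    Returns (dictionary strings, key -> ID mapping).
    """
    strings: list[str] = []
    ids: dict[str, int] = {}

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                if key not in ids:  # first occurrence gets the next ID
                    ids[key] = len(strings)
                    strings.append(key)
                visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(value)

    if sort:
        # Mirrors sorted_strings = 1: unique strings in lexicographic order.
        # Python orders str by code point, which matches UTF-8 byte order.
        strings = sorted(strings)
        ids = {key: i for i, key in enumerate(strings)}

    return strings, ids


if __name__ == "__main__":
    nested = json.loads('{"a": {"a": 1}}', object_pairs_hook=reject_duplicate_keys)
    print(build_metadata_dictionary(nested))  # (['a'], {'a': 0})

    try:
        json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicate_keys)
    except ValueError as err:
        print(err)  # duplicate key in one object: 'a'
```

In this sketch the nested example yields a one-entry dictionary, so both objects reference ID 0, while the flat duplicate-key example is rejected at parse time. The sketch always deduplicates; per the spec text quoted above, a writer with sorted_strings = 0 is not required to, which is why readers may not assume uniqueness in that case.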