Thank you for looking into it.  This sounds very similar to Arrow's
usage of field_id (passing it through but making no business
decisions).  I appreciate the advice and I have reached out to the
Iceberg team.
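
To make the "pass it through" behaviour concrete, here is a minimal
sketch in plain Python. It is not Arrow or Parquet code: the Field
class and the select/combine helpers are hypothetical names made up
for illustration. It covers only the two cases from my original list
that seemed clear (a column filter keeps field_id, a combined field
gets none) and deliberately leaves out casting and null filling,
which are still open questions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not a real Arrow type."""
    name: str
    dtype: str
    field_id: Optional[int] = None  # None means "no field_id assigned"


def select(schema: List[Field], names: Set[str]) -> List[Field]:
    """Column filter: surviving fields are passed through untouched,
    so each one keeps its original field_id."""
    return [f for f in schema if f.name in names]


def combine(a: Field, b: Field, name: str, dtype: str) -> Field:
    """Derive a new field from two inputs. The result has no single
    source field to inherit from, so it is given no field_id
    (an assumption taken from the list in the original question)."""
    return Field(name, dtype, field_id=None)


schema = [
    Field("id", "int64", 1),
    Field("first", "string", 2),
    Field("last", "string", 3),
]

projected = select(schema, {"id", "last"})
full_name = combine(schema[1], schema[2], "full_name", "string")
```

In this sketch the projected fields still carry ids 1 and 3, while
full_name carries none.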

On Tue, May 18, 2021 at 12:42 AM Gabor Szadovszky <[email protected]> wrote:
>
> Hi Weston,
>
> I have no experience with field_id, so I quickly searched our source code
> and the format for it. It seems that we read and write field_id between
> the file footer and our internal schema structure. You can also set a
> field_id when you use either our schema builder or the schema parser. We
> also support field_id during Thrift and Parquet schema conversion.
> However, it seems we do not make any business decisions based on field_id.
> For example, neither schema merging nor filtering uses field ids.
>
> So I would not say Parquet is the best example (for now) to help you with
> field ids. Apache Iceberg <https://iceberg.apache.org/spec/> may be a
> better one, since its specification explains them (unlike Parquet's).
>
> Regards,
> Gabor
>
> On Tue, May 18, 2021 at 6:20 AM Weston Pace <[email protected]> wrote:
>
> > Hi dev,
> >
> > I'm Weston, and I've been working on the Arrow project lately.  As
> > the Arrow project implements more transformations of data, I've been
> > wondering how we should treat the field_id property.  Here are some
> > concrete examples:
> >
> >  * Filtering a table by column (it seems the field_id should remain
> > unchanged)
> >  * Filtering a table by rows (it seems the field_id should remain
> > unchanged)
> >  * Filling in null values with a placeholder value (the data is changed so
> > ???)
> >  * Casting a field to a different data type (the meaning of the data
> > has changed so ???)
> >  * Combining two fields into a third field (it seems the third field
> > should have no field_id)
> >
> > I'm reaching out to the Parquet community to solicit input as you have
> > expertise/experience around the motivation behind the field_id
> > property and its uses.
> >
> > Thanks,
> >
> > -Weston Pace
> >
