Hi Weston,

I have no hands-on experience with field_id, so I did a quick search through
our source code and the format definition. It seems that we read and write
field_id between the file footer and our internal schema structure. You can
also set a field id when you use either our schema builder or the schema
parser, and field_id is carried through the conversion between the Thrift
and Parquet schemas. However, we do not seem to base any logic on field_id:
for example, neither schema merging nor filtering takes field ids into
account.

So I would not say Parquet is the best example (for now) to help you with
field ids. Apache Iceberg <https://iceberg.apache.org/spec/> may be a
better one, since its specification explains how field ids are used
(unlike Parquet's).
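
To make concrete what "acting on field ids" could look like, here is a
minimal sketch of id-based column resolution, roughly in the spirit of
what the Iceberg spec describes (columns are tracked by id, so a rename
keeps the id). This is not Parquet or Iceberg code: the Field class, the
resolve_by_id function and the example schemas are all hypothetical names
made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical schema field; field_id is optional, as in Parquet."""
    name: str
    type: str
    field_id: Optional[int] = None


def resolve_by_id(file_schema: List[Field],
                  read_schema: List[Field]) -> Dict[str, Optional[Field]]:
    """Match each requested field to the file field with the same field_id.

    Names are ignored on purpose: a renamed column still resolves as long
    as its id is unchanged. A requested field with no id, or with an id
    that is absent from the file, resolves to None.
    """
    by_id = {f.field_id: f for f in file_schema if f.field_id is not None}
    return {r.name: by_id.get(r.field_id) for r in read_schema}


# The file was written before "cust_name" was renamed to "customer_name".
file_schema = [Field("cust_name", "string", 1), Field("amount", "int64", 2)]
# The reader asks for the renamed column (id 1) and a newer column (id 3).
read_schema = [Field("customer_name", "string", 1), Field("total", "int64", 3)]

resolved = resolve_by_id(file_schema, read_schema)
```

Here "customer_name" resolves to the file's "cust_name" column because
both carry id 1, while "total" resolves to nothing because id 3 does not
exist in the file. As far as I can tell, Parquet itself does nothing like
this today; it only stores the ids.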

Regards,
Gabor

On Tue, May 18, 2021 at 6:20 AM Weston Pace <[email protected]> wrote:

> Hi dev,
>
> I'm Weston, I've been working on the Arrow project lately. As the
> Arrow project implements more transformations of data I've been
> wondering how we should treat the field_id property.  For some
> concrete examples:
>
>  * Filtering a table by column (it seems the field_id should remain
> unchanged)
>  * Filtering a table by rows (it seems the field_id should remain
> unchanged)
>  * Filling in null values with a placeholder value (the data is changed so
> ???)
>  * Casting a field to a different data type (the meaning of the data
> has changed so ???)
>  * Combining two fields into a third field (it seems the third field
> should have no field_id)
>
> I'm reaching out to the Parquet community to solicit input as you have
> expertise/experience around the motivation behind the field_id
> property and its uses.
>
> Thanks,
>
> -Weston Pace
>
