Re: [DISCUSS] INT96 stats

Ed Seidl Thu, 24 Jul 2025 08:46:54 -0700

If INT96 is to remain deprecated, I'd prefer 1. If we want a defined ordering
for INT96, I'd prefer 3 to maintaining a "known good" list.


As to the forward compatibility issue with Rust, that's already an issue with
logical types (and any other unions in the spec). We're currently trying to
work on that [1].

Cheers,
Ed

[1] https://github.com/apache/arrow-rs/issues/7909

On 2025/07/24 08:19:13 Gang Wu wrote:
> For 1 and 2, do we need to officially maintain an allow-list of known writer
> implementations and their versions? My feeling is no. Perhaps it is the
> responsibility of interested implementations to maintain it internally,
> because many projects may not even care about INT96 stats.
> 
> For 3, I think implementations that fail on a new column order are buggy.
> If we want to move forward with [1] by adding a new column order for
> IEEE754 total order, this bug should be fixed anyway.
> 
> [1] https://github.com/apache/parquet-format/pull/221
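Gang's point that a reader should not fail on an unrecognized column order can be illustrated with a minimal Python sketch. It assumes a decoder that hands back a Thrift union as a `{field_id: value}` dict; the helper names (`decode_column_order`, `can_use_min_max`) are hypothetical. Only the `TYPE_ORDER` member (field id 1 of the `ColumnOrder` union in parquet.thrift) comes from the spec. An unknown member is kept as "ordering not understood" and its statistics are skipped, instead of aborting the footer parse.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# In parquet.thrift, ColumnOrder is a union whose only member so far is
# TYPE_ORDER (field id 1). New orders would arrive as new field ids.
TYPE_ORDER = 1


@dataclass(frozen=True)
class ColumnOrder:
    field_id: int  # which union member was set in the file

    @property
    def is_known(self) -> bool:
        return self.field_id == TYPE_ORDER


def decode_column_order(fields: Dict[int, object]) -> ColumnOrder:
    """Decode a union given as {field_id: value} without rejecting new members.

    Keeping the field id of an unrecognized member (instead of raising)
    lets the rest of the footer parse normally.
    """
    if len(fields) != 1:
        raise ValueError("a union must have exactly one member set")
    (field_id,) = fields
    return ColumnOrder(field_id)


def can_use_min_max(order: Optional[ColumnOrder]) -> bool:
    # Conservative rule for this sketch: only trust statistics whose ordering
    # this reader understands (None means no ColumnOrder was written);
    # otherwise skip the statistics rather than failing the read.
    return order is not None and order.is_known


print(can_use_min_max(decode_column_order({1: {}})))  # True
print(can_use_min_max(decode_column_order({2: {}})))  # False (hypothetical new order)
```

The design choice is that "unknown" degrades to "no pruning", which is always safe, whereas raising an error makes every file with a new order unreadable.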
> 
> On Thu, Jul 24, 2025 at 1:30 AM Micah Kornfield <[email protected]>
> wrote:
> 
> > Just to follow up on this, I think the last remaining issue is updating
> > the spec.
> >
> > There is already a draft PR (
> > https://github.com/apache/parquet-format/pull/504) for updating the spec.
> >
> > I think there are three main options:
> > 1.  Keep ordering for int96 undefined with an implementation note (the
> > current PR does this).
> > 2.  Formalize ordering as now defined using the timestamp ordering.
> > 3.  Formalize ordering as now defined using the timestamp ordering and
> > define a new SortOrder required for writers/readers to use stats.
> >
> > The main trade-off for options 1 and 2 is that we potentially need to
> > allow-list implementations that are known to produce valid stats (e.g.
> > older versions of Rust were writing stats that didn't conform to
> > Timestamp ordering).
> >
> > For item #3, the main issue is that not all readers might be forward
> > compatible with a new sort order. In particular, Rust readers would break
> > on any new files [1].
> >
> > Given this, I suggest we move forward with the currently open PR and not
> > officially formalize this in the spec. Implementations will need to
> > allow-list known good writers.
> >
> > Thanks,
> > Micah
> >
> >
> > [1] https://github.com/apache/arrow-rs/issues/7909
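The allow-list approach in options 1 and 2 amounts to parsing the footer's `created_by` string and comparing the writer's version with a known-good minimum. A minimal Python sketch follows. The writer names and minimum versions in `KNOWN_GOOD_INT96_WRITERS` are placeholders chosen for illustration, not a vetted list, and the regex only handles the `<Application> version <App Version> (build <App Build Hash>)` format that parquet.thrift recommends for `created_by`; anything without a three-part numeric version is treated as untrusted.

```python
import re
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]

# HYPOTHETICAL allow-list: application name -> first version assumed to
# write INT96 min/max in timestamp order. Illustrative values only.
KNOWN_GOOD_INT96_WRITERS: Dict[str, Version] = {
    "parquet-rs": (56, 0, 0),
    "parquet-mr": (1, 16, 0),
}

# Matches the start of "<Application> version <major>.<minor>.<patch> ...".
_CREATED_BY = re.compile(
    r"^(?P<app>\S+) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)


def parse_created_by(created_by: str) -> Optional[Tuple[str, Version]]:
    m = _CREATED_BY.match(created_by)
    if m is None:
        return None
    return m["app"], (int(m["major"]), int(m["minor"]), int(m["patch"]))


def trust_int96_stats(created_by: Optional[str]) -> bool:
    """Use INT96 min/max only if the writer is on the allow-list."""
    parsed = parse_created_by(created_by) if created_by else None
    if parsed is None:
        return False
    app, version = parsed
    minimum = KNOWN_GOOD_INT96_WRITERS.get(app)
    # Tuples compare element by element, so this is a semantic-version check.
    return minimum is not None and version >= minimum


print(trust_int96_stats("parquet-rs version 56.0.0"))                # True
print(trust_int96_stats("parquet-mr version 1.12.3 (build abc123)"))  # False
print(trust_int96_stats("impala version 1.0 (build 6cf94d29)"))      # False
```

This is the per-implementation bookkeeping that option 3 (a new ColumnOrder value) would avoid.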
> >
> >
> >
> > On Mon, Jun 30, 2025 at 8:55 AM Alkis Evlogimenos
> > <[email protected]> wrote:
> >
> > > I also checked internally with the Spark OSS team and the plan for having
> > > INT64 timestamps in Spark by default is to make the change when Delta v5
> > > and Iceberg v4 are proposed. This is expected to happen around the first
> > > half of 2026.
> > >
> > > On Wed, Jun 25, 2025 at 8:41 PM Andrew Lamb <[email protected]>
> > > wrote:
> > >
> > > > We had a good discussion about this at the sync today. Here is my
> > > > summary:
> > > >
> > > > * Pedantically, according to the current spec[1], there is no defined
> > > > ordering for Int96 types, and thus arrow-rs cannot be writing
> > > > "incorrect" values (as there is no definition of correct)
> > > > * Practically speaking, arrow-rs is writing something different than
> > > > Photon (the Databricks proprietary Spark engine)
> > > > * What Photon is doing arguably makes more sense (it uses the ordering
> > > > of the only logical type that uses Int96)
> > > > * GH-7686: [Parquet] Fix int96 min/max stats #7687[2] brings arrow-rs
> > > > into line with Photon, which makes sense to me
> > > >
> > > > Rahul has also filed a ticket in parquet-format to discuss formalizing
> > > > the ordering of Int96 statistics[3].
> > > >
> > > > In the interim, I filed a PR[4] in the parquet-format repo to at least
> > > > try and clarify the intent of the changes to arrow-rs and parquet-java.
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > >
> > > > [1]: https://github.com/apache/parquet-format/blob/cf943c197f4fad826b14ba0c40eb0ffdab585285/src/main/thrift/parquet.thrift#L1079
> > > > [2]: https://github.com/apache/arrow-rs/pull/7687
> > > > [3]: https://github.com/apache/parquet-format/issues/502
> > > > [4]: https://github.com/apache/parquet-format/pull/504
> > > >
> > > >
> > > > On Wed, Jun 25, 2025 at 10:52 AM Rahul Sharma
> > > > <[email protected]> wrote:
> > > >
> > > > > I have prepared a doc
> > > > > <https://docs.google.com/document/d/1Ox0qHYBgs_3-pNqn9V8zVQm_W6qP0lsbd2XwQnQVz1Y/edit?tab=t.0>
> > > > > to summarize and have all the relevant links in one place.
> > > > >
> > > > > On Wed, Jun 25, 2025 at 1:32 PM Alkis Evlogimenos
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Spark needs to start writing INT64 nanos first to be able to replace
> > > > > > INT96, which stores nanoseconds, when data is at nanosecond
> > > > > > granularity. This is why I linked that ticket, which is a
> > > > > > prerequisite to switching to INT64 in many cases.
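To make the INT96-to-INT64 point concrete: an INT96 timestamp, in the layout Impala and Spark use, stores 8 bytes of little-endian nanoseconds within the day followed by a 4-byte little-endian Julian day number, so converting it to INT64 nanoseconds since the Unix epoch is a small calculation. The Python sketch below uses hypothetical helper names and also checks the INT64 range, since INT64 nanoseconds only reach from about 1677 to 2262.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp to nanoseconds since the Unix epoch.

    Assumed layout: bytes 0..7 are little-endian nanoseconds within the day,
    bytes 8..11 are the little-endian Julian day number.
    """
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def int96_to_int64_nanos(raw: bytes) -> int:
    """Return an INT64 TIMESTAMP(NANOS) value, rejecting out-of-range input."""
    nanos = int96_to_epoch_nanos(raw)
    # INT64 nanoseconds cover only about 1677..2262; INT96 days reach further.
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError("timestamp does not fit in INT64 nanoseconds")
    return nanos


one_second_after_epoch = struct.pack("<qi", 1_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH)
print(int96_to_int64_nanos(one_second_after_epoch))  # 1000000000
```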
> > > > > >
> > > > > > I understand the concerns around changing a deprecated aspect of the
> > > > > > parquet spec. We decided to bring this forward because:
> > > > > > 1. there are a lot of parquet files with the right INT96 stats out
> > > > > > there (Photon has been writing them for years)
> > > > > > 2. all engines ignore the INT96 stats, so Photon writing them didn't
> > > > > > break anyone
> > > > > > 3. Spark is (slowly) moving away from INT96
> > > > > > 4. our change is very narrow, backwards compatible, and can improve
> > > > > > current workloads while (3) is ongoing
> > > > > >
> > > > > > Let's discuss more at the sync tonight.
> > > > > >
> > > > > > > If we are going to standardize an ordering for INT96, rather than
> > > > > > > parsing "created_by" fields, wouldn't it make more sense to add a new
> > > > > > > ColumnOrder value (like what's proposed for PARQUET-2249 [1])? Then we
> > > > > > > don't need to maintain a list of known good writers.
> > > > > >
> > > > > > We do not have to add another ColumnOrder value since INT96 is a
> > > > > > *physical* type and can only take timestamps in the specified format.
> > > > > > This was arguably a design wart, as it should have been a
> > > > > > FIXED_LEN_BYTE_ARRAY(12) with logical type INT96_TIMESTAMP, for which
> > > > > > a different ColumnOrder would make sense. In this case we are lucky
> > > > > > this is a physical type without a logical type attached, because
> > > > > > otherwise we couldn't have made this change in a backwards compatible
> > > > > > way as easily.
> > > > > >
> > > > > > On Sat, Jun 21, 2025 at 12:57 AM Ed Seidl <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > If we are going to standardize an ordering for INT96, rather than
> > > > > > > parsing "created_by" fields, wouldn't it make more sense to add a new
> > > > > > > ColumnOrder value (like what's proposed for PARQUET-2249 [1])? Then we
> > > > > > > don't need to maintain a list of known good writers.
> > > > > > >
> > > > > > > Ed
> > > > > > >
> > > > > > > [1] https://github.com/apache/parquet-format/pull/221
> > > > > > >
> > > > > > > On 2025/06/19 10:15:13 Andrew Lamb wrote:
> > > > > > > > > While INT96 is now deprecated, it's still the default timestamp
> > > > > > > > > type in Spark, resulting in a significant amount of existing data
> > > > > > > > > written in this format.
> > > > > > > >
> > > > > > > > I agree with Gang and Antoine that the better solution is to change
> > > > > > > > Spark to write non-deprecated parquet data types.
> > > > > > > >
> > > > > > > > It seems there is an issue in the Spark JIRA to do this[1], but the
> > > > > > > > only feedback on the associated PR [2] is that it is a breaking
> > > > > > > > change.
> > > > > > > >
> > > > > > > > If Spark is going to keep writing INT96 timestamps indefinitely, I
> > > > > > > > suggest we un-deprecate the INT96 timestamps to reflect the ecosystem
> > > > > > > > reality that they will be here for a while, rather than pretending
> > > > > > > > they are really deprecated.
> > > > > > > >
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > [1]: https://issues.apache.org/jira/browse/SPARK-51359
> > > > > > > > [2]: https://github.com/apache/spark/pull/50215#issuecomment-2715147840
> > > > > > > >
> > > > > > > > p.s. As an aside, is anyone from Databricks pushing Spark to change
> > > > > > > > the timestamp type? Or will the focus be to improve INT96 timestamps
> > > > > > > > instead?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jun 18, 2025 at 10:50 PM Gang Wu <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Improving a deprecated feature doesn't seem to add much value,
> > > > > > > > > especially when there are abundant Parquet implementations in the
> > > > > > > > > wild. IIRC, parquet-java is planning to release 1.16.0 for new data
> > > > > > > > > types like variant and geometry. It is also the last version to
> > > > > > > > > support Java 8. All deprecated APIs might get removed in 2.0.0, so
> > > > > > > > > I'm not sure whether older Spark versions will be able to leverage
> > > > > > > > > the INT96 stats. The right way to go is to push forward the adoption
> > > > > > > > > of timestamp logical types.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > > > > > > > >
> > > > > > > > > On Thu, Jun 19, 2025 at 12:31 AM Micah Kornfield <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Alkis,
> > > > > > > > > > Is this the right thread link? It seems to be a discussion on
> > > > > > > > > > Timestamp Nano support (which IIUC won't use int96, but I'm not
> > > > > > > > > > sure this covers changing the behavior for existing timestamps,
> > > > > > > > > > which I think are at either millisecond or microsecond
> > > > > > > > > > granularity)?
> > > > > > > > > >
> > > > > > > > > > > there will be customers that want to interface with legacy
> > > > > > > > > > > systems with INT96. This is why we decided to do both.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > It might help to elaborate on the time-frame here. Since it
> > > > > > > > > > appears reference implementations of parquet are not currently
> > > > > > > > > > writing statistics, if we merge these changes, when will they be
> > > > > > > > > > picked up in Spark? Would the plan be to backport the
> > > > > > > > > > parquet-java changes to older versions of Spark (otherwise the
> > > > > > > > > > legacy systems wouldn't really use or emit stats anyway)? What is
> > > > > > > > > > the delta between Spark picking up these changes and
> > > > > > > > > > transitioning off of Int96 by default? Is the expectation that
> > > > > > > > > > even once the default is changed in Spark to not use int96, there
> > > > > > > > > > will be a large number of users that will override the default to
> > > > > > > > > > write int96?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Micah
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 18, 2025 at 1:35 AM Alkis Evlogimenos
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > We are also driving that in parallel:
> > > > > > > > > > > https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
> > > > > > > > > > >
> > > > > > > > > > > Even when Spark defaults to INT64, there will be old versions of
> > > > > > > > > > > Spark running, and there will be customers that want to
> > > > > > > > > > > interface with legacy systems with INT96. This is why we decided
> > > > > > > > > > > to do both.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 18, 2025 at 9:53 AM Antoine Pitrou <[email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Can we get Spark to stop emitting INT96? They are not being an
> > > > > > > > > > > > extremely good community player here.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards
> > > > > > > > > > > >
> > > > > > > > > > > > Antoine.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 13 Jun 2025 15:17:51 +0200
> > > > > > > > > > > > Alkis Evlogimenos
> > > > > > > > > > > > <[email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Hi folks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > While INT96 is now deprecated, it's still the default timestamp
> > > > > > > > > > > > > type in Spark, resulting in a significant amount of existing
> > > > > > > > > > > > > data written in this format.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Historically, parquet-mr/java has not emitted or read statistics
> > > > > > > > > > > > > for INT96. This was likely because standard byte comparison on
> > > > > > > > > > > > > the INT96 representation doesn't align with logical comparisons,
> > > > > > > > > > > > > potentially leading to incorrect min/max values. This is
> > > > > > > > > > > > > unfortunate because timestamp filters are extremely common and
> > > > > > > > > > > > > the lack of stats limits optimization opportunities.
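The mismatch between byte order and timestamp order can be seen with two values, assuming the usual 12-byte layout (8-byte little-endian nanoseconds of day, then a 4-byte little-endian Julian day). Comparing the raw bytes lexicographically looks at the least significant bytes of the time of day first, so a value on a later day can compare as smaller. `encode_int96` and `logical_key` are illustrative names for this sketch, not parquet-java APIs.

```python
import struct


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    # 8-byte little-endian nanoseconds of day, then 4-byte little-endian
    # Julian day number (12 bytes total, no padding with "<").
    return struct.pack("<qi", nanos_of_day, julian_day)


def logical_key(raw: bytes) -> tuple:
    """Sort key that orders INT96 values as timestamps: day first, then time."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


earlier = encode_int96(2_460_000, 1_000_000_000)  # day D, 00:00:01
later = encode_int96(2_460_001, 0)                # day D+1, 00:00:00

# Python compares bytes lexicographically as unsigned values. That gets this
# pair backwards, because the low-order bytes of the time of day come first.
print(earlier < later)                            # False
print(logical_key(earlier) < logical_key(later))  # True
```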
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since its inception, Photon <https://www.databricks.com/product/photon>
> > > > > > > > > > > > > has emitted and utilized INT96 statistics by employing a logical
> > > > > > > > > > > > > comparator, ensuring their correctness. We have now implemented
> > > > > > > > > > > > > <https://github.com/apache/parquet-java/pull/3243> the same support
> > > > > > > > > > > > > within parquet-java.
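Computing column-chunk statistics with a logical comparator, as described above for Photon, can be sketched as taking the min and max under a (Julian day, nanoseconds) key while keeping the raw 12-byte values as the stored statistics. This is not the code from Photon or from the parquet-java PR; `int96_min_max` is a hypothetical helper, and skipping nulls represented as `None` is an assumption of the sketch.

```python
import struct
from typing import Iterable, Optional, Tuple


def _timestamp_key(raw: bytes) -> Tuple[int, int]:
    # Order INT96 values as timestamps: Julian day first, then nanos of day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


def int96_min_max(values: Iterable[Optional[bytes]]) -> Optional[Tuple[bytes, bytes]]:
    """Min/max of INT96 values in timestamp order, skipping nulls (None).

    Returns None when there are no non-null values; otherwise the raw
    12-byte min and max, which is what would be stored as statistics.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return min(non_null, key=_timestamp_key), max(non_null, key=_timestamp_key)


a = struct.pack("<qi", 1_000_000_000, 2_460_000)  # day D, 00:00:01
b = struct.pack("<qi", 0, 2_460_001)              # day D+1, 00:00:00
c = struct.pack("<qi", 5, 2_459_999)              # day D-1, 5 ns after midnight
lo, hi = int96_min_max([a, None, b, c])
print(lo == c, hi == b)  # True True
```

With these inputs, a plain `min()`/`max()` over the raw bytes would instead return `b` and `c`, exactly reversed, which is the kind of incorrect min/max the original message warns about.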
> > > > > > > > > > > > >
> > > > > > > > > > > > > We'd like to get the community's thoughts on this addition. We
> > > > > > > > > > > > > anticipate that most users may not be directly affected due to the
> > > > > > > > > > > > > declining use of INT96. However, we are interested in identifying
> > > > > > > > > > > > > any potential drawbacks or unforeseen issues with this approach.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers
