I also checked internally with the Spark OSS team: the plan for making INT64 timestamps the default in Spark is to make the change when Delta v5 and Iceberg v4 are proposed. This is expected to happen around the first half of 2026.
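(For context on why the INT96-to-INT64 switch needs care at nanosecond granularity: signed 64-bit nanoseconds-since-epoch covers a much narrower date range than INT96's Julian-day-plus-nanos layout. A quick stdlib Python sketch; the helper name `ns_bounds` is illustrative, not from any Parquet or Spark API.)

```python
from datetime import datetime, timedelta, timezone

# Signed 64-bit nanoseconds since the Unix epoch can only represent
# timestamps from late 1677 to early 2262; INT96 stores a Julian day
# plus nanoseconds-of-day and has a far wider representable range.
I64_MIN, I64_MAX = -2**63, 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def ns_bounds():
    # Convert the int64 nanosecond extremes to calendar years.
    lo = EPOCH + timedelta(microseconds=I64_MIN // 1000)
    hi = EPOCH + timedelta(microseconds=I64_MAX // 1000)
    return lo.year, hi.year

print(ns_bounds())  # → (1677, 2262)
```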
On Wed, Jun 25, 2025 at 8:41 PM Andrew Lamb <andrewlam...@gmail.com> wrote:
> We had a good discussion about this at the sync today. Here is my summary:
>
> * Pedantically, according to the current spec[1] there is no defined
> ordering for Int96 types, and thus arrow-rs cannot be writing "incorrect"
> values (as there is no definition of correct)
> * Practically speaking, arrow-rs is writing something different than Photon
> (Databricks' proprietary Spark engine)
> * What Photon is doing arguably makes more sense (to use the ordering of
> the only logical type to use Int96)
> * GH-7686: [Parquet] Fix int96 min/max stats #7687[2] brings arrow-rs into
> line with Photon, which makes sense to me
>
> Rahul has also filed a ticket in parquet-format to discuss formalizing the
> ordering of Int96 statistics[3].
>
> In the interim, I filed a PR[4] in the parquet-format repo to at least try
> to clarify the intent of the changes to arrow-rs and parquet-java.
>
> Thanks,
> Andrew
>
> [1]: https://github.com/apache/parquet-format/blob/cf943c197f4fad826b14ba0c40eb0ffdab585285/src/main/thrift/parquet.thrift#L1079
> [2]: https://github.com/apache/arrow-rs/pull/7687
> [3]: https://github.com/apache/parquet-format/issues/502
> [4]: https://github.com/apache/parquet-format/pull/504
>
> On Wed, Jun 25, 2025 at 10:52 AM Rahul Sharma
> <rahul.sha...@databricks.com.invalid> wrote:
> > I have prepared a doc
> > <https://docs.google.com/document/d/1Ox0qHYBgs_3-pNqn9V8zVQm_W6qP0lsbd2XwQnQVz1Y/edit?tab=t.0>
> > to summarize and have all the relevant links in one place.
> >
> > On Wed, Jun 25, 2025 at 1:32 PM Alkis Evlogimenos
> > <alkis.evlogime...@databricks.com.invalid> wrote:
> > > Spark needs to start writing INT64 nanos first to be able to replace
> > > INT96, which is in nanos, if data is at nano granularity. This is why
> > > I linked that ticket, which is a prerequisite to switching to INT64
> > > in many cases.
> > > I understand the concerns around changing a deprecated aspect of the
> > > parquet spec. The reason we decided to bring this forward is because:
> > > 1. there are a lot of parquet files with the right INT96 stats out there
> > > (Photon has been writing them for years)
> > > 2. all engines ignore the INT96 stats, so Photon writing them didn't
> > > break anyone
> > > 3. Spark is (slowly) moving away from INT96
> > > 4. our change is very narrow, backwards compatible, and can improve
> > > current workloads while (3) is ongoing
> > >
> > > Let's discuss more at the sync tonight.
> > >
> > > > If we are going to standardize an ordering for INT96, rather than
> > > > parsing "created_by" fields, wouldn't it make more sense to add a new
> > > > ColumnOrder value (like what's proposed for PARQUET-2249 [1])? Then we
> > > > don't need to maintain a list of known good writers.
> > >
> > > We do not have to add another ColumnOrder value, since INT96 is a
> > > *physical* type and can only hold timestamps in the specified format.
> > > This was arguably a design wart, as it should have been a
> > > FIXED_LEN_BYTE_ARRAY(12) with logical type INT96_TIMESTAMP, for which a
> > > different ColumnOrder would make sense. In this case we are lucky this
> > > is a physical type without a logical type attached, because otherwise
> > > we couldn't have made this change in a backwards compatible way as
> > > easily.
> > >
> > > On Sat, Jun 21, 2025 at 12:57 AM Ed Seidl <etse...@apache.org> wrote:
> > > > If we are going to standardize an ordering for INT96, rather than
> > > > parsing "created_by" fields, wouldn't it make more sense to add a new
> > > > ColumnOrder value (like what's proposed for PARQUET-2249 [1])? Then we
> > > > don't need to maintain a list of known good writers.
> > > > Ed
> > > >
> > > > [1] https://github.com/apache/parquet-format/pull/221
> > > >
> > > > On 2025/06/19 10:15:13 Andrew Lamb wrote:
> > > > > > While INT96 is now deprecated, it's still the default timestamp
> > > > > > type in Spark, resulting in a significant amount of existing data
> > > > > > written in this format.
> > > > >
> > > > > I agree with Gang and Antoine that the better solution is to change
> > > > > Spark to write non-deprecated parquet data types.
> > > > >
> > > > > It seems there is an issue in the Spark JIRA to do this[1], but the
> > > > > only feedback on the associated PR[2] is that it is a breaking
> > > > > change.
> > > > >
> > > > > If Spark is going to keep writing INT96 timestamps indefinitely, I
> > > > > suggest we un-deprecate the INT96 timestamps to reflect the
> > > > > ecosystem reality that they will be here for a while, rather than
> > > > > pretending they are really deprecated.
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1]: https://issues.apache.org/jira/browse/SPARK-51359
> > > > > [2]: https://github.com/apache/spark/pull/50215#issuecomment-2715147840
> > > > >
> > > > > p.s. as an aside, is anyone from Databricks pushing Spark to change
> > > > > the timestamp type? Or will the focus be to improve INT96 timestamps
> > > > > instead?
> > > > >
> > > > > On Wed, Jun 18, 2025 at 10:50 PM Gang Wu <ust...@gmail.com> wrote:
> > > > > > There seems to be little value in improving a deprecated feature,
> > > > > > especially when there are abundant Parquet implementations in the
> > > > > > wild. IIRC, parquet-java is planning to release 1.16.0 for new
> > > > > > data types like variant and geometry. It is also the last version
> > > > > > to support Java 8.
All deprecated APIs > might > > > get > > > > > > removed > > > > > > from 2.0.0 so I'm not sure if older Spark versions are able to > > > > leverage the > > > > > > int96 > > > > > > stats. The right way to go is to push forward the adoption of > > > timestamp > > > > > > logical > > > > > > types. > > > > > > > > > > > > Best, > > > > > > Gang > > > > > > > > > > > > On Thu, Jun 19, 2025 at 12:31 AM Micah Kornfield < > > > > emkornfi...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi Alkis, > > > > > > > Is this the right thread link? It seems to be a discussion on > > > > Timestamp > > > > > > > Nano support (which IIUC won't use int96, but I'm not sure this > > > > covers > > > > > > > changing the behavior for existing timestamps, which I think > are > > at > > > > > > either > > > > > > > millisecond or microsecond granularity)? > > > > > > > > > > > > > > there will be customers that want to interface with legacy > > systems > > > > > > > > with INT96. This is why we decided in doing both. > > > > > > > > > > > > > > > > > > > > > It might help to elaborate on the time-frame here. Since it > > > appears > > > > > > > reference implementations of parquet are not currently writing > > > > > > statistics, > > > > > > > if we merge these changes when they will be picked up in Spark? > > > > Would the > > > > > > > plan be to backport the parquet-java to older version of Spark > > > > (otherwise > > > > > > > the legacy systems wouldn't really make use or emit stats > > anyways)? > > > > What > > > > > > > is the delta between Spark picking up these changes and > > > > transitioning off > > > > > > > of Int96 by default? Is the expectation that even once the > > > default > > > > is > > > > > > > changed in spark to not use int96, there will be a large number > > of > > > > users > > > > > > > that will override the default to write int96? 
> > > > > > > Thanks,
> > > > > > > Micah
> > > > > > >
> > > > > > > On Wed, Jun 18, 2025 at 1:35 AM Alkis Evlogimenos
> > > > > > > <alkis.evlogime...@databricks.com.invalid> wrote:
> > > > > > > > We are also driving that in parallel:
> > > > > > > > https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
> > > > > > > >
> > > > > > > > Even when Spark defaults to INT64, there will be old versions
> > > > > > > > of Spark running, and there will be customers that want to
> > > > > > > > interface with legacy systems with INT96. This is why we
> > > > > > > > decided to do both.
> > > > > > > >
> > > > > > > > On Wed, Jun 18, 2025 at 9:53 AM Antoine Pitrou
> > > > > > > > <anto...@python.org> wrote:
> > > > > > > > > Can we get Spark to stop emitting INT96? They are not being
> > > > > > > > > an extremely good community player here.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Antoine.
> > > > > > > > >
> > > > > > > > > On Fri, 13 Jun 2025 15:17:51 +0200 Alkis Evlogimenos
> > > > > > > > > <alkis.evlogime...@databricks.com.INVALID> wrote:
> > > > > > > > > > Hi folks,
> > > > > > > > > >
> > > > > > > > > > While INT96 is now deprecated, it's still the default
> > > > > > > > > > timestamp type in Spark, resulting in a significant amount
> > > > > > > > > > of existing data written in this format.
> > > > > > > > > >
> > > > > > > > > > Historically, parquet-mr/java has not emitted or read
> > > > > > > > > > statistics for INT96.
> > > > > > > > > > This was likely due to the fact that standard byte
> > > > > > > > > > comparison on the INT96 representation doesn't align with
> > > > > > > > > > logical comparisons, potentially leading to incorrect
> > > > > > > > > > min/max values. This is unfortunate, because timestamp
> > > > > > > > > > filters are extremely common and the lack of stats limits
> > > > > > > > > > optimization opportunities.
> > > > > > > > > >
> > > > > > > > > > Since its inception, Photon
> > > > > > > > > > <https://www.databricks.com/product/photon> has emitted
> > > > > > > > > > and utilized INT96 statistics by employing a logical
> > > > > > > > > > comparator, ensuring their correctness. We have now
> > > > > > > > > > implemented
> > > > > > > > > > <https://github.com/apache/parquet-java/pull/3243> the
> > > > > > > > > > same support within parquet-java.
> > > > > > > > > >
> > > > > > > > > > We'd like to get the community's thoughts on this
> > > > > > > > > > addition. We anticipate that most users may not be
> > > > > > > > > > directly affected, due to the declining use of INT96.
> > > > > > > > > > However, we are interested in identifying any potential
> > > > > > > > > > drawbacks or unforeseen issues with this approach.
> > > > > > > > > >
> > > > > > > > > > Cheers
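(Editorial note on the byte-order mismatch described in the original message: an INT96 value stores nanoseconds-of-day in its first eight bytes and the Julian day number in its last four, each little-endian, so plain byte comparison orders by the least significant field first. A minimal Python sketch of the logical-comparator idea, not the actual parquet-java or Photon implementation:)

```python
import struct

def int96_key(raw: bytes):
    """Logical sort key for a 12-byte INT96 timestamp: bytes 0-7 are
    little-endian nanoseconds within the day, bytes 8-11 are the
    little-endian Julian day number."""
    nanos, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos)

# Midnight of Julian day 2460000 vs. one nanosecond into the prior day.
a = struct.pack("<qi", 0, 2460000)
b = struct.pack("<qi", 1, 2459999)

assert int96_key(a) > int96_key(b)  # logically, a is the later timestamp
assert a < b                        # bytewise, the order is reversed
```

Min/max stats computed with a key like `int96_key` match timestamp semantics, while stats computed by raw byte comparison can pick the wrong min/max, which is presumably why readers have historically ignored INT96 stats.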