Hi Team,

Thanks, Dyno, for bringing this up on the dev list!

For everyone else: the original goal is that if we have two transforms
where *T1.satisfiesOrderOf(T2)*, then given a partition value P1 for T1, we
should be able to derive the corresponding partition value P2 for T2 (for
example, the day 2025-10-18 uniquely determines the month 2025-10). One
possible approach is the API Dyno proposed, which would be part of the
Transform interface. I’ve included your suggested Javadoc at the end of
this message for reference.
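To make the goal concrete, here is a minimal Java sketch of the day-to-month case. It assumes the Iceberg spec encodings (a day partition value is days since 1970-01-01, a month value is months since 1970-01), and the helper `dayToMonth` is hypothetical, not part of the Transform interface; it only shows that P2 is fully determined by P1.

```java
import java.time.LocalDate;

public class DayToMonthSketch {
  // Hypothetical helper, not an Iceberg API: maps a day-transform value
  // (days since 1970-01-01) to the month-transform value (months since 1970-01)
  // of the month that contains that day.
  static int dayToMonth(int dayOrdinal) {
    LocalDate date = LocalDate.ofEpochDay(dayOrdinal);
    return (date.getYear() - 1970) * 12 + (date.getMonthValue() - 1);
  }

  public static void main(String[] args) {
    int day = (int) LocalDate.of(2025, 10, 18).toEpochDay();
    int month = dayToMonth(day);
    System.out.println(day + " -> " + month); // prints "20379 -> 669"
  }
}
```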

The alternative we discussed was something like:

*<P> SerializableFunction<S, T> bindByTransformedValue(Transform<?, P> otherTransform, P otherOutput)*


Transform is a very low-level API, so I’d prefer to extend it only if no
better alternative exists. If you have other ideas or suggestions, we’d be
happy to hear them.

Thanks,
Peter

The Javadoc for the API proposed by Dyno:


/**
 * Converts a transformed partition value back to a representative source type value.
 *
 * <p>This method returns a source value that would produce the given transformed value when this
 * transform is applied. For temporal transforms, this returns the start of the period (e.g.,
 * start of hour, day, month, or year). For truncate transforms, this returns the truncated value
 * as-is since it preserves the source type.
 *
 * <p>This is useful for chaining transforms when {@link #satisfiesOrderOf(Transform)} is true,
 * allowing conversion from a finer granularity to a coarser one by converting back to source type
 * and reapplying the coarser transform.
 *
 * @param sourceType the source type for this transform
 * @param transformedValue the transformed partition value
 * @return a source value that would produce this transformed value, or null if the input is null
 * @throws UnsupportedOperationException if this transform does not support conversion back to
 *     source type
 */
default S toSourceTypeValue(Type sourceType, T transformedValue) {
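The round trip this Javadoc describes can be sketched for the hour-to-day case from Dyno's example. This assumes the Iceberg spec encodings (timestamps as microseconds since the epoch, hour and day values as hours and days since the epoch); `hourToSourceMicros` and `dayFromMicros` are hypothetical stand-ins, not the proposed implementation.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class HourToDaySketch {
  private static final long MICROS_PER_HOUR = 3_600_000_000L;
  private static final long MICROS_PER_DAY = 24 * MICROS_PER_HOUR;

  // Hypothetical stand-in for toSourceTypeValue on an hour transform:
  // returns the start of the hour as microseconds since the epoch.
  static long hourToSourceMicros(int hourOrdinal) {
    return hourOrdinal * MICROS_PER_HOUR;
  }

  // Hypothetical day transform on a microsecond timestamp; floor division
  // keeps pre-epoch timestamps in the correct (negative) day.
  static int dayFromMicros(long micros) {
    return (int) Math.floorDiv(micros, MICROS_PER_DAY);
  }

  public static void main(String[] args) {
    long micros = hourToSourceMicros(489118);
    System.out.println(Instant.EPOCH.plus(micros, ChronoUnit.MICROS)); // 2025-10-18T22:00:00Z
    System.out.println(dayFromMicros(micros));                          // 20379
  }
}
```

For hour to day, `Math.floorDiv(489118, 24)` yields the same 20379 directly. The appeal of going through the source type is that only the coarser transform's forward function is needed, so the same pattern should carry over to other pairs where *satisfiesOrderOf* holds.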



On Mon, Dec 15, 2025 at 20:53, Dyno Fu <[email protected]> wrote:

> Hello Iceberg devs,
>
> I’d like to reopen the discussion on
> https://github.com/apache/iceberg/pull/14281 (“Core: Group binpack
> fileGroup by output partitionSpec”), which was marked as stale last week.
>
> This patch introduces an enhancement to the rewrite_data_files action:
> instead of grouping files by the current table partition spec, it groups
> them by the output partition spec provided in the rewrite parameters. This
> behavior enables more efficient bin-packing of small files when rolling
> data up into a coarser or alternate partition layout.
>
> The main concern with the implementation is that it introduces a new
> API:
>
> default S toSourceTypeValue(Type sourceType, T transformedValue)
>
> which normalizes a partition value back to the source type. For example,
> it converts the hour transform value `489118` to the timestamp
> `2025-10-18 22:00:00` so that a different partition transform (e.g., the
> day transform) can be applied to it.
>
> What's your opinion: is this the right abstraction, or is there a better
> alternative?
> @pvary, please share your thoughts from our discussion on Slack.
> Much appreciated, thanks.
>
> Regards,
> Dyno
>
> --
> reality, with all its ambiguities, does the job just fine.
>
