tanmayrauth commented on PR #1162:
URL: https://github.com/apache/iceberg-go/pull/1162#issuecomment-4629111009
Thanks @xixipi-lining, that's a fair point and I agree on the direction.
On this PR's scope. I'd like to keep this minimal because #1155 is a
correctness regression — write.object-storage.enabled is silently ignored on
partitioned writes today, so partitioned tables end up with no entropy at all
in their data paths. The two extra fields on writerFactory restore the
documented behavior without changing any public API. I'd rather land that on
its own than gate the bug fix behind an interface change.
On the architectural direction. Agreed that long-term the partition-aware
overload is the right shape. I checked the Java reference and LocationProvider
declares it as an abstract method on the interface itself:
String newDataLocation(PartitionSpec spec, StructLike partitionData,
String filename);
with ObjectStoreLocationProvider folding the partition path into the hash
input when includePartitionPaths=true, and dropping the partition entirely when
it's false. That's actually a stronger argument than I first realized: the
factory-rebuilds-per-partition pattern in Go can't reproduce that layout no
matter what props you pass — the hash dirs end up after the partition path
instead of before, and the hash input is filename-only. So this PR restores
entropy, but it does not yet make Go match Java's spec for partitioned
object-storage layout.
That makes the follow-up worth doing properly:
- add the partition-aware method to LocationProvider
- have objectStoreLocationProvider produce Java's layout
(<storage>/<hash>/<partitionPath>/<filename> with partition folded into the
hash) and honor includePartitionPaths=false by dropping the partition
- remove tableProps and partitionLocProvider from writerFactory once the
call sites move over
That's a public-interface change and a behavior change for anyone already
on the current partitioned layout, so it deserves its own PR with a clear note
in the changelog. Happy to file the follow-up issue and pick it up after this
lands — or, if you'd prefer, I can draft the interface signature first so we
can align before either of us writes code.
WDYT — land this as the regression fix and tackle the interface evolution
separately?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]