leaves12138 commented on code in PR #7832:
URL: https://github.com/apache/paimon/pull/7832#discussion_r3256662080
##########
paimon-core/src/main/java/org/apache/paimon/schema/SchemaValidation.java:
##########
@@ -578,6 +582,19 @@ private static void validateFileIndex(TableSchema schema) {
}
}
+ private static void validateSnapshotSequenceOrdering(TableSchema schema, CoreOptions options) {
+ checkArgument(
+ !schema.primaryKeys().isEmpty(),
+ "%s = true requires a primary-key table; append-only tables cannot use "
+ + "snapshot-based sequence ordering.",
+ CoreOptions.SEQUENCE_SNAPSHOT_ORDERING.key());
+ checkArgument(
+ options.sequenceField().isEmpty(),
+ "%s = true is mutually exclusive with %s; the snapshot id is the sole tiebreaker.",
+ CoreOptions.SEQUENCE_SNAPSHOT_ORDERING.key(),
+ CoreOptions.SEQUENCE_FIELD.key());
+ }
Review Comment:
This option is currently accepted for every primary-key merge engine, but
the implementation only preserves `snapshotId` for merge functions that return
an input `KeyValue`. For example, `PartialUpdateMergeFunction` and
`AggregateMergeFunction` build a new `KeyValue` via `replace(...)`, which
resets `snapshotId` to `UNKNOWN_SNAPSHOT_ID`. During compaction,
`stampSequenceWithSnapshotId` then writes `-1` into `_SEQUENCE_NUMBER` / file
sequence metadata, so later reads can order compacted records incorrectly.
Could you either restrict `sequence.snapshot-ordering` to the supported merge
engine(s) here, or propagate the winning snapshot id through all merge
functions and add tests for partial-update / aggregation?
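
A minimal sketch of the first option (restricting the option at validation time). Everything here is a stand-in for illustration: the `Engine` enum, the `SUPPORTED` set, and the `validate` helper are hypothetical and are not Paimon's actual types, and treating only the deduplicate engine as supported is an assumption, not something the PR establishes.

```java
import java.util.EnumSet;
import java.util.Set;

final class SnapshotOrderingValidationSketch {

    // Hypothetical stand-in for the table's merge engine setting.
    enum Engine { DEDUPLICATE, PARTIAL_UPDATE, AGGREGATE, FIRST_ROW }

    // Assumption for illustration only: just DEDUPLICATE returns an input KeyValue unchanged.
    static final Set<Engine> SUPPORTED = EnumSet.of(Engine.DEDUPLICATE);

    // Rejects the option when the configured engine is not in the supported set.
    static void validate(boolean snapshotOrdering, Engine engine) {
        if (snapshotOrdering && !SUPPORTED.contains(engine)) {
            throw new IllegalArgumentException(
                    "sequence.snapshot-ordering = true is not supported with merge engine "
                            + engine
                            + "; supported engines: "
                            + SUPPORTED);
        }
    }

    public static void main(String[] args) {
        validate(true, Engine.DEDUPLICATE); // accepted
        validate(false, Engine.AGGREGATE); // accepted, the option is off
        try {
            validate(true, Engine.PARTIAL_UPDATE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the second option is chosen instead, this check becomes unnecessary, but the partial-update and aggregation tests are still needed.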
##########
paimon-api/src/main/java/org/apache/paimon/CoreOptions.java:
##########
@@ -965,6 +965,20 @@ public InlineElement getDescription() {
.defaultValue(SortOrder.ASCENDING)
.withDescription("Specify the order of sequence.field.");
+ @Immutable
+ public static final ConfigOption<Boolean> SEQUENCE_SNAPSHOT_ORDERING =
+ key("sequence.snapshot-ordering")
+ .booleanType()
+ .defaultValue(false)
+ .withDescription(
+ "When enabled, merge uses the commit snapshot id as the primary "
Review Comment:
This option also looks unsafe to enable on a table that already has data
written without the feature. Existing APPEND files have `minSequenceNumber` as
the old sequence range, and existing COMPACT files have `_SEQUENCE_NUMBER` as
the old per-record sequence number; after toggling this option on, readers will
interpret those values as snapshot ids. Could this be documented and/or
rejected for `ALTER TABLE` as a creation-only option? Otherwise an existing
table can silently reorder old records.
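
To make the failure mode concrete, here is a toy sketch of a "larger ordering value wins" merge over two versions of the same key. The `Version` class, the `latest` helper, and the numbers are all made up for illustration; this is not Paimon's reader logic, only the comparison it would effectively perform once old sequence numbers are read as snapshot ids.

```java
final class MixedSequenceDomainSketch {

    // One stored version of a key: the value a reader orders by, plus a label for printing.
    static final class Version {
        final long orderingValue;
        final String label;

        Version(long orderingValue, String label) {
            this.orderingValue = orderingValue;
            this.label = label;
        }
    }

    // Picks the version with the larger stored ordering value (ties go to the first argument).
    static Version latest(Version a, Version b) {
        return a.orderingValue >= b.orderingValue ? a : b;
    }

    public static void main(String[] args) {
        // Written before the option was enabled: the stored value is a per-record sequence number.
        Version old = new Version(1_000L, "old value, written in snapshot 3");
        // Written after the option was enabled: the stored value is the commit snapshot id.
        Version fresh = new Version(5L, "new value, written in snapshot 5");

        // Both values are now compared as if they were snapshot ids.
        System.out.println("winner: " + latest(old, fresh).label);
    }
}
```

With these numbers the old version wins, because 1000 is compared against 5 even though the two values come from different domains.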
##########
paimon-core/src/main/java/org/apache/paimon/mergetree/compact/LookupMergeTreeCompactRewriter.java:
##########
@@ -217,7 +221,8 @@ public MergeFunctionWrapper<ChangelogResult> create(
valueEqualiser,
lookupStrategy,
deletionVectorsMaintainer,
- userDefinedSeqComparator);
+ userDefinedSeqComparator,
+ snapshotSequenceOrdering);
Review Comment:
The lookup changelog path can still lose the snapshot id when
`LookupMergeFunction` spills its `KeyValueBuffer` to the binary buffer.
`KeyValueBuffer.createBinaryBuffer` still constructs `new
KeyValueWithLevelNoReusingSerializer(keyType, valueType)` without
`includeSnapshotId`, so after `lookup.merge-records-threshold` is exceeded,
deserialized candidates have `UNKNOWN_SNAPSHOT_ID` and this comparator falls
back to sequence-only ordering. Please thread `snapshotSequenceOrdering` into
`KeyValueBuffer`'s serializer and add a test that forces lookup-buffer spill,
for example with a very small `lookup.merge-records-threshold` and an
`IOManager`.
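
A toy model of the spill round trip described above, for illustration only. The `Entry` and `Serializer` classes are hypothetical and far simpler than `KeyValueWithLevelNoReusingSerializer`; the only things borrowed from the PR are the `includeSnapshotId` flag name and the `-1` sentinel for `UNKNOWN_SNAPSHOT_ID`.

```java
import java.nio.ByteBuffer;

final class SpillRoundTripSketch {

    // Sentinel for "snapshot id not known", matching the -1 mentioned in this review.
    static final long UNKNOWN_SNAPSHOT_ID = -1L;

    // Simplified record: just the two fields that matter for ordering.
    static final class Entry {
        final long sequenceNumber;
        final long snapshotId;

        Entry(long sequenceNumber, long snapshotId) {
            this.sequenceNumber = sequenceNumber;
            this.snapshotId = snapshotId;
        }
    }

    // Hypothetical serializer that writes the snapshot id only when asked to.
    static final class Serializer {
        private final boolean includeSnapshotId;

        Serializer(boolean includeSnapshotId) {
            this.includeSnapshotId = includeSnapshotId;
        }

        byte[] serialize(Entry entry) {
            ByteBuffer buffer = ByteBuffer.allocate(includeSnapshotId ? 16 : 8);
            buffer.putLong(entry.sequenceNumber);
            if (includeSnapshotId) {
                buffer.putLong(entry.snapshotId);
            }
            return buffer.array();
        }

        Entry deserialize(byte[] bytes) {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            long sequenceNumber = buffer.getLong();
            // Without the flag there is nothing to read back, so the id degrades to the sentinel.
            long snapshotId = includeSnapshotId ? buffer.getLong() : UNKNOWN_SNAPSHOT_ID;
            return new Entry(sequenceNumber, snapshotId);
        }
    }

    // Simulates spilling one entry to the binary buffer and reading it back.
    static Entry roundTrip(Entry entry, boolean includeSnapshotId) {
        Serializer serializer = new Serializer(includeSnapshotId);
        return serializer.deserialize(serializer.serialize(entry));
    }

    public static void main(String[] args) {
        Entry original = new Entry(7L, 42L);
        System.out.println("without flag: " + roundTrip(original, false).snapshotId);
        System.out.println("with flag:    " + roundTrip(original, true).snapshotId);
    }
}
```

A real regression test would need to assert the same property on the actual lookup path after the buffer has spilled, not on a stand-in serializer like this one.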