DanielLeens commented on issue #10914: URL: https://github.com/apache/seatunnel/issues/10914#issuecomment-4597810859
I checked this together with the current metadata path in the codebase, and I agree this belongs in the design lane, not as an implementation-afterthought under the routing work.

`MetadataTransform` and metadata-schema projection already give us a place to project logical metadata into physical fields, so the core question here is not just the field names. The more important boundary to make explicit before implementation is semantic:

1. which fields represent stable identity versus version/content state
2. which fields are source-observed metadata versus lifecycle-control metadata
3. which parts are required for the first Knowledge Sync MVP versus later expansion

In particular, I would strongly suggest making the `document_id` / `chunk_id` versus `document_hash` / `chunk_hash` split more explicit in the next revision. If identity and change-detection semantics are not frozen first, later lifecycle sinks will still have to guess whether they are routing by stable ownership or by content-derived values.

I also agree with keeping this issue separate from PR-B0. PR-B0 is the engine/routing foundation; this issue should freeze the metadata contract and projection rules that make that routing meaningful for Knowledge Sync.

A stronger next version of this proposal would explicitly state:

- identity rules
- version/hash rules
- projection examples through `MetadataTransform`
- non-goals for the first MVP

That will make the follow-up source/sink work much easier to review.
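To make the identity versus change-detection split concrete, here is a minimal Python sketch. It is not SeaTunnel code and does not use any SeaTunnel API. Only the four field names come from the discussion above; the derivation rules (hashing a source URI for `document_id`, appending a chunk index for `chunk_id`, SHA-256 over content for the hashes) and the helper names are assumptions chosen purely for illustration, since the actual rules are exactly what the proposal still needs to freeze.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def _sha256(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ChunkMetadata:
    # Stable identity: should not change when the content is edited.
    document_id: str
    chunk_id: str
    # Version/content state: expected to change whenever the content changes.
    document_hash: str
    chunk_hash: str


def build_chunk_metadata(source_uri: str, chunks: List[str]) -> List[ChunkMetadata]:
    """Hypothetical derivation rules, assumed for this sketch only."""
    document_id = _sha256(source_uri)            # identity from where the document lives
    document_hash = _sha256("\n".join(chunks))   # state from what the document contains
    return [
        ChunkMetadata(
            document_id=document_id,
            chunk_id=f"{document_id}:{index}",   # identity from position, not content
            document_hash=document_hash,
            chunk_hash=_sha256(chunk),
        )
        for index, chunk in enumerate(chunks)
    ]


# Same document, second chunk edited between two syncs.
before = build_chunk_metadata("s3://bucket/guide.md", ["intro", "setup"])
after = build_chunk_metadata("s3://bucket/guide.md", ["intro", "setup v2"])

# Identity is unchanged, so a sink can route the update to the same record.
print(before[1].chunk_id == after[1].chunk_id)
# The content hash differs, so the sink can tell the chunk needs rewriting.
print(before[1].chunk_hash != after[1].chunk_hash)
```

Under these assumed rules a sink that routes by `chunk_id` performs an in-place update, while one that routes by `chunk_hash` would treat the edited chunk as a new record and leave the old one orphaned. A position-based `chunk_id` also shifts when chunks are inserted or removed earlier in the document, which is one more reason to state the identity rules explicitly.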
