DanielLeens commented on issue #10914:
URL: https://github.com/apache/seatunnel/issues/10914#issuecomment-4597810859

   I checked this against the current metadata path in the codebase, and I 
agree this belongs in the design lane, not as an implementation afterthought 
under the routing work.
   
   `MetadataTransform` and metadata-schema projection already give us a place 
to project logical metadata into physical fields, so the core question here is 
not just the field names. The more important boundaries to make explicit 
before implementation are semantic:
   
   1. which fields represent stable identity versus version/content state
   2. which fields are source-observed metadata versus lifecycle-control 
metadata
   3. which parts are required for the first Knowledge Sync MVP versus later 
expansion
   
   In particular, I would strongly suggest making the `document_id` / 
`chunk_id` versus `document_hash` / `chunk_hash` split more explicit in the 
next revision. If identity and change-detection semantics are not frozen first, 
later lifecycle sinks will still have to guess whether they are routing by 
stable ownership or by content-derived values.
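
   To make the split concrete, here is a minimal Python sketch, not SeaTunnel 
code. The helper names `make_document_id` and `make_document_hash`, and the 
choice of SHA-256 over a source name plus path, are assumptions for 
illustration only; the real derivation rules are exactly what the proposal 
should define.

```python
import hashlib


def make_document_id(source: str, path: str) -> str:
    """Stable identity: derived only from where the document lives (assumed rule)."""
    return hashlib.sha256(f"{source}\x00{path}".encode("utf-8")).hexdigest()


def make_document_hash(content: bytes) -> str:
    """Change detection: derived only from the document's content."""
    return hashlib.sha256(content).hexdigest()


# Two observations of the same (hypothetical) document with different content.
v1 = {
    "document_id": make_document_id("wiki", "/guide.md"),
    "document_hash": make_document_hash(b"first draft"),
}
v2 = {
    "document_id": make_document_id("wiki", "/guide.md"),
    "document_hash": make_document_hash(b"second draft"),
}

# Identity is unchanged across versions; only the hash moves.
same_document = v1["document_id"] == v2["document_id"]
content_changed = v1["document_hash"] != v2["document_hash"]
```

   Under these assumptions, a sink that routes by `document_id` keeps 
ownership stable across edits, while one that routes by `document_hash` would 
treat every edit as a new document.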
   
   I also agree with keeping this issue separate from PR-B0. PR-B0 is the 
engine/routing foundation; this issue should freeze the metadata contract and 
projection rules that make that routing meaningful for Knowledge Sync.
   
   A stronger next version of this proposal would explicitly state:
   - identity rules
   - version/hash rules
   - projection examples through `MetadataTransform`
   - non-goals for the first MVP
   
   That will make the follow-up source/sink work much easier to review.
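
   As a rough illustration of what a projection example could look like, the 
Python sketch below maps logical metadata keys onto physical columns and fails 
fast when a required key is missing. It does not use the `MetadataTransform` 
API; the `PROJECTION` mapping, the physical column names, and 
`project_metadata` are all hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping: logical metadata key -> physical column name.
PROJECTION: Dict[str, str] = {
    "document_id": "doc_id",
    "chunk_id": "chunk_id",
    "document_hash": "doc_hash",
    "chunk_hash": "chunk_hash",
}


def project_metadata(
    row: Mapping[str, Any],
    metadata: Mapping[str, Any],
    projection: Mapping[str, str] = PROJECTION,
) -> Dict[str, Any]:
    """Copy the row and add one physical column per logical metadata key."""
    missing = [key for key in projection if key not in metadata]
    if missing:
        # Refuse to emit a row whose identity or hash fields are undefined.
        raise KeyError(f"missing logical metadata: {missing}")
    projected = dict(row)
    for logical, physical in projection.items():
        projected[physical] = metadata[logical]
    return projected


example = project_metadata(
    {"text": "hello"},
    {
        "document_id": "doc-1",
        "chunk_id": "doc-1#0",
        "document_hash": "aaa",
        "chunk_hash": "bbb",
    },
)
```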

