xinbinhuang commented on PR #20289:
URL: https://github.com/apache/flink/pull/20289#issuecomment-1272048838

   @tweise I was distracted by other work; let me get back to this.
   > I think we need to zoom in why or why not the enumerator knows the actual 
stop position without involvement of the reader.
   
   Our use case is to expose the end offset or timestamp based on the content 
of the file. We archive out-of-retention messages to S3 using a long-running 
job. Each file normally contains multiple messages, and the timestamp of the 
last message may not match the file metadata, so we need to parse the file 
content to find the last timestamp or offset. That's why I think sending the 
split back to the enumerator makes sense: the reader has already processed 
the file at that point. 
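
   As a rough sketch of what I mean (plain Java, not the actual Flink 
interfaces; `ArchivedMessage`, `EndPosition`, and `EndPositionTracker` are 
hypothetical names used only for illustration), the reader could track the 
last message it sees while processing a split and hand that back once the 
split is finished:

```java
import java.util.Optional;

// Hypothetical sketch: none of these types are Flink APIs.
class SplitEndPositionSketch {

    /** One archived message as parsed from a file (assumed shape). */
    static final class ArchivedMessage {
        final long offset;
        final long timestampMillis;

        ArchivedMessage(long offset, long timestampMillis) {
            this.offset = offset;
            this.timestampMillis = timestampMillis;
        }
    }

    /** What the reader would report back together with the finished split. */
    static final class EndPosition {
        final long lastOffset;
        final long lastTimestampMillis;

        EndPosition(long lastOffset, long lastTimestampMillis) {
            this.lastOffset = lastOffset;
            this.lastTimestampMillis = lastTimestampMillis;
        }
    }

    /** Remembers the most recent message while the reader works through a file. */
    static final class EndPositionTracker {
        private ArchivedMessage last; // null until the first message is seen

        /** Called once per message, in file order. */
        void onMessage(ArchivedMessage message) {
            last = message;
        }

        /** Empty if the file held no messages; otherwise the last message's position. */
        Optional<EndPosition> finish() {
            if (last == null) {
                return Optional.empty();
            }
            return Optional.of(new EndPosition(last.offset, last.timestampMillis));
        }
    }
}
```

   The point is only that the end position falls out of work the reader does 
anyway, while the enumerator would have to open and parse the file a second 
time to learn it.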
   
   Do you have any recommendations on this? Or do you think this is too 
complex to implement?

