andyjianzhou commented on PR #40447:
URL: https://github.com/apache/airflow/pull/40447#issuecomment-2237119731

   > I would love to get your advice on the workflow for this @uranusjr
   > 
   > What I am thinking right now is creating an interface that abstracts away
   > the specifics of each storage service, with one implementation per storage
   > type we support (S3, GCS, Azure). That way, when our API calls
   > `TaskLogReader.get_log_content`, it wouldn't need to know which storage
   > backend it's talking to (rough sketch below). Is that a good idea, or is
   > there a more efficient approach? What I currently have is a separate API
   > endpoint that reads only from `S3Hook` for now. Should we keep this
   > endpoint, or should I reduce the number of requests we're making and just
   > pass the size through the response of `get_logs`? To put my question
   > simply: what's a good way for the frontend to obtain `file_size`?
   > 
   > My initial thought is to fold `get_log_page_number` into the `get_logs`
   > method, which would remove the initial HEAD request. Having both
   > `get_logs` and a separate request just to obtain the log size seems
   > redundant when we can return `file_size` in the `get_logs` response
   > (see the second sketch below). However, I'm not sure whether that's
   > efficient.
   > 
   > Another issue is that there are errors right now in the providers caused
   > by the newly added offset/limit parameters. That's something we'll also
   > have to look into.
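
   Here's a rough sketch of the interface I'm imagining. `LogStorageBackend`,
   `S3LogBackend`, and the method names are all hypothetical; only `S3Hook`,
   its `parse_s3_url`/`get_key` helpers, and the boto3 object calls are real
   APIs:

   ```python
   from __future__ import annotations

   from abc import ABC, abstractmethod

   from airflow.providers.amazon.aws.hooks.s3 import S3Hook


   class LogStorageBackend(ABC):
       """Hypothetical interface that hides storage-specific details."""

       @abstractmethod
       def read(self, remote_log_location: str, offset: int = 0, limit: int | None = None) -> str:
           """Return a chunk of the remote log starting at ``offset``."""

       @abstractmethod
       def file_size(self, remote_log_location: str) -> int:
           """Return the total size of the remote log in bytes."""


   class S3LogBackend(LogStorageBackend):
       """S3 flavor; GCS and Azure backends would implement the same two methods."""

       def __init__(self) -> None:
           self.hook = S3Hook()

       def read(self, remote_log_location: str, offset: int = 0, limit: int | None = None) -> str:
           bucket, key = S3Hook.parse_s3_url(remote_log_location)
           # Range request so we only download the requested chunk, not the whole file.
           byte_range = f"bytes={offset}-" if limit is None else f"bytes={offset}-{offset + limit - 1}"
           return self.hook.get_key(key, bucket).get(Range=byte_range)["Body"].read().decode("utf-8")

       def file_size(self, remote_log_location: str) -> int:
           bucket, key = S3Hook.parse_s3_url(remote_log_location)
           # content_length issues a HEAD request under the hood.
           return self.hook.get_key(key, bucket).content_length
   ```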
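
   And a minimal sketch of folding the size into the `get_logs` response so
   the frontend only ever makes a single request (the response shape and the
   `TaskLogReaderSketch` wrapper are assumptions, not the existing
   `TaskLogReader` API):

   ```python
   class TaskLogReaderSketch:
       """Illustrative only; not the real TaskLogReader."""

       def __init__(self, backend: LogStorageBackend) -> None:
           self.backend = backend

       def get_logs(self, remote_log_location: str, offset: int = 0, limit: int = 1024 * 1024) -> dict:
           content = self.backend.read(remote_log_location, offset=offset, limit=limit)
           # The size lookup still costs the server a HEAD call, but the browser
           # no longer needs its own round trip just for file_size.
           total = self.backend.file_size(remote_log_location)
           return {
               "content": content,
               "file_size": total,
               "next_offset": min(offset + limit, total),
               "end_of_log": offset + limit >= total,
           }
   ```

   With that shape, `get_log_page_number` becomes a pure frontend calculation
   (`ceil(file_size / page_size)`) instead of a second endpoint.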
   
   I'll first fix the other tests that are breaking, i.e. the provider tests
   that started failing after I added the offset/limit functionality.

