andyjianzhou commented on PR #40447: URL: https://github.com/apache/airflow/pull/40447#issuecomment-2237090174
I would love to get your advice on the workflow for this, @uranusjr. What I'm thinking right now is an interface that abstracts the specifics of each storage service we support (S3, GCS, Azure). That way, when the API calls `TaskLogReader.get_log_content`, it wouldn't need to know which storage backend it's talking to; it stays generic. Does that sound reasonable, or is there a more efficient approach? What I currently have is a separate API endpoint that only reads from `S3Hook` for now. Should we keep that endpoint, or should I reduce the number of requests we're making and pass the size through the `get_logs` response instead?

To simplify my question: what's a good approach for obtaining `file_size` in the frontend? My initial thought is to fold `get_log_page_number` into the `get_logs` method, which would remove the initial HEAD request. Having both `get_logs` and a separate request just for the log size seems redundant when we could return `file_size` in the `get_logs` response. However, I'm not sure that's the most efficient option.

Another issue: there are currently errors around the newly added offset/limit parameters in the providers. That's something we'll also have to look into.
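To make the idea concrete, here's a minimal sketch of the backend-agnostic interface I have in mind, with `file_size` returned alongside the content so no separate size request is needed. All names here (`RemoteLogStorage`, `LogChunk`, `InMemoryLogStorage`) are hypothetical placeholders, not existing Airflow classes; a real implementation would wrap `S3Hook`, the GCS hook, etc.:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LogChunk:
    """A slice of a remote log file plus the file's total size."""
    content: str
    file_size: int  # total size in bytes, so the frontend can paginate


class RemoteLogStorage(ABC):
    """Hypothetical backend-agnostic interface; each provider
    (S3, GCS, Azure) would supply its own implementation."""

    @abstractmethod
    def read(self, path: str, offset: int, limit: int) -> LogChunk:
        ...


class InMemoryLogStorage(RemoteLogStorage):
    """Stand-in backend for illustration only. A real S3 backend
    would issue a single ranged GET and read the total size from
    the response headers, avoiding the extra HEAD request."""

    def __init__(self, files: dict[str, str]):
        self._files = files

    def read(self, path: str, offset: int, limit: int) -> LogChunk:
        data = self._files[path]
        return LogChunk(content=data[offset:offset + limit],
                        file_size=len(data))


def get_logs(storage: RemoteLogStorage, path: str,
             offset: int = 0, limit: int = 1024) -> dict:
    """Single endpoint shape: returns both the chunk and file_size,
    so no separate size-only endpoint is required."""
    chunk = storage.read(path, offset, limit)
    return {"content": chunk.content, "file_size": chunk.file_size}
```

The caller never touches a storage-specific hook, and the frontend gets `file_size` for free with every page of logs, which is the consolidation I'm proposing above.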
