potiuk commented on issue #46657: URL: https://github.com/apache/airflow/issues/46657#issuecomment-2666482705
> I'm using a pydantic model to do the parsing now in https://github.com/apache/airflow/pull/46827 (as that also converts time strings into dt objects for us) so that "should be fast as its using jiter under the hood"

I am not too concerned about speed; I am much more concerned about memory. My comment was more about the "streaming" side of it, or more precisely about not loading the whole log into memory and being able to handle JSON-formatted log entries retrieved with HTTP Range requests (i.e. broken JSON, either because we do not know where the next log entry starts or because a JSON object is not completely available within the requested range).

I have not looked at the details (sorry), so maybe it's already handled, and sorry if I am adding noise. But we discussed in the past (also in connection with https://github.com/apache/airflow/issues/45079) that we wanted to do partial retrieval of the logs.

I am **not sure** whether we already do partial log retrieval, but I think Pydantic models won't do partial rendering by default, and as far as I know special handling is needed to make partial rendering work with Pydantic models. But again, maybe it's already in, or planned.
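To make the concern concrete, here is a minimal sketch of the kind of handling I mean. It is not what Airflow or the linked PR does. It assumes the log is newline-delimited JSON (one object per line), and `iter_entries` is a hypothetical helper name. It keeps at most one partial line in memory, drops the leading fragment when the range starts mid-entry, and ignores a truncated trailing entry.

```python
import json
from typing import Iterable, Iterator


def iter_entries(chunks: Iterable[bytes], first_chunk_aligned: bool = True) -> Iterator[dict]:
    """Yield parsed log entries from byte chunks (hypothetical helper).

    Assumes newline-delimited JSON. Only the current partial line is buffered,
    so the whole log is never held in memory at once.
    """
    buffer = b""
    # If the range started in the middle of an entry, the first "line" is a fragment.
    skip_first = not first_chunk_aligned
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if skip_first:
                skip_first = False
                continue
            if line.strip():
                yield json.loads(line)
    # The tail may be a complete entry without a final newline, or a truncated one.
    if buffer.strip() and not skip_first:
        try:
            yield json.loads(buffer)
        except json.JSONDecodeError:
            pass  # truncated entry: a later range request would need to cover it
```

Each complete line could just as well be passed to a Pydantic model's `model_validate_json` instead of `json.loads`; the point is only that the chunk boundaries have to be dealt with before the model sees the data.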
