potiuk commented on issue #46657:
URL: https://github.com/apache/airflow/issues/46657#issuecomment-2666482705

   > I'm using a pydantic model to do the parsing now in
   > https://github.com/apache/airflow/pull/46827 (as that also converts time
   > strings into dt objects for us) so that "should be fast as it's using jiter
   > under the hood"
   
   I am not too concerned about speed; I am much more concerned about memory.
   My comment was more about the "streaming" side of it: not loading the whole
   log into memory, and being able to handle JSON-formatted log entries
   retrieved with HTTP Range requests. Such a range can contain broken JSON,
   either because we do not know where the next log entry starts or because the
   last JSON object is not completely contained in the range. I have not looked
   at the details (sorry), so maybe this is already handled, and apologies if I
   am adding noise. We discussed in the past (also in connection with
   https://github.com/apache/airflow/issues/45079) that we wanted to support
   partial retrieval of logs. I am **not sure** whether we already do partial
   log retrieval, but I think Pydantic models won't do partial rendering by
   default; as far as I know, some special handling is needed to make partial
   rendering work with Pydantic models.
   
   But again - maybe it's already in, or planned.
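
   For illustration only, here is a minimal sketch of the kind of streaming
   handling I mean, written with the standard-library `json` module rather than
   Pydantic. It assumes (I have not checked the actual log format) that log
   entries are newline-delimited JSON, one object per line, and that `chunks`
   are the bodies of successive HTTP Range responses. `iter_log_entries` and
   `starts_mid_record` are hypothetical names, not existing Airflow APIs. The
   sketch keeps at most one incomplete line in memory between chunks, drops the
   leading fragment when a range starts inside a record, and tolerates a
   truncated final record.

```python
import json
from typing import Iterable, Iterator


def iter_log_entries(
    chunks: Iterable[bytes], starts_mid_record: bool = False
) -> Iterator[dict]:
    """Hypothetical helper: yield parsed log entries from streamed byte chunks.

    Assumption: one JSON log entry per line (newline-delimited JSON).
    Only the current chunk plus one incomplete line are held in memory.
    """
    buffer = b""
    skip_first = starts_mid_record
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline consists of complete lines;
        # the remainder (possibly empty) is an incomplete line kept for later.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if skip_first:
                # The range began inside a record whose start was never fetched.
                skip_first = False
                continue
            if line.strip():
                yield json.loads(line)
    # The leftover has no terminating newline: it may be a record that was cut
    # off by the range boundary, so parse it if possible and skip it otherwise.
    if buffer.strip() and not skip_first:
        try:
            yield json.loads(buffer)
        except json.JSONDecodeError:
            pass
```

   Each complete line could then be passed to a Pydantic model for validation
   instead of being returned as a plain dict; the point is only that the
   splitting into complete records happens before any model sees the data.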
