mik-laj edited a comment on issue #5177: [AIRFLOW-4084] Fix bug downloading 
incomplete logs from ElasticSearch
URL: https://github.com/apache/airflow/pull/5177#issuecomment-557788417
 
 
   I wonder why this change had to make changes to the ``views.py`` file.  Why 
were the logs not updated then?  In my opinion, we should check the existence 
of the ``download_logs`` key in the implementation of the handler logic and 
then disable the pagination mechanism. Now the abstraction of code is running 
away and it is possible that we are breaking other handlers because they 
expected different behavior.
   
   I am working on documentation for this class and if we withdraw this change 
we will be able to do it as follows.  In my opinion this is the original 
behavior of this code.
   ```
   class TaskHandler(logging.Handler, ABC):
       """
       Handler that allows you to write and read information about a specific 
task.
       """
       @abstractmethod
       def read(
           self, task_instance: TaskInstance, try_number: Optional[int] = None, 
metadata: Optional[Dict] = None
       ) -> Tuple[List[str], List[Dict]]:
           """
           Read logs of given task instance.
   
           It supports log pagination. To do this, the first call to this 
function contains an empty metadata
           object. As a result, list of logs and list of metadata should be 
returned. The resulting
           metadata should contain the key ``end_of_logs``, which determines 
whether pagination should be
           continued. It is possible to return more metadata objects, but only 
the first is used, so
           you should always return a list with one item.
   
           The remaining keys in the dictionary are sent back to the method 
without changes, which means that
           if you add an additional key with a token or page number, you can 
expect that the key will be
           available in the next request for logs.
   
           If the metadata in the call contains the ``download_logs'' key, then 
full logs should be
           returned without pagination.
   
           :param task_instance: task instance object
           :param try_number: task instance try_number to read logs from. If 
None
              it returns all logs separated by try_number
           :param metadata: log metadata,
               can be used for steaming log reading and auto-tailing.
           :return: a list of logs and list of metadata objects.
           """
           ...
   
       @abstractmethod
       def set_context(self, task_instance: TaskInstance) -> None:
           """
           Provide task_instance context to airflow task handler.
   
           Different implementations provide different behavior. Examples of 
behavior are:
   
           * in the case of handlers writing to a file, it may start writing to 
another file;
           * for remote services, it can start adding labels to logs.
   
           This allows us to later search for logs for a single task
   
           :param task_instance: task instance object
           """
           ...
   ```
   
   I would like to point out that in this PR there is a duplicate mechanism of 
handling the case when try_numbers is empty, i.e. the log for all try numbers 
is downloaded.
   
   Previously it was part of the file_task_handler handler, and now it has been 
copied a second time to another place despite the es_task_handler extended from 
this handler.
   
https://github.com/apache/airflow/blob/master/airflow/utils/log/file_task_handler.py#L155-L164
   
https://github.com/apache/airflow/blob/master/airflow/utils/log/es_task_handler.py#L36
   https://github.com/apache/airflow/blob/master/airflow/www/views.py#L595-L599

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to