PatrykKlimowicz opened a new issue, #25177:
URL: https://github.com/apache/airflow/issues/25177

   ### Apache Airflow version
   
   2.3.3 (latest released)
   
   ### What happened
   
   While using Airflow v2.1.3 in my project, 
[this](https://github.com/apache/airflow/issues/17512) issue appeared and was 
solved by adding `Offset_Key` to the [Fluent 
Bit](https://github.com/fluent/fluent-bit) configuration. `Offset_Key` 
appends an offset field to each log record, so the logs can be retrieved in the 
correct order. We set `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"`, 
and the logs were retrieved in the correct order based on `custom_offset` and 
then displayed in the Airflow UI.
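
To illustrate why the offset field matters, here is a minimal Python sketch. It is not Airflow's actual log handler code: the record layout and the `sort_by_offset` helper are hypothetical, and the only assumption taken from this report is that each record carries a numeric value under the field named by `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD`.

```python
from typing import Any, Dict, List


def sort_by_offset(records: List[Dict[str, Any]], offset_field: str) -> List[Dict[str, Any]]:
    """Return the records ordered by the numeric value stored under offset_field."""
    # A record without the field cannot be placed in order, so fail loudly.
    missing = [r for r in records if offset_field not in r]
    if missing:
        raise KeyError(f"{len(missing)} record(s) lack the field {offset_field!r}")
    return sorted(records, key=lambda r: int(r[offset_field]))


# Hypothetical records, as if a search query returned them out of order.
records = [
    {"message": "task finished", "custom_offset": 310},
    {"message": "task started", "custom_offset": 0},
    {"message": "running step 1", "custom_offset": 120},
]

ordered = [r["message"] for r in sort_by_offset(records, "custom_offset")]
print(ordered)
```

If the configured field name does not exist in the stored records, there is nothing to sort on, which is consistent with the symptom described in this report when the offset key is missing.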
   
   I have now upgraded to v2.3.3, and this behavior no longer works. I tested 
the following combinations:
   
   - `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` and `Offset_Key` have the same value: 
no offset key is created in the logs, and the logs cannot be obtained from 
ElasticSearch.
   - `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` and `Offset_Key` have different values: 
both offset keys are added to the logs, and I can see the logs in the UI (the logs 
are retrieved based on `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD`, not the custom one).
   
   For backward compatibility, I need a configuration in which 
`custom_offset` takes precedence over the offset that Airflow inserts.
   
   As suggested [here](https://github.com/apache/airflow/discussions/25154) I 
tried to lower the elasticsearch provider version and see which one will work 
for this scenario.
   
   It turned out that the version we used with Airflow v2.1.3, 
`apache-airflow-providers-elasticsearch==2.0.2`, works.
   I think that [this](https://github.com/apache/airflow/pull/17551) change 
broke our use case, as version `2.0.3` is the first that does not work for us 
([changelog](https://pypi.org/project/apache-airflow-providers-elasticsearch/2.0.3/)).
With version 2.0.2, I can see that both `custom_offset` and Airflow's 
`offset` are added to the logs, but thanks to 
`AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset"` the logs are displayed in 
the correct order.
   
   
   ### What you think should happen instead
   
   The offset added by Airflow should not conflict with the offset added by a 
third-party tool, since Airflow does not support sending logs to ElasticSearch; 
it only supports reading from it.
   
   Most probably, the problem lies in the flow of the logs. Right now it looks 
like this:
   
   Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow 
   
   so Airflow does not know about the Fluent Bit config (in this specific case) 
or its offset field name.
   
   It would be nice to make the change in version 2.0.3 that I linked above 
optional, so that we can tell Airflow whether it should create an offset field with 
the given `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` name or just use that name to 
retrieve logs (I do not know the whole logic behind Airflow's log retrieval, so I 
am not sure whether this is a good idea). I think that a boolean flag such as 
`AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD` could control whether Airflow creates 
its own offset field, while `AIRFLOW__ELASTICSEARCH__OFFSET_FIELD` could 
determine which name is used either to both create and retrieve logs OR just to 
retrieve the logs.
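
The proposed behavior could be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not existing Airflow code: `AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD` is the hypothetical flag suggested above, `prepare_record` and `str_to_bool` are made-up helpers, and the default field name `offset` is assumed.

```python
from typing import Any, Dict, Mapping


def str_to_bool(value: str) -> bool:
    """Interpret common truthy strings from environment variables."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def prepare_record(record: Dict[str, Any], env: Mapping[str, str], next_offset: int) -> Dict[str, Any]:
    """Add an offset to the record only when the (hypothetical) flag allows it."""
    # Name used both for writing (if enabled) and for reading; "offset" is an assumed default.
    offset_field = env.get("AIRFLOW__ELASTICSEARCH__OFFSET_FIELD", "offset")
    # Hypothetical flag from this proposal; it does not exist in Airflow today.
    add_offset = str_to_bool(env.get("AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD", "True"))

    result = dict(record)  # copy, so the caller's record is left untouched
    if add_offset:
        result[offset_field] = next_offset
    return result


# With the flag off, the record is left alone and a third-party tool owns the field.
env_off = {
    "AIRFLOW__ELASTICSEARCH__OFFSET_FIELD": "custom_offset",
    "AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD": "False",
}
untouched = prepare_record({"message": "hello"}, env_off, 5)

# With the flag on (the default here), the field is written under the configured name.
env_on = {"AIRFLOW__ELASTICSEARCH__OFFSET_FIELD": "custom_offset"}
tagged = prepare_record({"message": "hello"}, env_on, 5)
print(untouched, tagged)
```

Defaulting the flag to true in this sketch would keep the current behavior for users who rely on Airflow's own offset, while letting setups like the one described here opt out.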
   
   ### How to reproduce
   
   - Use Airflow v2.3.3.
   - Use [Fluent 
Bit](https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit) 
v1.9.6 and add `Offset_Key` to its 
[INPUT](https://github.com/fluent/helm-charts/blob/main/charts/fluent-bit/values.yaml#L292)
 config.
   - Use ElasticSearch to store logs, and read the logs from ElasticSearch in the 
Airflow UI.
   
   ### Operating System
   
   AKS
   
   ### Versions of Apache Airflow Providers
   
   Working case (Airflow 2.1.3):
   
   - apache-airflow-providers-amazon==2.1.0
   - apache-airflow-providers-celery==2.0.0
   - apache-airflow-providers-cncf-kubernetes==2.0.2
   - apache-airflow-providers-docker==2.1.0
   - apache-airflow-providers-elasticsearch==2.0.2
   - apache-airflow-providers-ftp==2.0.0
   - apache-airflow-providers-google==5.0.0
   - apache-airflow-providers-grpc==2.0.0
   - apache-airflow-providers-hashicorp==2.0.0
   - apache-airflow-providers-http==2.0.0
   - apache-airflow-providers-imap==2.0.0
   - apache-airflow-providers-microsoft-azure==3.1.0
   - apache-airflow-providers-mysql==2.1.0
   - apache-airflow-providers-odbc==2.0.0
   - apache-airflow-providers-postgres==2.0.0
   - apache-airflow-providers-redis==2.0.0
   - apache-airflow-providers-sendgrid==2.0.0
   - apache-airflow-providers-sftp==2.1.0
   - apache-airflow-providers-slack==4.0.0
   - apache-airflow-providers-sqlite==2.0.0
   - apache-airflow-providers-ssh==2.1.0
   
   Not working case (Airflow v2.3.3):
   
   - apache-airflow-providers-amazon==4.0.0
   - apache-airflow-providers-celery==3.0.0
   - apache-airflow-providers-cncf-kubernetes==4.1.0
   - apache-airflow-providers-docker==3.0.0
   - apache-airflow-providers-elasticsearch==4.0.0
   - apache-airflow-providers-ftp==3.0.0
   - apache-airflow-providers-google==8.1.0
   - apache-airflow-providers-grpc==3.0.0
   - apache-airflow-providers-hashicorp==3.0.0
   - apache-airflow-providers-http==3.0.0
   - apache-airflow-providers-imap==3.0.0
   - apache-airflow-providers-microsoft-azure==4.0.0
   - apache-airflow-providers-mysql==3.0.0
   - apache-airflow-providers-odbc==3.0.0
   - apache-airflow-providers-postgres==5.0.0
   - apache-airflow-providers-redis==3.0.0
   - apache-airflow-providers-sendgrid==3.0.0
   - apache-airflow-providers-sftp==3.0.0
   - apache-airflow-providers-slack==5.0.0
   - apache-airflow-providers-sqlite==3.0.0
   - apache-airflow-providers-ssh==3.0.0
   
   Airflow v2.3.3 is working with apache-airflow-providers-elasticsearch==2.0.2
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   We are using Airflow Community Helm chart + Azure Kubernetes Service
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

