rcwoolston opened a new issue, #24687:
URL: https://github.com/apache/airflow/issues/24687

   ### Apache Airflow version
   
   2.3.2 (latest released)
   
   ### What happened
   
   The scheduler is getting a PermissionError when running the file processor, with the following stack trace:
   
   
   ```
   Process DagFileProcessor41434-Process:
   Traceback (most recent call last):
     File "/opt/conda_envs/airflow/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
       self.run()
     File "/opt/conda_envs/airflow/lib/python3.8/multiprocessing/process.py", line 108, in run
       self._target(*self._args, **self._kwargs)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 155, in _run_file_processor
       result: Tuple[int, int] = dag_file_processor.process_file(
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/dag_processing/processor.py", line 660, in process_file
       dagbag.sync_to_db()
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/utils/session.py", line 71, in wrapper
       return func(*args, session=session, **kwargs)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/models/dagbag.py", line 615, in sync_to_db
       for attempt in run_with_db_retries(logger=self.log):
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/tenacity/__init__.py", line 382, in __iter__
       do = self.iter(retry_state=retry_state)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/tenacity/__init__.py", line 349, in iter
       return fut.result()
     File "/opt/conda_envs/airflow/lib/python3.8/concurrent/futures/_base.py", line 437, in result
       return self.__get_result()
     File "/opt/conda_envs/airflow/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
       raise self._exception
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/models/dagbag.py", line 629, in sync_to_db
       DAG.bulk_write_to_db(self.dags.values(), session=session)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/utils/session.py", line 68, in wrapper
       return func(*args, **kwargs)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/models/dag.py", line 2470, in bulk_write_to_db
       DagCode.bulk_sync_to_db(filelocs, session=session)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/utils/session.py", line 68, in wrapper
       return func(*args, **kwargs)
     File "/opt/conda_envs/airflow/lib/python3.8/site-packages/airflow/models/dagcode.py", line 114, in bulk_sync_to_db
       os.path.getmtime(correct_maybe_zipped(fileloc)), tz=timezone.utc
     File "/opt/conda_envs/airflow/lib/python3.8/genericpath.py", line 55, in getmtime
       return os.stat(filename).st_mtime
   PermissionError: [Errno 13] Permission denied: '/opt/airflow/dags/airflow_dags/<redacted>/<redacted>.py'
   {manager.py:924} ERROR - Processor for /opt/airflow/dags/airflow_dags/<redacted>/<redacted>.py exited with return code 1.
   ```
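   The failing frame in `dagcode.py` boils down to a plain `os.stat` on the DAG file. Stripped to its essentials (with a temporary file standing in for the real DAG path), the call that raises is:

   ```python
   import os
   import tempfile
   from datetime import datetime, timezone

   # Stand-in for a DAG file; in the report the real file lives under
   # /opt/airflow/dags and is stat-ed by the scheduler's file processor.
   with tempfile.NamedTemporaryFile(suffix=".py") as f:
       # Equivalent of the failing line in DagCode.bulk_sync_to_db:
       # os.path.getmtime calls os.stat under the hood, which raises
       # PermissionError if the scheduler user loses access to the file
       # (or a parent directory) at that instant.
       mtime = datetime.fromtimestamp(os.path.getmtime(f.name), tz=timezone.utc)
   ```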
   
   
   I performed a couple of courses of action:
   - Confirmed that no deployments occurred that may have clobbered permissions for whatever reason
   - Set up a cron job to force the permissions, in case I missed something
   - Changed the value of `file_parsing_sort_mode` to `random_seeded_by_host` and `alphabetical` in an attempt to see if the error could be bypassed by not looking at the modified date.
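   A sketch of what such a permissions-forcing job can look like (the `/opt/airflow/dags` path matches the trace; the script name and modes are assumptions, not the exact crontab used here):

   ```shell
   # Hypothetical cron entry, e.g. every 5 minutes:
   #   */5 * * * * /usr/local/bin/fix_dag_perms.sh
   DAGS_DIR="${DAGS_DIR:-/opt/airflow/dags}"
   # Directories need the execute bit so the scheduler can traverse them.
   find "$DAGS_DIR" -type d -exec chmod 0755 {} +
   # DAG files only need to be readable by the scheduler user.
   find "$DAGS_DIR" -type f -name '*.py' -exec chmod 0644 {} +
   ```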
   
   It occurs randomly, not on every scheduler loop or even on a predictable loop. I almost wonder if a race condition within the scheduler is causing the issue. This started happening after our upgrade from 2.2.3 to 2.3.2.
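   If this is indeed a race (e.g. a file briefly replaced or re-permissioned while the processor is scanning), one possible mitigation would be to tolerate transient stat failures around the mtime lookup. A minimal sketch of the idea (not Airflow's actual code; the helper name is mine):

   ```python
   import os
   import time


   def getmtime_with_retry(path, retries=3, delay=0.1):
       """Return the file's mtime, retrying briefly if the stat fails with
       a transient PermissionError/FileNotFoundError (e.g. mid-deploy)."""
       for attempt in range(retries):
           try:
               return os.path.getmtime(path)
           except (PermissionError, FileNotFoundError):
               if attempt == retries - 1:
                   raise  # still failing after all retries: surface the error
               time.sleep(delay)
   ```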
   
   
   
   ### What you think should happen instead
   
   Not error out.
   
   ### How to reproduce
   
   Unable to consistently reproduce it.
   
   ### Operating System
   
   RHEL
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==2.4.0
   apache-airflow-providers-apache-hdfs==2.2.0
   apache-airflow-providers-apache-hive==2.1.0
   apache-airflow-providers-apache-spark==2.0.2
   apache-airflow-providers-apache-sqoop==2.0.2
   apache-airflow-providers-celery==2.1.0
   apache-airflow-providers-ftp @ file:///home/conda/feedstock_root/build_artifacts/apache-airflow-providers-ftp_1631176991628/work
   apache-airflow-providers-http @ file:///home/conda/feedstock_root/build_artifacts/apache-airflow-providers-http_1630909395407/work
   apache-airflow-providers-imap @ file:///home/conda/feedstock_root/build_artifacts/apache-airflow-providers-imap_1631176968327/work
   apache-airflow-providers-jenkins==2.0.3
   apache-airflow-providers-jira==2.0.1
   apache-airflow-providers-microsoft-azure==3.4.0
   apache-airflow-providers-microsoft-mssql==2.0.1
   apache-airflow-providers-mysql==2.1.1
   apache-airflow-providers-odbc==2.0.1
   apache-airflow-providers-oracle==2.0.1
   apache-airflow-providers-papermill==2.1.0
   apache-airflow-providers-postgres==2.4.0
   apache-airflow-providers-samba==3.0.1
   apache-airflow-providers-sqlite @ file:///home/conda/feedstock_root/build_artifacts/apache-airflow-providers-sqlite_1631202652057/work
   apache-airflow-providers-ssh==2.3.0
   apache-airflow-providers-tableau==2.1.2
   
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]