RC5 coming shortly. -ash
> On 1 Aug 2019, at 13:07, Ash Berlin-Taylor <[email protected]> wrote:
>
> Oh it wasn't a bad cherry-pick but a "merge fight" - you added a call to
> _manager.shutdown but I removed manager. The problem is on master too
> https://github.com/apache/airflow/blob/bfff185edbc1b27a635d613beecf80fe1dbd3758/airflow/jobs/scheduler_job.py#L188
> so the fix is easy enough.
>
> That said: Dan, would you be able to add a unit test that checks that
> `kill()` path better?
>
> -ash
>
>> On 1 Aug 2019, at 13:03, Dan Davydov <[email protected]> wrote:
>>
>> Haven't taken a look at the bad cherry pick, but if it's my fault LMK will
>> take a look and submit a patch (I'll be out after tomorrow though).
>>
>> On Thu, Aug 1, 2019 at 2:53 PM Ash Berlin-Taylor <[email protected]> wrote:
>>
>>> We've just noticed two problems:
>>>
>>> 1. If a dag parser process takes too long the DagFileProcessorManager dies
>>> with an exception when trying to kill it (A bad cherry-pick between #5615
>>> and #5605)
>>>
>>> The logs when this happens are:
>>>
>>> [2019-08-01 12:48:13,524] {dag_processing.py:543} INFO - Launched DagFileProcessorManager with pid: 62259
>>> [2019-08-01 12:48:13,553] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
>>> [2019-08-01 12:48:13,594] {dag_processing.py:746} ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
>>> Process Process-3:
>>> Traceback (most recent call last):
>>>   File "/Users/ash/.homebrew/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
>>>     self.run()
>>>   File "/Users/ash/.homebrew/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
>>>     self._target(*self._args, **self._kwargs)
>>>   File "/Users/ash/.virtualenvs/foo/lib/python3.7/site-packages/airflow/utils/dag_processing.py", line 611, in _run_processor_manager
>>>     processor_manager.start()
>>>   File "/Users/ash/.virtualenvs/foo/lib/python3.7/site-packages/airflow/utils/dag_processing.py", line 856, in start
>>>     simple_dags = self.collect_results()
>>>   File "/Users/ash/.virtualenvs/foo/lib/python3.7/site-packages/airflow/utils/dag_processing.py", line 1111, in collect_results
>>>     self._kill_timed_out_processors()
>>>   File "/Users/ash/.virtualenvs/foo/lib/python3.7/site-packages/airflow/utils/dag_processing.py", line 1231, in _kill_timed_out_processors
>>>     processor.kill()
>>>   File "/Users/ash/.virtualenvs/foo/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 194, in kill
>>>     self._manager.shutdown()
>>> AttributeError: 'DagFileProcessor' object has no attribute '_manager'
>>> [2019-08-01 12:48:51,627] {dag_processing.py:650} WARNING - DagFileProcessorManager (PID=62259) exited with exit code 1 - re-launching
>>> [2019-08-01 12:48:51,641] {dag_processing.py:543} INFO - Launched DagFileProcessorManager with pid: 62602
>>>
>>>
>>> So the scheduler does "recover" and carry on, but this is easy to fix.
>>>
>>>
>>> And 2: Google just released a new version of google-cloud-spanner that,
>>> because of some other broken/overly-strict pinning of other gcp-api modules,
>>> won't work at runtime, and we see this when installing:
>>>
>>> ERROR: google-cloud-container 0.3.0 has requirement grpc-google-iam-v1<0.13dev,>=0.12.3, but you'll have grpc-google-iam-v1 0.11.4 which is incompatible.
>>> ERROR: google-cloud-spanner 1.10.0 has requirement grpc-google-iam-v1<0.13dev,>=0.12.3, but you'll have grpc-google-iam-v1 0.11.4 which is incompatible.
>>>
>>> I think with both of these it's worth fixing these and making an RC5
>>>
>>> -ash
>>>
>>>> On 31 Jul 2019, at 11:03, Ash Berlin-Taylor <[email protected]> wrote:
>>>>
>>>> Hi my fellow Airflow peeps,
>>>>
>>>> After the process leaking bug James found and the MySQL 8.0.16+ issue
>>>> Jarek reported (both of which have been fixed) we are now ready to try
>>>> again. I have just cut 1.10.4rc4.
>>>>
>>>> This email is calling a vote on the release, which will last for 72
>>>> hours (2019-08-03 10:00 Z - Saturday 11am UK, 3am PDT), and until three
>>>> binding votes have been cast. Consider this my (binding) +1.
>>>>
>>>> Airflow 1.10.4 RC4 is available at:
>>>> https://dist.apache.org/repos/dist/dev/airflow/1.10.4rc4/
>>>>
>>>> *apache-airflow-1.10.4rc4-source.tar.gz* is a source release that comes
>>>> with INSTALL instructions.
>>>> *apache-airflow-1.10.4rc4-bin.tar.gz* is the binary Python "sdist"
>>>> release.
>>>> *apache_airflow-1.10.4rc4-py2.py3-none-any.whl* is the binary Python
>>>> "wheel" release.
>>>>
>>>> For convenience of testers the RC is on PyPI too. It can be installed
>>>> with:
>>>>
>>>> pip install 'apache-airflow==1.10.4rc4'
>>>>
>>>> Public keys are available at:
>>>> https://dist.apache.org/repos/dist/release/airflow/KEYS
>>>>
>>>> Only votes from PMC members are binding (sorry committers), but members
>>>> of the community are encouraged to test the release and vote with
>>>> "(non-binding)".
>>>>
>>>> Please note that the version number excludes the `rcX` string, so it's
>>>> now simply 1.10.4. This will allow us to rename the artefact without
>>>> modifying the checksums when we actually release.
>>>>
>>>> [ ] +1 Release this package as Apache Airflow 1.10.4
>>>> [ ] 0 No opinion
>>>> [ ] -1 Do not release this package because...
>>>>
>>>>
>>>> Changes since RC3:
>>>>
>>>> - e424f1792 [AIRFLOW-XXX] Update changelog for 1.10.4rc4
>>>> - 8be59fb4e [AIRFLOW-4289] fix spark_binary argument being ignored in SparkSubmitHook (#5564)
>>>> - 775f5b265 [AIRFLOW-5075] Let HttpHook handle connections with empty host fields (#5686)
>>>> - 16ca5b4fc [AIRFLOW-5078] User is asked if an image needs to be rebuild (#5691)
>>>> - 9184f422d [AIRFLOW-5079] Checklicence test uses own, much smaller image (#5692)
>>>> - 00c5a1c39 [AIRFLOW-5077] Skip force pulling latest python in CI environment (#5690)
>>>> - 461c3bc5c [AIRFLOW-XXX] remove an old ci script
>>>> - 222c6ac45 [AIRFLOW-4811] Implement GCP DLP' Hook and Operators (#5539)
>>>> - b041d34fc [AIRFLOW-5065] Add colors to console log (#5681)
>>>> - 1db79e6ee [AIRFLOW-XXX] Remove default/wrong values from test config. (#5684)
>>>> - c6775980a [AIRFLOW-4822] Fix bug where parent-dag task instances are wrongly cleared (#5444)
>>>> - c202774ce [AIRFLOW-5022] Fix DockerHook for registries with port numbers (#5644)
>>>> - 90331b00c [AIRFLOW-4961] Insert TaskFail.duration as int match DB schema column type (#5593)
>>>> - 8f3abc8bf [AIRFLOW-5038] skip pod deleted log message when pod deletion is disabled (#5656)
>>>> - b6f8d67bf [AIRFLOW-5067] Update pagination symbols (#5682)
>>>> - f82f8998c [AIRFLOW-5035] Replace multiprocessing.Manager with a golang-"channel" style (#5615)
>>>> - 8368b5cd8 [AIRFLOW-4883] Fix tests on Python 3.5 (#5655)
>>>> - 4f059165c [AIRFLOW-4883] Fix tests on Python 3.5
>>>> - fb62b284e [AIRFLOW-4883] Bug-fix for Kill hung file process managers (#5639)
>>>> - 1739d5827 [AIRFLOW-4883] Kill hung file process managers (#5605)
>>>> - b6312298b [AIRFLOW-4338] Change k8s pod_request_factory to use yaml safe_load (#5120)
>>>> - 6adc0616a [AIRFLOW-5050] Correctly delete FAB permission m2m objects in sync_perms (#5679)
>>>> - 7552b09e0 [AIRFLOW-XXX] Add missing doc for annotations param of KubernetesPodOperator (#5666)
>>>> - e86899be4 [AIRFLOW-3370] Fix bug in Elasticsearch task log handler (#5667)
>>>> - 689dcf650 [AIRFLOW-5030] fix env var expansion for config key contains __ (#5650)
>>>> - a160c36c5 changing log level to be proper library to suppress warning for https://issues.apache.org/jira/browse/AIRFLOW-4590 (#5337)
>>>> - b684455d1 [AIRFLOW-5070] Fixed python3 forced in v1-10-test branch
>>>> - 1702e741b Reorganize sql to gcs operators. (#5504)
>>>> - 44a055be9 [AIRFLOW-4451] Allow templated named tuples (#5673)
>>>> - 1f2c32dda [AIRFLOW-5064] Switched to python 3.5 (#5678)
>>>> - c698ad07f [AIRFLOW-5063] Fix performance when switching between master/v1-10
>>>> - 5e893b2b5 [AIRFLOW-4981][AIRFLOW-4788] Always use pendulum DateTimes in task in… (#5654)
>>>> - 6e29379dc [AIRFLOW-XXX] fix copy/pasta in k8s request factory extract resources (#5657)
>>>> - 1c5f4b99a [AIRFLOW-4880] Add success, failure and fail_on_empty params to SqlSensor (#5488)
>>>> - dde00ee7a [AIRFLOW-3617] Add gpu limits option in configurations for executor and pod (#5643)
>>>> - 520f9a8f7 [AIRFLOW-4775] Fix incorrect parameter order in GceHook (#5613)
>>>> - 309207256 [AIRFLOW-4998] Run multiple queries in BigQueryOperator (#5619)
>>>> - afd1a3d32 [AIRFLOW-5041] just force PYTHON_VERSION variable (#5660)
>>>> - 1120b12eb [AIRFLOW-5021] move gitpython into setup_requires (#5640)
>>>> - c41319480 [AIRFLOW-4583] Fixes type error in GKEPodOperator (#5612)
>>>> - 6018cffac [AIRFLOW-4929] Improve display of JSON Variables in UI (#5641)
>>>> - 8fc11877a [AIRFLOW-5014] Fix sphinx doc problem and leaves API docs
>>>> - 8a749bfc8 [AIRFLOW-4995] Fix DB initialisation on MySQL (#5614)
>>>> - 8a1470dfd [AIRFLOW-5008] Fixed missing libmysql-client-dev in Oracle repos
>>>> - c3a714128 [AIRFLOW-5007] Remove override of python version to 3.6 in tests (#5628)
>>>> - 625c4d465 [AIRFLOW-XXX] Fix typos in CONTRIBUTING.md (#5626)
>>>> - 48abd4867 [AIRFLOW-5005] Split kubernetes tests into separate jobs
>>>> - 9dea545ae [AIRFLOW-5004] Branch/image for CI builds is selected via TRAVIS_BRANCH
>>>> - 5da71fd23 [AIRFLOW-5002] Diagnostics of getopt fixed for zsh on MacOS
>>>> - e1a8f9d43 [AIRFLOW-5001] Moving building image to before_install phase
>>>> - 6735ea4f4 [AIRFLOW-4999] Local build and build_and_pull work on both images
>>>> - d3c022f09 [AIRFLOW-4997] Support for non-master branches
>>>> - 8a644bdfe [AIRFLOW-4994] Slim version of integration test environment
>>>> - 0893cdc2a [AIRFLOW-4117] Travis CI uses multi-stage images to run tests (#4938)
>>>> - 442a126f4 [AIRFLOW-4116] Dockerfile now supports CI image build on DockerHub (#4937)
>>>> - 0c7ccbbd9 [AIRFLOW-4115] Multi-staging Aiflow Docker image (#4936)
>>>> - aaa8c3e4f [AIRFLOW-4959] Add .hql support for the DataProcHiveOperator (#5591)
>>>> - ac01938c3 [AIRFLOW-4963] Avoid recreating task context (#5596)
>>>> - 5f0e64aa8 [AIRFLOW-4865] Add context manager to set temporary config values in tests. (#5569)
>>>> - b18bf5489 [AIRFLOW-4929] Pretty print JSON Variables in UI (#5573)
>>>> - e17bfc492 [AIRFLOW-4962] Fix Werkzeug v0.15 deprecation notice for DispatcherMiddleware import (#5595)
>>>> - ef88bfbb2 [AIRFLOW-XXX] Correctly deprecate old elasticsearch config names (#5603)
>>>>
>>>> Full change log since 1.10.3:
>>>>
>>>> Airflow 1.10.4, - 2019-08-03
>>>> ----------------------------
>>>>
>>>> New Features
>>>> """"""""""""
>>>> - [AIRFLOW-4811] Implement GCP Data Loss Prevention Hook and Operators (#5539)
>>>> - [AIRFLOW-5035] Replace multiprocessing.Manager with a golang-"channel" style (#5615)
>>>> - [AIRFLOW-4883] Kill hung file process managers (#5605)
>>>> - [AIRFLOW-4929] Pretty print JSON Variables in UI (#5573)
>>>> - [AIRFLOW-4884] Roll up import_errors in RBAC UI (#5516)
>>>> - [AIRFLOW-4871] Allow creating DagRuns via RBAC UI (#5507)
>>>> - [AIRFLOW-4591] Make default_pool a real pool (#5349)
>>>> - [AIRFLOW-4844] Add optional is_paused_upon_creation argument to DAG (#5473)
>>>> - [AIRFLOW-4456] Add sub-classable BaseBranchOperator (#5231)
>>>> - [AIRFLOW-4343] Show warning in UI if scheduler is not running (#5127)
>>>> - [AIRFLOW-4739] Add ability to arbitrarily define kubernetes worker pod labels (#5376)
>>>> - [AIRFLOW-4348] Add GCP console link in BigQueryOperator (#5195)
>>>> - [AIRFLOW-4306] Global operator extra links (#5094)
>>>> - [AIRFLOW-4812] Add batch images annotation (#5433)
>>>> - [AIRFLOW-4135] Add Google Cloud Build operator and hook (#5251)
>>>> - [AIRFLOW-4781] Add the ability to specify ports in kubernetesOperator (#5410)
>>>> - [AIRFLOW-4521] Pause dag also pause its subdags (#5283)
>>>> - [AIRFLOW-4738] Enforce exampleinclude for example DAGs (#5375)
>>>> - [AIRFLOW-4326] Airflow AWS SQS Operator (#5110)
>>>> - [AIRFLOW-3729] Support "DownwardAPI" in env variables for KubernetesPodOperator (#4554)
>>>> - [AIRFLOW-4585] Implement Kubernetes Pod Mutation Hook (#5359)
>>>> - [AIRFLOW-161] New redirect route and extra links (#5059)
>>>> - [AIRFLOW-4420] Backfill respects task_concurrency (#5221)
>>>> - [AIRFLOW-4147] Add Operator to publish event to Redis (#4967)
>>>> - [AIRFLOW-3359] Add option to pass customer encryption keys to Dataproc (#4200)
>>>> - [AIRFLOW-4318] Create Google Cloud Translate Speech Operator (#5102)
>>>> - [AIRFLOW-3960] Adds Google Cloud Speech operators (#4780)
>>>> - [AIRFLOW-1501] Add GoogleCloudStorageDeleteOperator (#5230)
>>>> - [AIRFLOW-3672] Add support for Mongo DB DNS Seedlist Connection Format (#4481)
>>>> - [AIRFLOW-4397] add integrations docs manually for gcs sensors (#5204)
>>>> - [AIRFLOW-4251] Instrument DagRun schedule delay (#5050)
>>>> - [AIRFLOW-4118] instrument DagRun duration (#4946)
>>>> - [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure (#5182)
>>>> - [AIRFLOW-4361] Fix flaky test_integration_run_dag_with_scheduler_failure (#5140)
>>>> - [AIRFLOW-4168] Create Google Cloud Video Intelligence Operators (#4985)
>>>> - [AIRFLOW-4397] Add GCSUploadSessionCompleteSensor (#5166)
>>>> - [AIRFLOW-4335] Add default num_retries to GCP connection (#5117)
>>>> - [AIRFLOW-3808] Add cluster_fields to BigQueryHook's create_empty_table (#4654)
>>>> - [AIRFLOW-4362] Fix test_execution_limited_parallelism (#5141)
>>>> - [AIRFLOW-4307] Backfill respects concurrency limit (#5128)
>>>> - [AIRFLOW-4268] Add MsSqlToGoogleCloudStorageOperator (#5077)
>>>> - [AIRFLOW-4169] Add Google Cloud Vision Detect Operators (#4986)
>>>> - [AIRFLOW-XXX] Fix WS-2019-0032 (#5384)
>>>> - [AIRFLOW-XXX] Fix CVE-2019-11358 (#5197)
>>>> - [AIRFLOW-XXX] Change allowed version of Jinja2 to fix CVE-2019-10906 (#5075)
>>>>
>>>> Improvement
>>>> """""""""""
>>>> - [AIRFLOW-5022] Fix DockerHook for registries with port numbers (#5644)
>>>> - [AIRFLOW-4961] Insert TaskFail.duration as int match DB schema column type (#5593)
>>>> - [AIRFLOW-5038] skip pod deleted log message when pod deletion is disabled (#5656)
>>>> - [AIRFLOW-5067] Update pagination symbols (#5682)
>>>> - [AIRFLOW-4981][AIRFLOW-4788] Always use pendulum DateTimes in task instance context (#5654)
>>>> - [AIRFLOW-4880] Add success, failure and fail_on_empty params to SqlSensor (#5488)
>>>> - [AIRFLOW-3617] Add gpu limits option in configurations for Kube executor and pod (#5643)
>>>> - [AIRFLOW-4998] Run multiple queries in BigQueryOperator (#5619)
>>>> - [AIRFLOW-4929] Improve display of JSON Variables in UI (#5641)
>>>> - [AIRFLOW-4959] Add .hql support for the DataProcHiveOperator (#5591)
>>>> - [AIRFLOW-4962] Fix Werkzeug v0.15 deprecation notice for DispatcherMiddleware import (#5595)
>>>> - [AIRFLOW-4797] Improve performance and behaviour of zombie detection (#5511)
>>>> - [AIRFLOW-4911] Silence the FORBIDDEN errors from the KubernetesExecutor (#5547)
>>>> - [AIRFLOW-3495] Validate one of query and query_uri passed to DataProcSparkSqlOperator (#5510)
>>>> - [AIRFLOW-4925] Improve css style for Variables Import file field (#5552)
>>>> - [AIRFLOW-4906] Improve debugging for the SparkSubmitHook (#5542)
>>>> - [AIRFLOW-4904] unittest.cfg name and path can be overriden by setting $AIRFLOW_TEST_CONFIG (#5540)
>>>> - [AIRFLOW-4920] Use html.escape instead of cgi.escape to fix DeprecationWarning (#5551)
>>>> - [AIRFLOW-4919] DataProcJobBaseOperator dataproc_*_properties templated (#5555)
>>>> - [AIRFLOW-4478] Lazily instantiate default resources objects. (#5259)
>>>> - [AIRFLOW-4564] AzureContainerInstance bugfixes and improvements (#5319)
>>>> - [AIRFLOW-4237] Including Try Number of Task in Gantt Chart (#5037)
>>>> - [AIRFLOW-4862] Allow directly using IP address as hostname for webserver logs (#5501)
>>>> - [AIRFLOW-4857] Add templated fields to SlackWebhookOperator (#5490)
>>>> - [AIRFLOW-3502] Add celery config option for setting "pool" (#4308)
>>>> - [AIRFLOW-3217] Button to toggle line wrapping in log and code views (#4277)
>>>> - [AIRFLOW-4491] Add a "Jump to end" button for logs (#5266)
>>>> - [AIRFLOW-4422] Pool utilization stats (#5453)
>>>> - [AIRFLOW-4805] Add py_file as templated field in DataflowPythonOperator (#5451)
>>>> - [AIRFLOW-4838] Surface Athena errors in AWSAthenaOperator (#5467)
>>>> - [AIRFLOW-4831] conf.has_option no longer throws if section is missing. (#5455)
>>>> - [AIRFLOW-4829] More descriptive exceptions for EMR sensors (#5452)
>>>> - [AIRFLOW-4414] AWSAthenaOperator: Push QueryExecutionID to XCom (#5276)
>>>> - [AIRFLOW-4791] add "schema" keyword arg to SnowflakeOperator (#5415)
>>>> - [AIRFLOW-4759] Don't error when marking sucessful run as failed (#5435)
>>>> - [AIRFLOW-4716] Instrument dag loading time duration (#5350)
>>>> - [AIRFLOW-3958] Support list tasks as upstream in chain (#4779)
>>>> - [AIRFLOW-4409] Prevent task duration break by null value (#5178)
>>>> - [AIRFLOW-4418] Add "failed only" option to task modal (#5193)
>>>> - [AIRFLOW-4740] Accept string ``end_date`` in DAG default_args (#5381)
>>>> - [AIRFLOW-4423] Improve date handling in mysql to gcs operator. (#5196)
>>>> - [AIRFLOW-4447] Display task duration as human friendly format in UI (#5218)
>>>> - [AIRFLOW-4377] Remove needless object conversion in DAG.owner() (#5144)
>>>> - [AIRFLOW-4766] Add autoscaling option for DataprocClusterCreateOperator (#5425)
>>>> - [AIRFLOW-4795] Upgrade alembic to latest release. (#5411)
>>>> - [AIRFLOW-4793] Add signature_name to mlengine operator (#5417)
>>>> - [AIRFLOW-3211] Reattach to GCP Dataproc jobs upon Airflow restart (#4083)
>>>> - [AIRFLOW-4750] Log identified zombie task instances (#5389)
>>>> - [AIRFLOW-3870] STFPOperator: Update log level and return value (#4355)
>>>> - [AIRFLOW-4759] Batch queries in set_state API. (#5403)
>>>> - [AIRFLOW-2737] Restore original license header to airflow.api.auth.backend.kerberos_auth
>>>> - [AIRFLOW-3635] Fix incorrect logic in detele_dag (introduced in PR#4406) (#4445)
>>>> - [AIRFLOW-3599] Removed Dagbag from delete dag (#4406)
>>>> - [AIRFLOW-4737] Increase and document celery queue name limit (#5383)
>>>> - [AIRFLOW-4505] Correct Tag ALL for PY3 (#5275)
>>>> - [AIRFLOW-4743] Add environment variables support to SSHOperator (#5385)
>>>> - [AIRFLOW-4725] Fix setup.py PEP440 & Sphinx-PyPI-upload dependency (#5363)
>>>> - [AIRFLOW-3370] Add stdout output options to Elasticsearch task log handler (#5048)
>>>> - [AIRFLOW-4396] Provide a link to external Elasticsearch logs in UI. (#5164)
>>>> - [AIRFLOW-1381] Allow setting host temporary directory in DockerOperator (#5369)
>>>> - [AIRFLOW-4598] Task retries are not exhausted for K8s executor (#5347)
>>>> - [AIRFLOW-4218] Support to Provide http args to K8executor while calling k8 python client lib apis (#5060)
>>>> - [AIRFLOW-4159] Add support for additional static pod labels for K8sExecutor (#5134)
>>>> - [AIRFLOW-4720] Allow comments in .airflowignore files. (#5355)
>>>> - [AIRFLOW-4486] Add AWS IAM authenication in MySqlHook (#5334)
>>>> - [AIRFLOW-4417] Add AWS IAM authenication for PostgresHook (#5223)
>>>> - [AIRFLOW-3990] Compile regular expressions. (#4813)
>>>> - [AIRFLOW-4572] Rename prepare_classpath() to prepare_syspath() (#5328)
>>>> - [AIRFLOW-3869] Raise consistent exception in AirflowConfigParser.getboolean (#4692)
>>>> - [AIRFLOW-4571] Add headers to templated field for SimpleHttpOperator (#5326)
>>>> - [AIRFLOW-3867] Rename GCP's subpackage (#4690)
>>>> - [AIRFLOW-3725] Add private_key to bigquery_hook get_pandas_df (#4549)
>>>> - [AIRFLOW-4546] Upgrade google-cloud-bigtable. (#5307)
>>>> - [AIRFLOW-4519] Optimise operator classname sorting in views (#5282)
>>>> - [AIRFLOW-4503] Support fully pig options (#5271)
>>>> - [AIRFLOW-4468] add sql_alchemy_max_overflow parameter (#5249)
>>>> - [AIRFLOW-4467] Add dataproc_jars to templated fields in Dataproc oper… (#5248)
>>>> - [AIRFLOW-4381] Use get_direct_relative_ids get task relatives (#5147)
>>>> - [AIRFLOW-3624] Add masterType parameter to MLEngineTrainingOperator (#4428)
>>>> - [AIRFLOW-3143] Support Auto-Zone in DataprocClusterCreateOperator (#5169)
>>>> - [AIRFLOW-3874] Improve BigQueryHook.run_with_configuration's location support (#4695)
>>>> - [AIRFLOW-4399] Avoid duplicated os.path.isfile() check in models.dagbag (#5165)
>>>> - [AIRFLOW-4031] Allow for key pair auth in snowflake hook (#4875)
>>>> - [AIRFLOW-3901] add role as optional config parameter for SnowflakeHook (#4721)
>>>> - [AIRFLOW-3455] add region in snowflake connector (#4285)
>>>> - [AIRFLOW-4073] add template_ext for AWS Athena operator (#4907)
>>>> - [AIRFLOW-4093] AWSAthenaOperator: Throw exception if job failed/cancelled/reach max retries (#4919)
>>>> - [AIRFLOW-4356] Add extra RuntimeEnvironment keys to DataFlowHook (#5149)
>>>> - [AIRFLOW-4337] Fix docker-compose deprecation warning in CI (#5119)
>>>> - [AIRFLOW-3603] QuboleOperator: Remove SQLCommand from SparkCmd documentation (#4411)
>>>> - [AIRFLOW-4328] Fix link to task instances from Pool page (#5124)
>>>> - [AIRFLOW-4255] Make GCS Hook Backwards compatible (#5089)
>>>> - [AIRFLOW-4103] Allow uppercase letters in dataflow job names (#4925)
>>>> - [AIRFLOW-4255] Replace Discovery based api with client based for GCS (#5054)
>>>> - [AIRFLOW-4311] Remove sleep in localexecutor (#5096)
>>>> - [AIRFLOW-2836] Minor improvement-contrib.sensors.FileSensor (#3674)
>>>> - [AIRFLOW-4104] Add type annotations to common classes. (#4926)
>>>> - [AIRFLOW-3910] Raise exception explicitly in Connection.get_hook() (#4728)
>>>> - [AIRFLOW-3322] Update QuboleHook to fetch args dynamically from qds_sdk (#4165)
>>>> - [AIRFLOW-4565] instrument celery executor (#5321)
>>>> - [AIRFLOW-4573] Import airflow_local_settings after prepare_classpath (#5330)
>>>> - [AIRFLOW-4448] Don't bake ENV and _cmd into tmp config for non-sudo (#4050)
>>>> - [AIRFLOW-4295] Make ``method`` attribute case insensitive in HttpHook (#5313)
>>>> - [AIRFLOW-3703] Add dnsPolicy option for KubernetesPodOperator (#4520)
>>>> - [AIRFLOW-3057] add prev_*_date_success to template context (#5372)
>>>> - [AIRFLOW-4336] Stop showing entire GCS files bytes in log for gcs_download_operator (#5151)
>>>> - [AIRFLOW-4528] Cancel DataProc task on timeout (#5293)
>>>>
>>>> Bug fixes
>>>> """""""""
>>>> - [AIRFLOW-4289] fix spark_binary argument being ignored in SparkSubmitHook (#5564)
>>>> - [AIRFLOW-5075] Let HttpHook handle connections with empty host fields (#5686)
>>>> - [AIRFLOW-4822] Fix bug where parent-dag task instances are wrongly cleared when using subdags (#5444)
>>>> - [AIRFLOW-5050] Correctly delete FAB permission m2m objects in ``airflow sync_perms`` (#5679)
>>>> - [AIRFLOW-5030] fix env var expansion for config key contains __ (#5650)
>>>> - [AIRFLOW-4590] changing log level to be proper library to suppress warning in WinRM (#5337)
>>>> - [AIRFLOW-4451] Allow named tuples to be templated (#5673)
>>>> - [AIRFLOW-XXX] Fix bug where Kube pod limts were not applied (requests were, but not limits) (#5657)
>>>> - [AIRFLOW-4775] Fix incorrect parameter order in GceHook (#5613)
>>>> - [AIRFLOW-4995] Fix DB initialisation on MySQL >=8.0.16 (#5614)
>>>> - [AIRFLOW-4934] Fix ProxyFix due to Werkzeug upgrade (#5563) (#5571)
>>>> - [AIRFLOW-4136] fix key_file of hook is overwritten by SSHHook connection (#5558)
>>>> - [AIRFLOW-4587] Replace self.conn with self.get_conn() in AWSAthenaHook (#5545)
>>>> - [AIRFLOW-1740] Fix xcom creation and update via UI (#5530) (#5531)
>>>> - [AIRFLOW-4900] Resolve incompatible version of Werkzeug (#5535)
>>>> - [AIRFLOW-4510] Don't mutate default_args during DAG initialization (#5277)
>>>> - [AIRFLOW-3360] Make the DAGs search respect other querystring parameters with url-search-params-polyfill for IE support (#5503)
>>>> - [AIRFLOW-4896] Make KubernetesExecutorConfig's default args immutable (#5534)
>>>> - [AIRFLOW-4494] Remove ``shell=True`` in DaskExecutor (#5273)
>>>> - [AIRFLOW-4890] Fix Log link in TaskInstance's View for Non-RBAC (#5525)
>>>> - [AIRFLOW-4892] Fix connection creation via UIs (#5527)
>>>> - [AIRFLOW-4406] Fix a method name typo: NullFernet.decrpyt to decrypt (#5509)
>>>> - [AIRFLOW-4849] Add gcp_conn_id to cloudsqldatabehook class to use correctly CloudSqlProxyRunner class (#5478)
>>>> - [AIRFLOW-4769] Pass gcp_conn_id to BigtableHook (#5445)
>>>> - [AIRFLOW-4524] Fix incorrect field names in view for Mark Success/Failure (#5486)
>>>> - [AIRFLOW-3671] Remove arg ``replace`` of MongoToS3Operator from ``kwargs`` (#4480)
>>>> - [AIRFLOW-4845] Fix bug where runAsUser 0 doesn't get set in k8s security context (#5474)
>>>> - [AIRFLOW-4354] Fix exception in "between" date filter in classic UI (#5480)
>>>> - [AIRFLOW-4587] Replace self.conn with self.get_conn() in AWSAthenaHook (#5462)
>>>> - [AIRFLOW-4516] K8s runAsUser and fsGroup cannot be strings (#5429)
>>>> - [AIRFLOW-4298] Stop Scheduler repeatedly warning "connection invalidated" (#5470)
>>>> - [AIRFLOW-4559] JenkinsJobTriggerOperator bugfix (#5318)
>>>> - [AIRFLOW-4841] Pin Sphinx AutoApi to 1.0.0 (#5468)
>>>> - [AIRFLOW-4479] Include s3_overwrite kwarg in load_bytes method (#5312)
>>>> - [AIRFLOW-3746] Fix DockerOperator missing container exit (#4583)
>>>> - [AIRFLOW-4233] Remove Template Extension from Bq to GCS Operator (#5456)
>>>> - [AIRFLOW-2141][AIRFLOW-3157][AIRFLOW-4170] Serialize non-str value by JSON when importing Variables (#4991)
>>>> - [AIRFLOW-4826] Remove warning from ``airflow resetdb`` command (#5447)
>>>> - [AIRFLOW-4148] Fix editing DagRuns when clicking state column (#5436)
>>>> - [AIRFLOW-4455] dag_details broken for subdags in RBAC UI (#5234)
>>>> - [AIRFLOW-2955] Fix kubernetes pod operator to set requests and limits on task pods (#4551)
>>>> - [AIRFLOW-4459] Fix wrong DAG count in /home page when DAG count is zero (#5235)
>>>> - [AIRFLOW-3876] AttributeError: module 'distutils' has no attribute 'util'
>>>> - [AIRFLOW-4146] Fix CgroupTaskRunner errors (#5224)
>>>> - [AIRFLOW-4524] Fix bug with "Ignore \*" toggles in RBAC mode (#5378)
>>>> - [AIRFLOW-4765] Fix DataProcPigOperator execute method (#5426)
>>>> - [AIRFLOW-4798] obviate interdependencies for dagbag and TI tests (#5422)
>>>> - [AIRFLOW-4800] fix GKEClusterHook ctor calls (#5424)
>>>> - [AIRFLOW-4799] don't mutate self.env in BashOperator execute method (#5421)
>>>> - [AIRFLOW-4393] Add retry logic when fetching pod status and/or logs in KubernetesPodOperator (#5284)
>>>> - [AIRFLOW-4174] Fix HttpHook run with backoff (#5213)
>>>> - [AIRFLOW-4463] Handle divide-by-zero errors in short retry intervals (#5243)
>>>> - [AIRFLOW-2614] Speed up trigger_dag API call when lots of DAGs in system
>>>> - [AIRFLOW-4756] add ti.state to ti.start_date as criteria for gantt (#5399)
>>>> - [AIRFLOW-4760] Fix zip-packaged DAGs disappearing from DagBag when reloaded (#5404)
>>>> - [AIRFLOW-4731] Fix GCS hook with google-storage-client 1.16 (#5368)
>>>> - [AIRFLOW-3506] use match_phrase to query log_id in elasticsearch (#4342)
>>>> - [AIRFLOW-4084] fix ElasticSearch log download (#5177)
>>>> - [AIRFLOW-4501] Register pendulum datetime converter for sqla+pymysql (#5190)
>>>> - [AIRFLOW-986] HiveCliHook ignores 'proxy_user' value in a connection's extra parameter (#5305)
>>>> - [AIRFLOW-4442] fix hive_tblproperties in HiveToDruidTransfer (#5211)
>>>> - [AIRFLOW-4557] Add gcp_conn_id parameter to get_sqlproxy_runner() of CloudSqlDatabaseHook (#5314)
>>>> - [AIRFLOW-4545] Upgrade FAB to latest version (#4955)
>>>> - [AIRFLOW-4492] Change Dataproc Cluster operators to poll Operations (#5269)
>>>> - [AIRFLOW-4452] Webserver and Scheduler keep crashing because of slackclient update (#5225)
>>>> - [AIRFLOW-4450] Fix request arguments in has_dag_access (#5220)
>>>> - [AIRFLOW-4434] Support Impala with the HiveServer2Hook (#5206)
>>>> - [AIRFLOW-3449] Write local dag parsing logs when remote logging enabled. (#5175)
>>>> - [AIRFLOW-4300] Fix graph modal call when DAG has not yet run (#5185)
>>>> - [AIRFLOW-4401] Use managers for Queue synchronization (#5200)
>>>> - [AIRFLOW-3626] Fixed triggering DAGs contained within zip files (#4439)
>>>> - [AIRFLOW-3720] Fix missmatch while comparing GCS and S3 files (#4766)
>>>> - [AIRFLOW-4403] search by ``dag_id`` or ``owners`` in UI (#5184)
>>>> - [AIRFLOW-4308] Fix TZ-loop around DST on python 3.6+ (#5095)
>>>> - [AIRFLOW-4324] fix DAG fuzzy search in RBAC UI (#5131)
>>>> - [AIRFLOW-4297] Temporary hot fix on manage_slas() for 1.10.4 release (#5150)
>>>> - [AIRFLOW-4299] Upgrade to Celery 4.3.0 to fix crashing workers (#5116)
>>>> - [AIRFLOW-4291] Correctly render doc_md in DAG graph page (#5121)
>>>> - [AIRFLOW-4310] Fix incorrect link on Dag Details page (#5122)
>>>> - [AIRFLOW-4331] Correct filter for Null-state runs from Dag Detail page (#5123)
>>>> - [AIRFLOW-4294] Fix missing dag & task runs in UI dag_id contains a dot (#5111)
>>>> - [AIRFLOW-4332] Upgrade sqlalchemy to remove security Vulnerability (#5113)
>>>> - [AIRFLOW-4312] Add template_fields & template_ext to BigQueryCheckO… (#5097)
>>>> - [AIRFLOW-4293] Fix downgrade in d4ecb8fbee3_add_schedule_interval_to_dag.py (#5086)
>>>> - [AIRFLOW-4267] Fix TI duration in Graph View (#5071)
>>>> - [AIRFLOW-4163] IntervalCheckOperator supports relative diff and not ignore 0 (#4983)
>>>> - [AIRFLOW-3938] QuboleOperator Fixes and Support for SqlCommand (#4832)
>>>> - [AIRFLOW-2903] Change default owner to ``airflow`` (#4151)
>>>> - [aIRFLOW-4136] Fix overwrite of key_file by constructor (#5155)
>>>> - [AIRFLOW-3241] Remove Invalid template ext in GCS Sensors (#4076)
>>>>
>>>> Misc/Internal
>>>> """""""""""""
>>>> - [AIRFLOW-4338] Change k8s pod_request_factory to use yaml safe_load (#5120)
>>>> - [AIRFLOW-4869] Reorganize sql to gcs operators. (#5504)
>>>> - [AIRFLOW-5021] move gitpython into setup_requires (#5640)
>>>> - [AIRFLOW-4583] Fixes type error in GKEPodOperator (#5612)
>>>> - [AIRFLOW-4116] Dockerfile now supports CI image build on DockerHub (#4937)
>>>> - [AIRFLOW-4115] Multi-staging Aiflow Docker image (#4936)
>>>> - [AIRFLOW-4963] Avoid recreating task context (#5596)
>>>> - [AIRFLOW-4865] Add context manager to set temporary config values in tests. (#5569)
>>>> - [AIRFLOW-4937] Fix lodash security issue with version below 4.17.13 (#5572) (used only in build-pipeline, not runtime)
>>>> - [AIRFLOW-4868] Fix typo in kubernetes/docker/build.sh (#5505)
>>>> - [AIRFLOW-4211] Add tests for WebHDFSHook (#5015)
>>>> - [AIRFLOW-4320] Add tests for SegmentTrackEventOperator (#5104)
>>>> - [AIRFLOW-4319] Add tests for Bigquery related Operators (#5101)
>>>> - [AIRFLOW-4014] Change DatastoreHook and add tests (#4842)
>>>> - [AIRFLOW-4322] Add test for VerticaOperator (#5107)
>>>> - [AIRFLOW-4323] Add 2 tests for WinRMOperator (#5108)
>>>> - [AIRFLOW-3677] Improve CheckOperator test coverage (#4756)
>>>> - [AIRFLOW-4659] Fix pylint problems for api module (#5398)
>>>> - [AIRFLOW-4358] Speed up test_jobs by not running tasks (#5162)
>>>> - [AIRFLOW-4394] Don't test behaviour of BackfillJob from CLI tests (#5160)
>>>> - [AIRFLOW-3471] Move XCom out of models.py (#4629)
>>>> - [AIRFLOW-4379] Remove duplicate code & Add validation in gcs_to_gcs.py (#5145)
>>>> - [AIRFLOW-4259] Move models out of models.py (#5056)
>>>> - [AIRFLOW-XXX] Speed up building of Cassanda module on Travis (#5233)
>>>> - [AIRFLOW-4535] Break jobs.py into multiple files (#5303)
>>>> - [AIRFLOW-1464] Batch update task_instance state (#5323)
>>>> - [AIRFLOW-4554] Test for sudo command, add some other test docs (#5310)
>>>> - [AIRFLOW-4419] Refine concurrency check in scheduler (#5194)
>>>> - [AIRFLOW-4269] Minor acceleration of jobs._process_task_instances() (#5076)
>>>> - [AIRFLOW-4341] Remove ``View.render()`` already exists in fab.BaseView (#5125)
>>>> - [AIRFLOW-4342] Use @cached_property instead of re-implementing it each time (#5126)
>>>> - [AIRFLOW-4256] Remove noqa from migrations (#5055)
>>>> - [AIRFLOW-4034] Remove unnecessary string formatting with ``**locals()`` (#4861)
>>>> - [AIRFLOW-3944] Remove code smells (#4762)
>>>>
>>>> Doc-only changes
>>>> """"""""""""""""
>>>> - [AIRFLOW-XXX] Add missing doc for annotations param of KubernetesPodOperator (#5666)
>>>> - [AIRFLOW-XXX] Fix typos in CONTRIBUTING.md (#5626)
>>>> - [AIRFLOW-XXX] Correct BaseSensorOperator docs (#5562)
>>>> - [AIRFLOW-4926] Fix example dags where its start_date is datetime.utcnow() (#5553)
>>>> - [AIRFLOW-4860] Remove Redundant Information in Example Dags (#5497)
>>>> - [AIRFLOW-4767] Fix errors in the documentation of Dataproc Operator (#5487)
>>>> - [AIRFLOW-1684] Branching based on XCom variable (Docs) (#4365)
>>>> - [AIRFLOW-3341] FAQ return DAG object example (#4605)
>>>> - [AIRFLOW-4433] Add missing type in DockerOperator doc string (#5205)
>>>> - [AIRFLOW-4321] Replace incorrect info of Max Size limit of GCS Object Size (#5106)
>>>> - [AIRFLOW-XXX] Add information about user list (#5341)
>>>> - [AIRFLOW-XXX] Clarify documentation related to autodetect parameter in GCS_to_BQ Op (#5294)
>>>> - [AIRFLOW-XXX] Remove mention of pytz compatibility from timezone documentation (#5316)
>>>> - [AIRFLOW-XXX] Add missing docs for GoogleCloudStorageDeleteOperator (#5274)
>>>> - [AIRFLOW-XXX] Remove incorrect note about Scopes of GCP connection (#5242)
>>>> - [AIRFLOW-XXX] Fix mistakes in docs of Dataproc operators (#5192)
>>>> - [AIRFLOW-XXX] Link to correct class for timedelta in macros.rst (#5226)
>>>> - [AIRFLOW-XXX] Add Kamil as committer (#5216)
>>>> - [AIRFLOW-XXX] Add Joshua and Kevin as committer (#5207)
>>>> - [AIRFLOW-XXX] Reduce log spam in tests (#5174)
>>>> - [AIRFLOW-XXX] Speed up tests for PythonSensor (#5158)
>>>> - [AIRFLOW-XXX] Add Bas Harenslak to committer list (#5157)
>>>> - [AIRFLOW-XXX] Add Jarek Potiuk to commiter list (#5132)
>>>> - [AIRFLOW-XXX] Update docstring for SchedulerJob (#5105)
>>>> - [AIRFLOW-XXX] Fix docstrings for CassandraToGoogleCloudStorageOperator (#5103)
>>>> - [AIRFLOW-XXX] update SlackWebhookHook and SlackWebhookOperator docstring (#5074)
>>>> - [AIRFLOW-XXX] Ignore python files under node_modules in docs (#5063)
>>>> - [AIRFLOW-XXX] Build a universal wheel with LICNESE files (#5052)
>>>> - [AIRFLOW-XXX] Fix docstrings of SQSHook (#5099)
>>>> - [AIRFLOW-XXX] Use Py3.7 on readthedocs
>>>> - [AIRFLOW-4446] Fix typos (#5217)
>>>>
>>>>
>>>
>>>
>
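
The `kill()` failure quoted in the thread (an `AttributeError` from a leftover `self._manager.shutdown()` call) is the kind of regression a small unit test on that path could catch. Below is a minimal sketch of such a test, not Airflow's actual code: `SketchProcessor`, its `start(process)` signature, the error message, and the 5-second join timeout are all hypothetical stand-ins for illustration, and the real `DagFileProcessor` may look quite different.

```python
import unittest
from unittest import mock


class SketchProcessor:
    """Hypothetical stand-in for a file processor; NOT Airflow's DagFileProcessor."""

    def __init__(self):
        self._process = None

    def start(self, process):
        # Real code would launch a multiprocessing.Process here; the sketch
        # accepts one so the test can pass in a mock instead.
        self._process = process

    def kill(self):
        if self._process is None:
            raise RuntimeError("Tried to kill before starting!")
        # Only touch attributes that __init__ defines. The reported bug came
        # from calling a method on an attribute that no longer existed.
        self._process.terminate()
        self._process.join(timeout=5)


class KillPathTest(unittest.TestCase):
    def test_kill_terminates_started_process(self):
        fake_process = mock.Mock()
        processor = SketchProcessor()
        processor.start(fake_process)
        # An AttributeError here would fail the test, which is the point.
        processor.kill()
        fake_process.terminate.assert_called_once_with()
        fake_process.join.assert_called_once_with(timeout=5)

    def test_kill_before_start_raises(self):
        with self.assertRaises(RuntimeError):
            SketchProcessor().kill()
```

Saved as a module, this can be run with `python -m unittest <module>`. Passing the process in as a mock keeps the test fast and avoids spawning real child processes; a fuller test would probably also exercise the timeout path that triggers the kill.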
