Hi all,

Just a small follow-up to this. Is there a way to store logs exclusively on remote storage? We would like to keep local logs from growing and taking up too much disk space; with the current set-up, logs are written remotely but local copies still exist as well.
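In case it helps frame the question, something along these lines is roughly what we had in mind -- subclassing the GCS handler so the local copy is removed after the handler has had a chance to upload it. This is an untested sketch, and the attribute names (local_base, log_relative_path) are just my reading of the 1.9 handler, so they may be off. Is this the recommended approach, or is there a supported setting for it?

import os

from airflow.utils.log.gcs_task_handler import GCSTaskHandler


class GCSOnlyTaskHandler(GCSTaskHandler):
    """Upload the task log to GCS as usual, then drop the local file."""

    def close(self):
        # Let the stock handler flush the log and upload it to GCS first.
        super(GCSOnlyTaskHandler, self).close()
        # Then remove the local copy so worker disks don't fill up over time.
        # local_base / log_relative_path are assumed from the 1.9
        # FileTaskHandler / GCSTaskHandler attributes.
        local_loc = os.path.join(self.local_base, self.log_relative_path)
        if os.path.exists(local_loc):
            os.remove(local_loc)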
On Wed, Dec 20, 2017 at 11:44 AM, Kevin Lam <[email protected]> wrote:
> I got it to work; it seems I had mismatched some code
> (airflow/config_templates/airflow_local_settings.py) from the master
> branch in the v1-9-stable branch. Thanks for your help everyone!
>
> On Wed, Dec 20, 2017 at 11:01 AM, Kevin Lam <[email protected]> wrote:
>
>> Hi Ash,
>>
>> That run was at the head of the master branch on GitHub:
>>
>> https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/gcs_task_handler.py#L144
>>
>> On Wed, Dec 20, 2017 at 10:54 AM, Ash Berlin-Taylor <[email protected]> wrote:
>>
>>> What version are you on? I can't match up the line numbers in this stack
>>> trace to either 1.9.0rc8 or 1.9.0rc2 -- both of which show the 'if old_log
>>> else log' on line 157.
>>>
>>> -ash
>>>
>>> > On 20 Dec 2017, at 15:25, Kevin Lam <[email protected]> wrote:
>>> >
>>> > Thanks Bolke and Feng!
>>> >
>>> > I seem to have a working connection with GCS, but it seems there is some
>>> > error occurring in the gcs_task_handler in airflow:
>>> >
>>> > Traceback (most recent call last):
>>> >   File "/usr/local/bin/airflow", line 27, in <module>
>>> >     args.func(args)
>>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 423, in run
>>> >     logging.shutdown()
>>> >   File "/usr/lib/python3.5/logging/__init__.py", line 1882, in shutdown
>>> >     h.close()
>>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 87, in close
>>> >     self.gcs_write(log, remote_loc)
>>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 144, in gcs_write
>>> >     log = '\n'.join([old_log, log]) if old_log else log
>>> > UnboundLocalError: local variable 'old_log' referenced before assignment
>>> >
>>> > I believe the connection is working because the tasks are getting a 404
>>> > instead of a 403 when trying to read from remote logs, but they aren't
>>> > being written because of the above error.
>>> >
>>> > E.g.
>>> >
>>> > *** Unable to read remote log from
>>> > gs://<mybucket>/<...>/2017-12-20T15:21:23.704614+00:00/1.log
>>> > *** <HttpError 404 when requesting
>>> > https://www.googleapis.com/storage/v1/b/<mybucket>/o/<...>F2017-12-20T15%3A21%3A23.704614%2B00%3A00%2F1.log?alt=media
>>> > returned "Not Found">
>>> >
>>> > On Wed, Dec 20, 2017 at 1:48 AM, Bolke de Bruin <[email protected]> wrote:
>>> >
>>> >> Both will/should work; master is just cleaner and more manageable.
>>> >>
>>> >> B.
>>> >>
>>> >> Sent from my iPad
>>> >>
>>> >>> On 19 Dec 2017, at 23:44, Kevin Lam <[email protected]> wrote:
>>> >>>
>>> >>> Looks like it might be related to
>>> >>> https://github.com/apache/incubator-airflow/commit/02ff8ae35dd16e6f23d29d7b24a5fb9c09d0b7a4?
>>> >>> Why isn't this fix on the v1-9 branches? Should I be using master instead?
>>> >>>
>>> >>>> On Tue, Dec 19, 2017 at 5:37 PM, Kevin Lam <[email protected]> wrote:
>>> >>>>
>>> >>>> Hi Feng,
>>> >>>>
>>> >>>> Thanks for your help! Got it, will try to push on the python-based
>>> >>>> logging config.
>>> >>>>
>>> >>>> I'm trying to set up the GCS logging on airflow v1-9-stable, and my
>>> >>>> logging_config.py seems to be causing a python import error, caused by
>>> >>>> 'from airflow import configuration':
>>> >>>>
>>> >>>> "Initialize database...
>>> >>>> Unable to load the config, contains a configuration error.
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>     self.importer(used)
>>> >>>> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> The above exception was the direct cause of the following exception:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 558, in configure
>>> >>>>     handler = self.configure_handler(handlers[name])
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
>>> >>>>     klass = self.resolve(cname)
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
>>> >>>>     raise v
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>     self.importer(used)
>>> >>>> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler':
>>> >>>> No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> During handling of the above exception, another exception occurred:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/local/bin/airflow", line 16, in <module>
>>> >>>>     from airflow import configuration
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
>>> >>>>     from airflow import settings
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
>>> >>>>     configure_logging()
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
>>> >>>>     raise e
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
>>> >>>>     dictConfig(logging_config)
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
>>> >>>>     dictConfigClass(config).configure()
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 566, in configure
>>> >>>>     '%r: %s' % (name, e))
>>> >>>> ValueError: Unable to configure handler 'console': Cannot resolve
>>> >>>> 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named
>>> >>>> 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package
>>> >>>> HTTP/1.1 200 OK
>>> >>>> Unable to load the config, contains a configuration error.
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>     self.importer(used)
>>> >>>> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> The above exception was the direct cause of the following exception:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 558, in configure
>>> >>>>     handler = self.configure_handler(handlers[name])
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
>>> >>>>     klass = self.resolve(cname)
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
>>> >>>>     raise v
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>     self.importer(used)
>>> >>>> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler':
>>> >>>> No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> During handling of the above exception, another exception occurred:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>>   File "/usr/local/bin/airflow", line 16, in <module>
>>> >>>>     from airflow import configuration
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
>>> >>>>     from airflow import settings
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
>>> >>>>     configure_logging()
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
>>> >>>>     raise e
>>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
>>> >>>>     dictConfig(logging_config)
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
>>> >>>>     dictConfigClass(config).configure()
>>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 566, in configure
>>> >>>>     '%r: %s' % (name, e))
>>> >>>> ValueError: Unable to configure handler 'console': Cannot resolve
>>> >>>> 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named
>>> >>>> 'airflow.utils.log.logging_mixin.RedirectStdHandler';
>>> >>>> 'airflow.utils.log.logging_mixin' is not a package"
>>> >>>>
>>> >>>> Have you encountered this before?
>>> >>>>
>>> >>>> On Mon, Dec 18, 2017 at 8:53 PM, Feng Lu <[email protected]> wrote:
>>> >>>>
>>> >>>>> Hi Kevin,
>>> >>>>>
>>> >>>>> Kindly see my reply inline:
>>> >>>>>
>>> >>>>>> On Mon, Dec 18, 2017 at 3:28 PM, Kevin Lam <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I'm trying to get airflow to use GCS for logging purposes and had a few
>>> >>>>>> questions.
>>> >>>>>>
>>> >>>>>> We're currently using Airflow 1.9rc2, running in a Kubernetes Airflow
>>> >>>>>> deployment (similar to https://github.com/mumoshu/kube-airflow).
>>> >>>>>>
>>> >>>>>> 1/ Seems like the logging code has been going through some changes in the
>>> >>>>>> recent versions. What's the correct way to set up GCS for logging? Is it by
>>> >>>>>> just specifying remote_base_log_folder and remote_log_conn_id in
>>> >>>>>> airflow.cfg?
>>> >>>>>> Or by following this guide:
>>> >>>>>> http://airflow.readthedocs.io/en/latest/integration.html#gcp, using the
>>> >>>>>> python-based logging config? Is there an Airflow version that we should use
>>> >>>>>> to be most stable?
>>> >>>>>>
>>> >>>>> The python-based logging config is the right place to make changes; in our
>>> >>>>> test setup, we override the airflow_local_settings.py similarly to the link
>>> >>>>> you pasted.
>>> >>>>> You may also want to configure: [core] task_log_reader = gcs.task
>>> >>>>>
>>> >>>>>> 2/ Is there a way to encode the connection for GCS in a file so that one
>>> >>>>>> doesn't have to open the webserver and create it from the admin panel? It'd
>>> >>>>>> be nice if the GCS connection would be automatically created.
>>> >>>>>>
>>> >>>>> Unfortunately, the GCS connection is tied to a GCP project and is impossible
>>> >>>>> to pre-populate.
>>> >>>>> Airflow 1.9 should fix the gcp connection type issue
>>> >>>>> (https://github.com/apache/incubator-airflow/commit/2f107d8a30910fd025774004d5c4c95407ed55c5),
>>> >>>>> so you can use the airflow connections CLI directly.
>>> >>>>>
>>> >>>>>> Thanks in advance for your help!
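For anyone who finds this thread later: the python-based logging config Feng describes above is what ended up working for us. Below is roughly the shape of it -- reconstructed from memory of the 1.9 airflow_local_settings.py template rather than copied verbatim, so the bucket name and filename template are placeholders and the details should be checked against the template that matches your Airflow version. The extra keys on the gcs.task handler are passed to GCSTaskHandler's constructor by dictConfig.

# config/log_config.py -- a trimmed-down sketch based on
# airflow/config_templates/airflow_local_settings.py from the 1.9 branch.
import os

from airflow import configuration as conf

BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
REMOTE_BASE_LOG_FOLDER = 'gs://<mybucket>/airflow/logs'  # placeholder bucket

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        # A plain StreamHandler here avoids depending on classes that only
        # exist on master (the RedirectStdHandler mismatch seen earlier).
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout',
        },
        # Task logs are written locally and uploaded to GCS on close.
        'gcs.task': {
            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'gcs_log_folder': REMOTE_BASE_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        'airflow.task': {
            'handlers': ['gcs.task'],
            'level': 'INFO',
            'propagate': False,
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

In airflow.cfg this gets wired up with something like logging_config_class = log_config.LOGGING_CONFIG and task_log_reader = gcs.task under [core], plus remote_log_conn_id pointing at a google_cloud_platform connection -- which, per the commit Feng linked, can now be created with the airflow connections CLI instead of the admin UI.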
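Also, for anyone puzzling over the UnboundLocalError from gcs_write earlier in the thread: it happens because old_log is only assigned inside the try block that reads the existing remote log, so when that read raises (e.g. the 404 on the very first write), the name is never bound by the time the join runs. The commit linked above is the actual fix; the snippet below is just the general shape of the guard, with read_remote_log as a made-up stand-in for the handler's GCS read:

def append_to_remote(read_remote_log, log):
    # Bind old_log up front so a failed read can't leave it undefined.
    old_log = None
    try:
        old_log = read_remote_log()
    except Exception:
        # First write (404) or any other read failure: start a fresh log.
        pass
    return '\n'.join([old_log, log]) if old_log else log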
