Hi all,

Just a small follow-up to this. Is there a way to store logs exclusively on
remote storage? We would like to keep local logs from growing and taking
up too much space. With the current set-up, it seems logs are still kept
locally in addition to being written remotely.

On Wed, Dec 20, 2017 at 11:44 AM, Kevin Lam <[email protected]> wrote:

> I got it to work. It seems I had mixed some code (
> airflow/config_templates/airflow_local_settings.py) from the master
> branch into the v1-9-stable branch. Thanks for your help everyone!
>
> On Wed, Dec 20, 2017 at 11:01 AM, Kevin Lam <[email protected]> wrote:
>
>> Hi Ash,
>>
>> That run was at the head of master branch in github:
>>
>> https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/gcs_task_handler.py#L144
>>
>>
>> On Wed, Dec 20, 2017 at 10:54 AM, Ash Berlin-Taylor <
>> [email protected]> wrote:
>>
>>> What version are you on? I can't match up the line numbers in this stack
>>> trace to either 1.9.0rc8 or 1.9.0rc2 -- both of which show the 'if old_log
>>> else log' on line 157
>>>
>>> -ash
>>>
>>>
>>> > On 20 Dec 2017, at 15:25, Kevin Lam <[email protected]> wrote:
>>> >
>>> > Thanks Bolke and Feng!
>>> >
>>> > I seem to have a working connection with GCS, but there seems to be an
>>> > error occurring in the gcs_task_handler in Airflow:
>>> >
>>> > Traceback (most recent call last):
>>> >  File "/usr/local/bin/airflow", line 27, in <module>
>>> >    args.func(args)
>>> >  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 423, in run
>>> >    logging.shutdown()
>>> >  File "/usr/lib/python3.5/logging/__init__.py", line 1882, in shutdown
>>> >    h.close()
>>> >  File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 87, in close
>>> >    self.gcs_write(log, remote_loc)
>>> >  File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 144, in gcs_write
>>> >    log = '\n'.join([old_log, log]) if old_log else log
>>> > UnboundLocalError: local variable 'old_log' referenced before assignment
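
For what it's worth, an UnboundLocalError like this usually means the name is only assigned on one code path. The following is a minimal sketch of that pattern and one way to guard against it. It is not the actual Airflow code: `read_remote`, `merge_logs_buggy` and `merge_logs_fixed` are hypothetical names, and the always-failing read is an assumption that stands in for a remote log that does not exist yet.

```python
def read_remote(path):
    """Hypothetical stand-in for a remote read that fails (object not found)."""
    raise IOError("404 Not Found: " + path)


def merge_logs_buggy(log, path):
    try:
        old_log = read_remote(path)  # raises, so old_log is never bound
    except IOError:
        pass  # error swallowed; old_log is still unassigned here
    # Same expression as in the traceback above: the condition reads old_log
    # first, which raises UnboundLocalError.
    return '\n'.join([old_log, log]) if old_log else log


def merge_logs_fixed(log, path):
    old_log = None  # bind the name before the try block
    try:
        old_log = read_remote(path)
    except IOError:
        pass
    return '\n'.join([old_log, log]) if old_log else log
```

With the name bound up front, a failed read simply falls through to writing the new log on its own.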
>>> >
>>> > I believe the connection is working because the tasks are getting a 404
>>> > instead of a 403 when trying to read the remote logs, but the logs aren't
>>> > being written because of the above error.
>>> >
>>> > E.g.
>>> >
>>> > *** Unable to read remote log from
>>> > gs://<mybucket>/<...>/2017-12-20T15:21:23.704614+00:00/1.log
>>> > *** <HttpError 404 when requesting
>>> > https://www.googleapis.com/storage/v1/b/<mybucket>/o/<...>F2017-12-20T15%3A21%3A23.704614%2B00%3A00%2F1.log?alt=media
>>> > returned "Not Found">
>>> >
>>> >
>>> > On Wed, Dec 20, 2017 at 1:48 AM, Bolke de Bruin <[email protected]> wrote:
>>> >
>>> >> Both will/should work, master is just cleaner and more manageable.
>>> >>
>>> >> B.
>>> >>
>>> >> Sent from my iPad
>>> >>
>>> >>> On 19 Dec 2017, at 23:44, Kevin Lam <[email protected]> wrote:
>>> >>>
>>> >>> Looks like it might be related to
>>> >>> https://github.com/apache/incubator-airflow/commit/02ff8ae35dd16e6f23d29d7b24a5fb9c09d0b7a4?
>>> >>> Why isn't this fix on the v1-9 branches? Should I be using master instead?
>>> >>>
>>> >>>> On Tue, Dec 19, 2017 at 5:37 PM, Kevin Lam <[email protected]> wrote:
>>> >>>>
>>> >>>> Hi Feng,
>>> >>>>
>>> >>>> Thanks for your help! Got it, will try to push on with the Python-based
>>> >>>> logging config.
>>> >>>>
>>> >>>> I'm trying to set up GCS logging on Airflow v1-9-stable, and my
>>> >>>> logging_config.py seems to be causing a Python import error, triggered by
>>> >>>> 'from airflow import configuration':
>>> >>>>
>>> >>>> "Initialize database...
>>> >>>> Unable to load the config, contains a configuration error.
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>   self.importer(used)
>>> >>>> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> The above exception was the direct cause of the following exception:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 558, in configure
>>> >>>>   handler = self.configure_handler(handlers[name])
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
>>> >>>>   klass = self.resolve(cname)
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
>>> >>>>   raise v
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>   self.importer(used)
>>> >>>> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> During handling of the above exception, another exception occurred:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/local/bin/airflow", line 16, in <module>
>>> >>>>   from airflow import configuration
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
>>> >>>>   from airflow import settings
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
>>> >>>>   configure_logging()
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
>>> >>>>   raise e
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
>>> >>>>   dictConfig(logging_config)
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
>>> >>>>   dictConfigClass(config).configure()
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 566, in configure
>>> >>>>   '%r: %s' % (name, e))
>>> >>>> ValueError: Unable to configure handler 'console': Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>>> >>>> HTTP/1.1 200 OK
>>> >>>> Unable to load the config, contains a configuration error.
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>   self.importer(used)
>>> >>>> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> The above exception was the direct cause of the following exception:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 558, in configure
>>> >>>>   handler = self.configure_handler(handlers[name])
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
>>> >>>>   klass = self.resolve(cname)
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
>>> >>>>   raise v
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>>> >>>>   self.importer(used)
>>> >>>> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>>> >>>>
>>> >>>> During handling of the above exception, another exception occurred:
>>> >>>>
>>> >>>> Traceback (most recent call last):
>>> >>>> File "/usr/local/bin/airflow", line 16, in <module>
>>> >>>>   from airflow import configuration
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
>>> >>>>   from airflow import settings
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
>>> >>>>   configure_logging()
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
>>> >>>>   raise e
>>> >>>> File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
>>> >>>>   dictConfig(logging_config)
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
>>> >>>>   dictConfigClass(config).configure()
>>> >>>> File "/usr/lib/python3.5/logging/config.py", line 566, in configure
>>> >>>>   '%r: %s' % (name, e))
>>> >>>> ValueError: Unable to configure handler 'console': Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package"
>>> >>>>
>>> >>>> Have you encountered this before?
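
As an aside, this class of error can be reproduced with the standard library alone, which may help when checking whether a logging config matches the installed code. The sketch below assumes nothing about Airflow: the handler class path `logging.handlers.NoSuchHandler` is deliberately made up, and `try_configure` is a hypothetical helper. It only shows that `dictConfig` raises a ValueError naming the handler when the config points at a class the module does not define.

```python
import logging.config

# Handler class path that does not exist in the imported module. This mimics a
# config that names a class the installed package does not provide.
bad_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            # Hypothetical, deliberately missing class.
            'class': 'logging.handlers.NoSuchHandler',
        },
    },
}


def try_configure(config):
    """Return None on success, or the message of the ValueError dictConfig raised."""
    try:
        logging.config.dictConfig(config)
    except ValueError as exc:
        return str(exc)
    return None


error = try_configure(bad_config)
print(error)
```

If the same check passes for every 'class' entry in a config, the config and the installed package are at least consistent about which handler classes exist.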
>>> >>>>
>>> >>>> On Mon, Dec 18, 2017 at 8:53 PM, Feng Lu <[email protected]> wrote:
>>> >>>>
>>> >>>>> Hi Kevin,
>>> >>>>>
>>> >>>>> Kindly see my reply inline:
>>> >>>>>
>>> >>>>>> On Mon, Dec 18, 2017 at 3:28 PM, Kevin Lam <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I'm trying to get Airflow to use GCS for logging and had a few
>>> >>>>>> questions.
>>> >>>>>>
>>> >>>>>> We're currently using Airflow 1.9rc2, running in a Kubernetes Airflow
>>> >>>>>> deployment (similar to https://github.com/mumoshu/kube-airflow).
>>> >>>>>>
>>> >>>>>> 1/ It seems the logging code has been going through some changes in
>>> >>>>>> recent versions. What's the correct way to set up GCS for logging? Is
>>> >>>>>> it by just specifying remote_base_log_folder and remote_log_conn_id in
>>> >>>>>> airflow.cfg? Or by following this guide,
>>> >>>>>> http://airflow.readthedocs.io/en/latest/integration.html#gcp, using the
>>> >>>>>> Python-based logging config? Is there an Airflow version that we should
>>> >>>>>> use to be most stable?
>>> >>>>>>
>>> >>>>> The Python-based logging config is the right place to make changes. In
>>> >>>>> our test setup, we override airflow_local_settings.py in a way similar
>>> >>>>> to the link you pasted.
>>> >>>>> You may also want to set: [core] task_log_reader = gcs.task
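
To make the override a bit more concrete, here is a rough sketch of adding a 'gcs.task' handler to a logging config dict. Treat all of it as assumptions to verify against your installed version: the handler class path and keyword names are as I remember them from the integration guide linked in this thread, the bucket, folder and filename template are placeholders, `with_gcs_task_handler` is a hypothetical helper, and `base` is only a stand-in for the dict copied from airflow_local_settings.py.

```python
import copy
import os

# Placeholder values; replace with your own bucket and paths.
GCS_LOG_FOLDER = 'gs://my-example-bucket/logs/'  # note the trailing slash
BASE_LOG_FOLDER = '~/airflow/logs'
# Assumed template; copy the real one from your airflow_local_settings.py.
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'


def with_gcs_task_handler(base_config):
    """Return a copy of base_config with a 'gcs.task' handler used by airflow.task."""
    config = copy.deepcopy(base_config)
    config.setdefault('handlers', {})['gcs.task'] = {
        # Class path and keyword names are assumptions; verify locally.
        'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
        'formatter': 'airflow.task',
        'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
        'gcs_log_folder': GCS_LOG_FOLDER,
        'filename_template': FILENAME_TEMPLATE,
    }
    task_logger = config.setdefault('loggers', {}).setdefault('airflow.task', {})
    task_logger['handlers'] = ['gcs.task']
    return config


# Stand-in for the dict copied from airflow_local_settings.py.
base = {
    'version': 1,
    'handlers': {'file.task': {'class': 'logging.FileHandler', 'filename': 'task.log'}},
    'loggers': {'airflow.task': {'handlers': ['file.task'], 'level': 'INFO'}},
}
LOGGING_CONFIG = with_gcs_task_handler(base)
```

The handler key 'gcs.task' is the name that the task_log_reader setting above refers to, so the two need to match.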
>>> >>>>>
>>> >>>>>
>>> >>>>>>
>>> >>>>>> 2/ Is there a way to encode the connection for GCS in a file so that one
>>> >>>>>> doesn't have to open the webserver and create it from the admin panel?
>>> >>>>>> It'd be nice if the GCS connection were created automatically.
>>> >>>>>>
>>> >>>>> Unfortunately, a GCS connection is tied to a specific GCP project, so it
>>> >>>>> is impossible to pre-populate.
>>> >>>>> Airflow 1.9 should fix the GCP connection type issue (
>>> >>>>> https://github.com/apache/incubator-airflow/commit/2f107d8a30910fd025774004d5c4c95407ed55c5),
>>> >>>>> so you can use the airflow connections CLI directly.
>>> >>>>>
>>> >>>>>
>>> >>>>>>
>>> >>>>>> Thanks in advance for your help!
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>
>>>
>>>
>>
>
