MingTaLee commented on issue #17320:
URL: https://github.com/apache/airflow/issues/17320#issuecomment-1567712227
@potiuk
Thank you very much for your reply, really appreciated!
Our purpose is to set up a development and testing environment for my
colleagues to test their DAGs, and we would like to bind the dags folder
to a directory on the server side using Docker volume settings. My
colleagues will use the account "airflow" to SSH into the worker
container, modify DAGs, git push to the repository, and test them.
However, since the volume bound into the container is now owned by the user
named "default", the user "airflow" does not have permission to do that work
(modifications and git operations). And if I change the owner or permissions inside the
container, I will mess up the ownership on the server side....
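For reference, the relevant part of my docker-compose.yaml is essentially the stock file from the docs with the dags bind changed, roughly like this (simplified sketch; the host path is only a placeholder):

```yaml
# Simplified excerpt based on the stock docker-compose.yaml;
# /srv/airflow/dags is a placeholder for the server-side directory we bind in.
x-airflow-common:
  &airflow-common
  image: airflow253:v2.01
  volumes:
    - /srv/airflow/dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  # the stock file runs the containers as ${AIRFLOW_UID} with group 0
  user: "${AIRFLOW_UID:-50000}:0"
```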
Below is what I did (and did not do):
1. I didn't use Docker Swarm to manage the service. I simply ran `docker-compose
up airflow-init` and then `docker-compose up` to start it. So unfortunately I
don't have a stack trace to provide here (or do you mean something else
by stack trace? If so, please tell me how I can find it!).
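If container logs are what you are after, I can grab them and attach them here, e.g. with something like the following (service names assumed from the stock docker-compose.yaml):

```bash
# dump the logs of the worker and webserver containers to files I can attach
docker-compose logs airflow-worker > worker.log
docker-compose logs airflow-webserver > webserver.log
```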
2. Another important issue I forgot to mention earlier: in my first few
tests starting the service with `docker-compose up` (with the stock 2.5.3 Docker
image from Airflow, no modification at that stage), the tests failed.
After investigating, I found that the `webserver_config.py` file was not created
properly. Instead of a file, an empty folder with that name was created,
and no `airflow.cfg` file could be found. I had to manually provide the `airflow.cfg`
and `webserver_config.py` I had copied from my previous test (back when testing
Airflow 1.10.12). **NOT SURE WHERE SOMETHING GOT MESSED UP IN THIS STEP....** I
simply assumed that since the service started, those settings should be OK....
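To double-check, I can re-run a clean test and look at what actually ends up in AIRFLOW_HOME inside the container, e.g. (service name assumed from the stock compose file):

```bash
# list AIRFLOW_HOME inside the running webserver container;
# in the failing runs webserver_config.py showed up as an empty directory
docker-compose exec airflow-webserver ls -la /opt/airflow/
```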
3. Later I modified the image a bit to add SSH and PyMySQL. Here is the modified Dockerfile:
```dockerfile
FROM apache/airflow:2.5.3-python3.8
LABEL description="Modify from Airflow 2.5.3 image by Apache. Add openssh-server / PyMySQL. New_name: airflow253:v2.01" version="2.01"
RUN export DEBIAN_FRONTEND=noninteractive \
    && python3 -m pip install --no-cache-dir --upgrade pip && python3 -m pip install --upgrade setuptools \
    && python3 -m pip install --no-cache-dir pymysql
USER root
RUN export DEBIAN_FRONTEND=noninteractive \
    && apt-get update && apt-get -y upgrade \
    && apt-get install -y openssh-server git \
    && apt-get purge && apt-get clean && apt-get autoclean && apt-get remove && apt-get -y autoremove \
    && rm -Rf /root/.cache/pip \
    && rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]
```
I don't think these modifications change the variables you mentioned, but I may be wrong.
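One thing I notice while re-reading the docs on extending the image is that their example switches back to the airflow user after the root-level apt installs, while my Dockerfile ends on `USER root`. A version following that pattern would look roughly like this (just a sketch, not what I currently run):

```dockerfile
FROM apache/airflow:2.5.3-python3.8
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends openssh-server git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# switch back to the airflow user for pip installs, as the docs' example does
USER airflow
RUN pip install --no-cache-dir pymysql
```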
Below is the output of the commands you mentioned:
```
default@51d57fa9c7f2:/opt/airflow$ cat /etc/passwd | grep default
default:x:1002:0:default user:/home/airflow:/sbin/nologin
```
```
default@51d57fa9c7f2:/opt/airflow$ set |grep AIRFLOW_HOME
AIRFLOW_HOME=/opt/airflow
default@51d57fa9c7f2:/opt/airflow$ ls ${AIRFLOW_HOME}
airflow-worker.pid airflow.cfg dags logs plugins webserver_config.py
```
```
default@51d57fa9c7f2:/opt/airflow$ airflow config list
[core]
dags_folder = /opt/airflow/dags
hostname_callable = airflow.utils.net.getfqdn
default_timezone = utc
executor = CeleryExecutor
parallelism = 32
max_active_tasks_per_dag = 16
dags_are_paused_at_creation = True
max_active_runs_per_dag = 16
load_examples = True
plugins_folder = /opt/airflow/plugins
execute_tasks_new_python_interpreter = False
fernet_key =
donot_pickle = True
dagbag_import_timeout = 30.0
dagbag_import_error_tracebacks = True
dagbag_import_error_traceback_depth = 2
dag_file_processor_timeout = 50
task_runner = StandardTaskRunner
default_impersonation =
security =
unit_test_mode = False
enable_xcom_pickling = False
allowed_deserialization_classes = airflow\..*
killed_task_cleanup_time = 60
dag_run_conf_overrides_params = True
dag_discovery_safe_mode = True
dag_ignore_file_syntax = regexp
default_task_retries = 0
default_task_retry_delay = 300
default_task_weight_rule = downstream
default_task_execution_timeout =
min_serialized_dag_update_interval = 30
compress_serialized_dags = False
min_serialized_dag_fetch_interval = 10
max_num_rendered_ti_fields_per_task = 30
check_slas = True
xcom_backend = airflow.models.xcom.BaseXCom
lazy_load_plugins = True
lazy_discover_providers = True
hide_sensitive_var_conn_fields = True
sensitive_var_conn_names =
default_pool_task_slot_count = 128
max_map_length = 1024
daemon_umask = 0o077
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow
sql_engine_encoding = utf-8
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10
sql_alchemy_pool_recycle = 1800
sql_alchemy_pool_pre_ping = True
sql_alchemy_schema =
load_default_connections = True
max_db_retries = 3
[logging]
base_log_folder = /opt/airflow/logs
remote_logging = False
remote_log_conn_id =
google_key_path =
remote_base_log_folder =
encrypt_s3_logs = False
logging_level = INFO
celery_logging_level =
fab_logging_level = WARNING
logging_config_class =
colored_console_log = True
colored_log_format = [%(blue)s%(asctime)s%(reset)s] {%(blue)s%(filename)s:%(reset)s%(lineno)d} %(log_color)s%(levelname)s%(reset)s - %(log_color)s%(message)s%(reset)s
colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter
log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
simple_log_format = %(asctime)s %(levelname)s - %(message)s
dag_processor_log_target = file
dag_processor_log_format = [%(asctime)s] [SOURCE:DAG_PROCESSOR] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
log_formatter_class = airflow.utils.log.timezone_aware.TimezoneAware
task_log_prefix_template =
log_filename_template = dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/{% if ti.map_index >= 0 %}map_index={{ ti.map_index }}/{% endif %}attempt={{ try_number }}.log
log_processor_filename_template = {{ filename }}.log
dag_processor_manager_log_location = /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
task_log_reader = task
extra_logger_names =
worker_log_server_port = 8793
[metrics]
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
statsd_allow_list =
stat_name_handler =
statsd_datadog_enabled = False
statsd_datadog_tags =
[secrets]
backend =
backend_kwargs =
[cli]
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
[debug]
fail_fast = False
[api]
enable_experimental_api = False
auth_backends = airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session
maximum_page_limit = 100
fallback_page_limit = 100
google_oauth2_audience =
google_key_path =
access_control_allow_headers =
access_control_allow_methods =
access_control_allow_origins =
[lineage]
backend =
[atlas]
sasl_enabled = False
host =
port = 21000
username =
password =
[operators]
default_owner = airflow
default_cpus = 1
default_ram = 512
default_disk = 512
default_gpus = 0
default_queue = default
allow_illegal_arguments = False
[hive]
default_hive_mapred_queue =
[webserver]
base_url = http://localhost:8080
default_ui_timezone = UTC
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_ssl_cert =
web_server_ssl_key =
session_backend = database
web_server_master_timeout = 120
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 6000
reload_on_plugin_change = False
secret_key = BSlegi2JIGb8pADrl2RNYw==
workers = 4
worker_class = sync
access_logfile = -
error_logfile = -
access_logformat =
expose_config = False
expose_hostname = False
expose_stacktrace = False
dag_default_view = grid
dag_orientation = LR
log_fetch_timeout_sec = 5
log_fetch_delay_sec = 2
log_auto_tailing_offset = 30
log_animation_speed = 1000
hide_paused_dags_by_default = False
page_size = 100
navbar_color = #fff
default_dag_run_display_number = 25
enable_proxy_fix = False
proxy_fix_x_for = 1
proxy_fix_x_proto = 1
proxy_fix_x_host = 1
proxy_fix_x_port = 1
proxy_fix_x_prefix = 1
cookie_secure = False
cookie_samesite = Lax
default_wrap = False
x_frame_enabled = True
show_recent_stats_for_completed_runs = True
update_fab_perms = True
session_lifetime_minutes = 43200
instance_name_has_markup = False
auto_refresh_interval = 3
warn_deployment_exposure = True
audit_view_excluded_events = gantt,landing_times,tries,duration,calendar,graph,grid,tree,tree_data
[email]
email_backend = airflow.utils.email.send_email_smtp
email_conn_id = smtp_default
default_email_on_retry = True
default_email_on_failure = True
[smtp]
smtp_host = localhost
smtp_starttls = True
smtp_ssl = False
smtp_port = 25
smtp_mail_from = [email protected]
smtp_timeout = 30
smtp_retry_limit = 5
[sentry]
sentry_on = False
sentry_dsn =
[local_kubernetes_executor]
kubernetes_queue = kubernetes
[celery_kubernetes_executor]
kubernetes_queue = kubernetes
[celery]
celery_app_name = airflow.executors.celery_executor
worker_concurrency = 16
worker_prefetch_multiplier = 1
worker_enable_remote_control = True
broker_url = redis://:@redis:6379/0
flower_host = 0.0.0.0
flower_url_prefix =
flower_port = 5555
flower_basic_auth =
sync_parallelism = 0
celery_config_options = airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
ssl_active = False
ssl_key =
ssl_cert =
ssl_cacert =
pool = prefork
operation_timeout = 1.0
task_track_started = True
task_adoption_timeout = 600
stalled_task_timeout = 0
task_publish_max_retries = 3
worker_precheck = False
result_backend = db+postgresql://airflow:airflow@postgres/airflow
[celery_broker_transport_options]
[dask]
cluster_address = 127.0.0.1:8786
tls_ca =
tls_cert =
tls_key =
[scheduler]
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5
num_runs = -1
scheduler_idle_sleep_time = 1
min_file_process_interval = 30
parsing_cleanup_interval = 60
dag_dir_list_interval = 300
print_stats_interval = 30
pool_metrics_interval = 5.0
scheduler_health_check_threshold = 30
enable_health_check = True
scheduler_health_check_server_port = 8974
orphaned_tasks_check_interval = 300.0
child_process_log_directory = /opt/airflow/logs/scheduler
scheduler_zombie_task_threshold = 300
zombie_detection_interval = 10.0
catchup_by_default = True
ignore_first_depends_on_past_by_default = True
max_tis_per_query = 512
use_row_level_locking = True
max_dagruns_to_create_per_loop = 10
max_dagruns_per_loop_to_schedule = 20
schedule_after_task_execution = True
parsing_processes = 2
file_parsing_sort_mode = modified_time
standalone_dag_processor = False
max_callbacks_per_loop = 20
dag_stale_not_seen_duration = 600
use_job_schedule = True
allow_trigger_in_future = False
trigger_timeout_check_interval = 15
[triggerer]
default_capacity = 1000
[kerberos]
ccache = /tmp/airflow_krb5_ccache
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab
forwardable = True
include_ip = True
[elasticsearch]
host =
log_id_template = {dag_id}-{task_id}-{run_id}-{map_index}-{try_number}
end_of_log_mark = end_of_log
frontend =
write_stdout = False
json_format = False
json_fields = asctime, filename, lineno, levelname, message
host_field = host
offset_field = offset
[elasticsearch_configs]
use_ssl = False
verify_certs = True
[kubernetes_executor]
pod_template_file =
worker_container_repository =
worker_container_tag =
namespace = default
delete_worker_pods = True
delete_worker_pods_on_failure = False
worker_pods_creation_batch_size = 1
multi_namespace_mode = False
in_cluster = True
kube_client_request_args =
delete_option_kwargs =
enable_tcp_keepalive = True
tcp_keep_idle = 120
tcp_keep_intvl = 30
tcp_keep_cnt = 6
verify_ssl = True
worker_pods_pending_timeout = 300
worker_pods_pending_timeout_check_interval = 120
worker_pods_queued_check_interval = 60
worker_pods_pending_timeout_batch_size = 100
[sensors]
default_timeout = 604800
```
Thanks for your valuable help, and please let me know if there is any other
information you need.