potiuk commented on a change in pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#discussion_r737947221



##########
File path: Dockerfile
##########
@@ -441,25 +443,53 @@ RUN chmod a+x /scripts/docker/install_mysql.sh && \
     find "${AIRFLOW_HOME}" -executable -print0 | xargs --null chmod g+x && \
         find "${AIRFLOW_HOME}" -print0 | xargs --null chmod g+rw
 
-COPY --chown=airflow:root --from=airflow-build-image /root/.local 
"${AIRFLOW_USER_HOME_DIR}/.local"
+COPY --chown=airflow:root --from=airflow-build-image /.venv /.venv
 COPY --chown=airflow:root scripts/in_container/prod/entrypoint_prod.sh 
/entrypoint
 COPY --chown=airflow:root scripts/in_container/prod/clean-logs.sh /clean-logs
 
 # Make /etc/passwd root-group-writeable so that user can be dynamically added 
by OpenShift
 # See https://github.com/apache/airflow/issues/9248
 
 RUN chmod a+x /entrypoint /clean-logs && \
-    chmod g=u /etc/passwd && \
-    bash /scripts/docker/install_pip_version.sh
+    chmod g=u /etc/passwd
+
+# Set default groups for airflow and root user
+RUN usermod -g 0 airflow -G 0
+
+# make sure that the venv is activated for all users
+# interactive, non-interactive shells and plain sudo, sudo with --interactive 
flag
+RUN echo ". /.venv/bin/activate" >> "${AIRFLOW_USER_HOME_DIR}/.bashrc" && \
+    echo ". /.venv/bin/activate" >> /root/.bashrc && \
+    sed --in-place=.bak "s/secure_path=\"/secure_path=\"\/.venv\/bin:/" 
/etc/sudoers

Review comment:
       I believe it's not enough.
   
   See the comment in this dicussion 
https://github.com/pypa/pip/issues/10556#issuecomment-950304351 
   
   @mik-laj is trying to convince them to switch their way of creating 
executable `dbt` script 
https://github.com/dbt-labs/dbt-core/issues/4035#issuecomment-941674931. 
   
   This might actually "work" in `dbt` case, if the python interpreter is 
actually run from the PATH by the `dbt` binary script, but I am not sure if 
this is always the case for all the tools we might have instaled there.
   
   Kamil explained it very well in the issue - this is something we also used 
to have in Airflow 1.10. The side effect of this approach (very popular among 
package creators) is that it requires full activation of hte virtual 
environment. If you want to execute the binary scrupt just via absolute script 
without activating the environment, it will fail. Unfortunately the explanation 
from https://docs.python.org/3/library/venv.html is that you do not have to 
activate virtualenv for **python interpreter** run directly from the path, but 
it does not mean that any other package installed there can also be run without 
activation. 
   
   I am not sure if there are other cases - I believe sometimes there might or 
might not work, depending how the "binary scripts" are written - they are 
usually only tested with "activated" virtualenv, so there can be some very 
nasty surprises.
   
   We have no control on how 'proper' implementation of binary scripts 
implements - and many of those tools are alreay used in providers or simply 
often used with airflow..
   
   That's why I think just adding `bin` to the path is not enough. And that's 
one of the main reaons why "switch to virtualenv' is not enoug and why I 
believe (as I stated in 
https://discuss.python.org/t/graceful-cooperation-between-external-and-python-package-managers-pep-668/10302/18?u=potiuk
 - still waiting for comments) PEP 668 should make a bit more precise 
recommendations and guidance to image creators - because while those steps are 
all necessary to make use of venv in the image, none of those things are 
mentioned nor even obvious (and a lot of people does not realise of those bad 
consequences)
   
   Unless I am totally mistaken and we can always trust that we can run any 
binary script in `.venv/bin` without venv activation - but I don't believe so.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to