potiuk commented on a change in pull request #4936: [AIRFLOW-4115] Multi-staging Aiflow Docker image [Step 1/3]
URL: https://github.com/apache/airflow/pull/4936#discussion_r268662206
##########
File path: .dockerignore
##########
@@ -0,0 +1,107 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE! This docker ignore uses recommended technique
+# Where everything is excluded by default and you deliberately
+# Add only those directories/files you need. This is very useful
+# To make sure that Docker context is always the same on any machine
+# So that generated files are not accidentally added to the context
+# This allows Docker's `COPY .` to behave in predictable way
+
+# Ignore everything
+**

Review comment:
I have already given a talk on that, and I actually plan to write a blog post about it :). It's a practice I now absolutely recommend. I have seen it recommended in many places (for example here: https://youknowfordevs.com/2018/12/07/getting-control-of-your-dockerignore-files.html), but I also experienced its effect first-hand in Airflow. Besides longer build times, the problem is cache invalidation when you do 'COPY .'. In that case the whole context has to match when you rebuild the image, and if you include everything by default, then every single file generated in (or accidentally added to) your source directory invalidates the cached 'COPY .' layer and every layer after it.
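To make the cache-invalidation argument concrete, here is a minimal Python sketch. It is not Docker's code: `is_excluded` is a hypothetical, simplified stand-in for Docker's .dockerignore matcher (patterns are applied in order, the last match wins, and a leading `!` re-includes a path; unlike Docker, `*` here also crosses `/`), and `context_key` is a hypothetical stand-in for the per-file checksums Docker compares when deciding whether a 'COPY .' layer can be reused. The paths and patterns are illustrative only.

```python
import fnmatch
import hashlib


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Hypothetical, simplified .dockerignore check (not Docker's matcher).

    Patterns are evaluated in order and the last match wins; a leading '!'
    re-includes the path. fnmatch's '*' also matches '/', so '**' matches
    every path.
    """
    excluded = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        # A pattern matches the path itself or anything underneath it.
        if fnmatch.fnmatchcase(path, pattern) or path.startswith(pattern.rstrip("/") + "/"):
            excluded = not negate
    return excluded


def context_key(files: dict[str, bytes], patterns: list[str]) -> str:
    """Hash the path and contents of every file that would be sent in the context.

    Stand-in for the per-file checksums compared for COPY: timestamps are
    ignored, only paths and contents count.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        if not is_excluded(path, patterns):
            data = files[path]
            digest.update(path.encode() + b"\0")
            digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


# Illustrative source tree and two ignore strategies.
sources = {"setup.py": b"setup()", "airflow/__init__.py": b""}
deny_list = ["*.pyc", ".git"]                 # include by default, exclude known junk
allow_list = ["**", "!airflow", "!setup.py"]  # exclude everything, add back what's needed

deny_before = context_key(sources, deny_list)
allow_before = context_key(sources, allow_list)

# Local docs generation drops a directory nobody anticipated into the tree.
sources["docs/_build/index.html"] = b"<html></html>"

print(context_key(sources, deny_list) == deny_before)    # False: cache invalidated
print(context_key(sources, allow_list) == allow_before)  # True: cache still valid
```

With the deny-list approach the unexpected `docs/_build` output changes the key, while the allow-list key stays the same because nothing outside `airflow` and `setup.py` can reach the context.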
We already generate a lot of files in the sources (node_modules, 'static' and a number of others). During local development you often run things locally (such as documentation generation) that put a lot of garbage into your context and invalidate the cache. As a result, pretty much every rebuild restarts from the 'COPY .' step, whether or not the actual sources changed. Moreover, you cannot foresee what else will be generated in the sources in the future. No matter how hard you try to exclude everything you do not want, someone a few months from now might add a single generated file (or, more likely, a directory) that causes frequent cache invalidations from then on.

Therefore "exclude everything" and then "add what you need" works much better. There is no chance you will forget to add something important, because your tests will fail if you do. This is actually the reason I now deliberately build the Docker image and run the tests from the built image rather than from mounted sources (as was done before). This way your full Docker image is tested. Previously, some files could be ignored during the build but then mounted from the sources, which is quite a bad practice: it does not test the Docker image, but a hybrid of the Docker image and the mounted sources (which are not necessarily the same).

One additional and hugely important point: keeping the context as minimal as possible saves a lot of time in local builds. Under the hood, when docker build runs, the whole context is compressed/packed and sent to the engine rather than used directly from the local sources. This is why a Docker build sometimes pauses for many seconds before it starts; if you see that, it usually means you have not done a good job of excluding some generated binaries. If you don't ignore such generated files (especially node_modules, which is huge and contains lots of files), your context can grow very large, and uncontrollably.
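The packing step can be approximated in Python too. This is a sketch under stated assumptions, not the docker CLI's code: it repeats the same hypothetical, simplified `is_excluded` matcher, and the hypothetical `context_tar_size` builds an uncompressed tar in memory (roughly what the classic docker build does before uploading the context) and reports how many files and bytes would be sent.

```python
import fnmatch
import io
import os
import tarfile


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Hypothetical, simplified .dockerignore check: last match wins, '!' re-includes."""
    excluded = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        # A pattern matches the path itself or anything underneath it.
        if fnmatch.fnmatchcase(path, pattern) or path.startswith(pattern.rstrip("/") + "/"):
            excluded = not negate
    return excluded


def context_tar_size(root: str, patterns: list[str]) -> tuple[int, int]:
    """Tar every non-excluded file under root in memory and return
    (number of files, archive size in bytes)."""
    buffer = io.BytesIO()
    count = 0
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Use '/'-separated paths relative to the context root.
                rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
                if not is_excluded(rel_path, patterns):
                    tar.add(full_path, arcname=rel_path)
                    count += 1
    # The archive is finalised when the 'with' block closes it.
    return count, len(buffer.getvalue())
```

Running it against a checkout with a populated node_modules, once with a deny list such as `["*.pyc", ".git"]` and once with an allow list such as `["**", "!airflow", "!setup.py"]`, shows how much of the upload comes from files the build never needed.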
Excluding everything with ** all but guarantees that the context will not grow accidentally and uncontrollably.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

With regards,
Apache Git Services
