On Tue, Apr 28, 2020 at 5:03 PM Robert Bradshaw <[email protected]> wrote:
> On Tue, Apr 28, 2020 at 4:46 PM Hannah Jiang <[email protected]> > wrote: > >> I guess I assumed there was some reason we needed "lightweight images" in >>> our tests (because licenses take up a lot of space IIRC), but maybe not. >>> Can you elaborate on the purpose of this option Hannah? >> >> Reducing image size is a good reason to have the pull option. There were >> user requests to create lightweight images. Now I think users can use skip >> option to create lightweight iamges. pull option is not needed. >> >> Then the discussion becomes which one should be the default mode? >> According to feedback, skip should be the default mode, and change it to >> add mode when running Jenkins test. And the docker-pull-licenses tag is >> binary again. >> It seems like there is an easy way to pass the tag to all Jenkins test, >> which is adding *context.switches("-Pdocker-pull-licenses"**)* to >> CommonJobProperties.groovy >> <https://github.com/apache/beam/blob/master/.test-infra/jenkins/CommonJobProperties.groovy#L150>. >> I'm not very familiar with Jenkins, does this would work as expected? >> If it sounds like a bad idea to pass this tag to ALL tests, I would ask >> help from community to identify those tests which create docker images >> (Java and Python for now) and pass the tag to the tests. >> > > That could work. > > The other question is, how easy would it be for a release manager to > accidentally push non-compliant images? > Echoing my previous question: would it make sense to add licenses whenever we build containers on the release branch, which can be checked by [1]? [1] https://github.com/apache/beam/blob/74a6565c8b64d9fadf256370df47a4c5dadafb55/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L296 > > We have definitely not explored how cheap we can make doing the "real > thing" can be, which would IMHO be best. > > >> >> >> On Tue, Apr 28, 2020 at 3:50 PM Kyle Weaver <[email protected]> wrote: >> >>> > To overwrite, we need to pass the tag for ALL Jenkins tests that >>> create docker images, which is quite a bunch of work. >>> >>> If it comes to that, I'd rather we do the work of passing tags instead >>> of users. >>> >>> > Does anyone know if there is an easy way to pass the tag to many >>> tests? or overwrite the default mode, which will be defined at >>> gradle.properties, to pull ONLY with Jenkins tests? >>> >>> There is precedence for the latter option: >>> https://github.com/apache/beam/blob/1905dbde5ca0858fce89f65c761e88511840a384/build.gradle#L45 >>> >>> > (On that note, however, pull seems bad for both.) >>> >>> I guess I assumed there was some reason we needed "lightweight images" >>> in our tests (because licenses take up a lot of space IIRC), but maybe not. >>> Can you elaborate on the purpose of this option Hannah? >>> >>> On Tue, Apr 28, 2020 at 6:40 PM Robert Bradshaw <[email protected]> >>> wrote: >>> >>>> Fundamentally having license checking off by default is dangerous for >>>> releases, but having it on by default is annoying for developers. (On that >>>> note, however, pull seems bad for both.) Is it possible to make it a gradle >>>> target that only runs when something (specifically dependencies, or the >>>> files declaring them) have changed, and would leverage the gradle cache >>>> (and hence be cheap) otherwise? >>>> >>>> Alternatively, I think we should invest in URL caching. These urls >>>> should be immutable; let's only download them once, ever. >>>> >>>> On Tue, Apr 28, 2020 at 3:31 PM Hannah Jiang <[email protected]> >>>> wrote: >>>> >>>>> Do you mean set the default mode to skip and overwrite it to pull mode >>>>> with Jenkins test? To overwrite, we need to pass the tag for ALL Jenkins >>>>> tests that create docker images, which is quite a bunch of work. >>>>> Does anyone know if there is an easy way to pass the tag to many >>>>> tests? or overwrite the default mode, which will be defined at >>>>> gradle.properties, to pull ONLY with Jenkins tests? We can add a step to >>>>> do >>>>> this after cloning a git branch to Jenkins machine. But it seems easier to >>>>> change the local gradle.properties for local development purpose..? >>>>> >>>>> >>>>> >>>>> On Tue, Apr 28, 2020 at 3:09 PM Thomas Weise <[email protected]> wrote: >>>>> >>>>>> Can this be solved by enabling the license magic for the Jenkins jobs? >>>>>> >>>>>> I also think the default should be off for better local development >>>>>> experience. >>>>>> >>>>>> On Tue, Apr 28, 2020 at 3:05 PM Hannah Jiang <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> We need to make sure the release images contains licenses/notices in >>>>>>> order to avoid potential legal issues. The purpose of setting the >>>>>>> default >>>>>>> to pull is checking PRs which introduce new dependencies include thier >>>>>>> licenses as well, in a way of auto pulling or tool pulling, to make sure >>>>>>> all licenses are included when we create release images. If setting >>>>>>> default >>>>>>> to skip, we only check licenses when we create release images and may >>>>>>> see >>>>>>> many issues with the licenses, and the release would be delayed, maybe a >>>>>>> lot. >>>>>>> >>>>>>> >>>>>>> On Tue, Apr 28, 2020 at 2:50 PM Kyle Weaver <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> The different modes make sense. One disagreement: I think the >>>>>>>> default should be skip. I imagine few users would want to put licenses >>>>>>>> in >>>>>>>> their own images. >>>>>>>> >>>>>>>> On Tue, Apr 28, 2020 at 5:36 PM Hannah Jiang < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> The above 1,2,3,4 are merged to master. >>>>>>>>> Here I would like to change behaviors with *docker-pull-licenses* >>>>>>>>> tag and would like to collect feedbacks from community before >>>>>>>>> finalizing my >>>>>>>>> PR. >>>>>>>>> >>>>>>>>> docker-pull-licenses can be set to one of ['add', 'pull', 'skip']. >>>>>>>>> >>>>>>>>> - *docker-pull-licenses=add* pulls licenses/notices and the >>>>>>>>> files are added to docker images. This tag is used to create >>>>>>>>> release docker >>>>>>>>> images. >>>>>>>>> - *docker-pull-licenses=pull* pulls licenses/notices but do >>>>>>>>> not add the files to docker images so that create lightweight >>>>>>>>> images. This >>>>>>>>> is the default mode, and docker images created by Jenkins test use >>>>>>>>> this >>>>>>>>> mode. This will make sure the files are all pullable and detect >>>>>>>>> issues with >>>>>>>>> the tool, except the ADD part at DockerFile, which is rarely >>>>>>>>> likely to be >>>>>>>>> an issue. It is better than the checking URLs approach mentioned >>>>>>>>> above, >>>>>>>>> because it is a more similar process like creating release images >>>>>>>>> and code >>>>>>>>> can be simplified so that make it easier to maintain. >>>>>>>>> - *docker-pull-licenses=skip* skips all license pulling >>>>>>>>> related tasks. This tag is provided for users who customize their >>>>>>>>> images >>>>>>>>> with DockerFile provided by us and would not like to deal with >>>>>>>>> license >>>>>>>>> stuff. Without this tag, the users should change the .gradle file >>>>>>>>> to get >>>>>>>>> rid of it. It also can be used for local development when >>>>>>>>> developers want >>>>>>>>> to solve license related issues later. >>>>>>>>> >>>>>>>>> I would like to see this tag is adopted by all docker images >>>>>>>>> released by Beam, including Flink and Spark job server images, and >>>>>>>>> share >>>>>>>>> the same default mode. >>>>>>>>> Does this sound good? Are there any concerns? >>>>>>>>> >>>>>>>>> Please let me know what you think. >>>>>>>>> Hannah >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Apr 16, 2020 at 11:46 AM Hannah Jiang < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Ok, so the fianl approach is >>>>>>>>>> 1. Using multi threading with 16 threads. >>>>>>>>>> 2. Introduce docker-pull-license tag, by default it is disabled. >>>>>>>>>> 3. By default, only check if urls return 200, not actually pull >>>>>>>>>> the files to reduce image size. Licenses add 85MB of image size as >>>>>>>>>> of today. >>>>>>>>>> 4. Add retries to avoid flakiness. Initially it was added to >>>>>>>>>> download part only, assuming urlib is stable when validates urls, >>>>>>>>>> but it is >>>>>>>>>> not true. >>>>>>>>>> >>>>>>>>>> For caching, it is definitely useful. It will mitigate flakiness >>>>>>>>>> further, like when github is down (many urls point to github). It >>>>>>>>>> also >>>>>>>>>> reduces unnecessary internet traffic. >>>>>>>>>> Let's reconsider it after the current work is merged to 2.21.0. >>>>>>>>>> >>>>>>>>>> Thank you all for providing feedback and ideas. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, Apr 16, 2020 at 10:24 AM Kyle Weaver <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> > I tried with multi processing and it improved the performance >>>>>>>>>>> a lot. >>>>>>>>>>> >>>>>>>>>>> Great! Though it won't help with flakes, so as you said we >>>>>>>>>>> should still look into caching as well. >>>>>>>>>>> >>>>>>>>>>> > You could try using a threadpool >>>>>>>>>>> >>>>>>>>>>> +1 >>>>>>>>>>> >>>>>>>>>>> > However, we would like to include this work as part of 2.21.0 >>>>>>>>>>> >>>>>>>>>>> I had marked the jira as a blocker for 2.21.0 because I was >>>>>>>>>>> afraid something was broken, but now it looks like the failures >>>>>>>>>>> were just >>>>>>>>>>> flakes. So BEAM-9764 >>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-9764> should not be >>>>>>>>>>> a release blocker. >>>>>>>>>>> >>>>>>>>>>> On Thu, Apr 16, 2020 at 1:00 PM Thomas Weise <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Hannah, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for investigating! >>>>>>>>>>>> >>>>>>>>>>>> I think it would be great to eliminate the overhead for local >>>>>>>>>>>> builds (by default turn off the license assembly) and make it as >>>>>>>>>>>> lightweight as possible in the frequent CI path. >>>>>>>>>>>> >>>>>>>>>>>> Thomas >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Apr 16, 2020 at 1:37 AM Hannah Jiang < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I tried to check if urls are valid instead of pulling the >>>>>>>>>>>>> files and it reduced only 1 min of running time. So, it's not an >>>>>>>>>>>>> option >>>>>>>>>>>>> here. >>>>>>>>>>>>> >>>>>>>>>>>>> I tried with multi processing and it improved the performance >>>>>>>>>>>>> a lot. >>>>>>>>>>>>> With 12 subprocesses, the running time reduced to 49 seconds, >>>>>>>>>>>>> and with 16 cores, it reduced to 18 seconds. >>>>>>>>>>>>> The number of subprocesses is defined by the number of cores, >>>>>>>>>>>>> and Jenkins machine has 16 cores. >>>>>>>>>>>>> FYI: with my local machine (12 cores) and home network, it >>>>>>>>>>>>> takes 1min 40 secs to create a Java docker image. >>>>>>>>>>>>> >>>>>>>>>>>>> The caching approach mentioned by Robert brings many benefits, >>>>>>>>>>>>> not only to this use case. >>>>>>>>>>>>> However, we would like to include this work as part of 2.21.0, >>>>>>>>>>>>> so I will move with the multi processing approach this time. >>>>>>>>>>>>> >>>>>>>>>>>>> Please let me know if you have objections. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 15, 2020 at 4:01 PM Robert Bradshaw < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Is the cost primarily in pulling these remote >>>>>>>>>>>>>> licenses/sources? I'd guess that 99.9% of the URLs remain the >>>>>>>>>>>>>> same from run >>>>>>>>>>>>>> to run. Would a simple cache, or caching proxy, be sufficient? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Otherwise, a tag to check that licenses can be pulled, but >>>>>>>>>>>>>> not really pull them, might be sufficient. (Making sure the >>>>>>>>>>>>>> default is >>>>>>>>>>>>>> cheap but we don't accidentally omit them when it matters is the >>>>>>>>>>>>>> tricky bit >>>>>>>>>>>>>> I see here.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 3:38 PM Hannah Jiang < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for providing feedback. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here is what happending now and I would discuss when to run >>>>>>>>>>>>>>> the job. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Why it takes 7-8 mins for Java?* >>>>>>>>>>>>>>> When we list dependencies both in runtime and compile >>>>>>>>>>>>>>> environment, there are almost 1400 third party dependencies and >>>>>>>>>>>>>>> we need to >>>>>>>>>>>>>>> pull licenses/notices for all of them. >>>>>>>>>>>>>>> In addition, we need to pull source code if license is CDDL, >>>>>>>>>>>>>>> MPL, GLP or LGPL. 69 of the dependencies need to pull the >>>>>>>>>>>>>>> source code as of >>>>>>>>>>>>>>> 4/14/2020. >>>>>>>>>>>>>>> Getting dependency list + pulling licenses/notices/source >>>>>>>>>>>>>>> code takes 7-8 minutes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now I see there are *two patterns of failures*. >>>>>>>>>>>>>>> 1. In valid URLs. In fact, the urls are not invalid, but >>>>>>>>>>>>>>> occationally, it returns URLError. This can be resolved by >>>>>>>>>>>>>>> adding retries. >>>>>>>>>>>>>>> However, it will add runtime to the job. >>>>>>>>>>>>>>> 2. No artifacts available. Sometimes, when a new version of >>>>>>>>>>>>>>> package is released and the plugin still looks for staging >>>>>>>>>>>>>>> location. For >>>>>>>>>>>>>>> example, new zetasql packages were released on 4/14, and today >>>>>>>>>>>>>>> I saw >>>>>>>>>>>>>>> several failures with looking for staging repo. The behavior is >>>>>>>>>>>>>>> not >>>>>>>>>>>>>>> consistent, sometimes it scans correct location, sometimes not. >>>>>>>>>>>>>>> This can be >>>>>>>>>>>>>>> resolved by running the job again. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *When the job is running?* >>>>>>>>>>>>>>> generateThirdPartyLicenses is added to :sdks:java:container >>>>>>>>>>>>>>> and it is an upstream of the docker task. As such, whenever a >>>>>>>>>>>>>>> docker is >>>>>>>>>>>>>>> created, the job is triggered. >>>>>>>>>>>>>>> :sdks:java:container:docker is added to Java PreSubmit job. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *How to improve it?* >>>>>>>>>>>>>>> According to some ideas provided above, how about doing this? >>>>>>>>>>>>>>> Introduce a tag (ie: pull-licenses) to docker job to decide >>>>>>>>>>>>>>> if pull the files. Default tag is NOT setting pull-licenses. >>>>>>>>>>>>>>> When pull-licenses is not set, it checks if the >>>>>>>>>>>>>>> licenses/notices/source code can be pull automaticall or they >>>>>>>>>>>>>>> have urls to >>>>>>>>>>>>>>> pull from, but don't really pull. >>>>>>>>>>>>>>> When pull-license is set, files are pulled. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> For each PR (Presubmit): applying default option. The test >>>>>>>>>>>>>>> would fail if the files cannot be pulled, so committers still >>>>>>>>>>>>>>> need to fix >>>>>>>>>>>>>>> dependency errors. I believe it would reduce the running time. >>>>>>>>>>>>>>> For release: set the tag and pull the files and source code. >>>>>>>>>>>>>>> Since it is checked for each PR, pulling should finish without >>>>>>>>>>>>>>> problems. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please let me know what you think and if there are other >>>>>>>>>>>>>>> things can be improved. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hannah >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 2:30 PM Kyle Weaver < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looks like the same error as this Jira: >>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9764 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Even if/when we are able to fix this particular issue, I >>>>>>>>>>>>>>>> agree it is best not to run this job except for releases >>>>>>>>>>>>>>>> because of the >>>>>>>>>>>>>>>> inherent network cost and possible reliability issues. +Hannah >>>>>>>>>>>>>>>> Jiang <[email protected]> What do you think? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 5:20 PM Thomas Weise < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The new feature to assemble licenses is very useful but >>>>>>>>>>>>>>>>> appears to add several minutes (7-8?) build time to jobs >>>>>>>>>>>>>>>>> that need to >>>>>>>>>>>>>>>>> build a container. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Does it also seem to cause occasional build failures? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Phrase/131/ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Would it be possible to perform this task only during >>>>>>>>>>>>>>>>> release builds? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Thomas >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>
