On Tue, Apr 28, 2020 at 4:46 PM Hannah Jiang <[email protected]> wrote:
> I guess I assumed there was some reason we needed "lightweight images" in
>> our tests (because licenses take up a lot of space IIRC), but maybe not.
>> Can you elaborate on the purpose of this option Hannah?
>
> Reducing image size is a good reason to have the pull option. There were
> user requests to create lightweight images. Now I think users can use skip
> option to create lightweight iamges. pull option is not needed.
>
> Then the discussion becomes which one should be the default mode?
> According to feedback, skip should be the default mode, and change it to
> add mode when running Jenkins test. And the docker-pull-licenses tag is
> binary again.
> It seems like there is an easy way to pass the tag to all Jenkins test,
> which is adding *context.switches("-Pdocker-pull-licenses"**)* to
> CommonJobProperties.groovy
> <https://github.com/apache/beam/blob/master/.test-infra/jenkins/CommonJobProperties.groovy#L150>.
> I'm not very familiar with Jenkins, does this would work as expected?
> If it sounds like a bad idea to pass this tag to ALL tests, I would ask
> help from community to identify those tests which create docker images
> (Java and Python for now) and pass the tag to the tests.
>
That could work.
The other question is, how easy would it be for a release manager to
accidentally push non-compliant images?
We have definitely not explored how cheap we can make doing the "real
thing" can be, which would IMHO be best.
>
>
> On Tue, Apr 28, 2020 at 3:50 PM Kyle Weaver <[email protected]> wrote:
>
>> > To overwrite, we need to pass the tag for ALL Jenkins tests that create
>> docker images, which is quite a bunch of work.
>>
>> If it comes to that, I'd rather we do the work of passing tags instead of
>> users.
>>
>> > Does anyone know if there is an easy way to pass the tag to many tests?
>> or overwrite the default mode, which will be defined at gradle.properties,
>> to pull ONLY with Jenkins tests?
>>
>> There is precedence for the latter option:
>> https://github.com/apache/beam/blob/1905dbde5ca0858fce89f65c761e88511840a384/build.gradle#L45
>>
>> > (On that note, however, pull seems bad for both.)
>>
>> I guess I assumed there was some reason we needed "lightweight images" in
>> our tests (because licenses take up a lot of space IIRC), but maybe not.
>> Can you elaborate on the purpose of this option Hannah?
>>
>> On Tue, Apr 28, 2020 at 6:40 PM Robert Bradshaw <[email protected]>
>> wrote:
>>
>>> Fundamentally having license checking off by default is dangerous for
>>> releases, but having it on by default is annoying for developers. (On that
>>> note, however, pull seems bad for both.) Is it possible to make it a gradle
>>> target that only runs when something (specifically dependencies, or the
>>> files declaring them) have changed, and would leverage the gradle cache
>>> (and hence be cheap) otherwise?
>>>
>>> Alternatively, I think we should invest in URL caching. These urls
>>> should be immutable; let's only download them once, ever.
>>>
>>> On Tue, Apr 28, 2020 at 3:31 PM Hannah Jiang <[email protected]>
>>> wrote:
>>>
>>>> Do you mean set the default mode to skip and overwrite it to pull mode
>>>> with Jenkins test? To overwrite, we need to pass the tag for ALL Jenkins
>>>> tests that create docker images, which is quite a bunch of work.
>>>> Does anyone know if there is an easy way to pass the tag to many tests?
>>>> or overwrite the default mode, which will be defined at gradle.properties,
>>>> to pull ONLY with Jenkins tests? We can add a step to do this after cloning
>>>> a git branch to Jenkins machine. But it seems easier to change the local
>>>> gradle.properties for local development purpose..?
>>>>
>>>>
>>>>
>>>> On Tue, Apr 28, 2020 at 3:09 PM Thomas Weise <[email protected]> wrote:
>>>>
>>>>> Can this be solved by enabling the license magic for the Jenkins jobs?
>>>>>
>>>>> I also think the default should be off for better local development
>>>>> experience.
>>>>>
>>>>> On Tue, Apr 28, 2020 at 3:05 PM Hannah Jiang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> We need to make sure the release images contains licenses/notices in
>>>>>> order to avoid potential legal issues. The purpose of setting the default
>>>>>> to pull is checking PRs which introduce new dependencies include thier
>>>>>> licenses as well, in a way of auto pulling or tool pulling, to make sure
>>>>>> all licenses are included when we create release images. If setting
>>>>>> default
>>>>>> to skip, we only check licenses when we create release images and may see
>>>>>> many issues with the licenses, and the release would be delayed, maybe a
>>>>>> lot.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 28, 2020 at 2:50 PM Kyle Weaver <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The different modes make sense. One disagreement: I think the
>>>>>>> default should be skip. I imagine few users would want to put licenses
>>>>>>> in
>>>>>>> their own images.
>>>>>>>
>>>>>>> On Tue, Apr 28, 2020 at 5:36 PM Hannah Jiang <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The above 1,2,3,4 are merged to master.
>>>>>>>> Here I would like to change behaviors with *docker-pull-licenses*
>>>>>>>> tag and would like to collect feedbacks from community before
>>>>>>>> finalizing my
>>>>>>>> PR.
>>>>>>>>
>>>>>>>> docker-pull-licenses can be set to one of ['add', 'pull', 'skip'].
>>>>>>>>
>>>>>>>> - *docker-pull-licenses=add* pulls licenses/notices and the
>>>>>>>> files are added to docker images. This tag is used to create
>>>>>>>> release docker
>>>>>>>> images.
>>>>>>>> - *docker-pull-licenses=pull* pulls licenses/notices but do not
>>>>>>>> add the files to docker images so that create lightweight images.
>>>>>>>> This is
>>>>>>>> the default mode, and docker images created by Jenkins test use
>>>>>>>> this mode.
>>>>>>>> This will make sure the files are all pullable and detect issues
>>>>>>>> with the
>>>>>>>> tool, except the ADD part at DockerFile, which is rarely likely to
>>>>>>>> be an
>>>>>>>> issue. It is better than the checking URLs approach mentioned above,
>>>>>>>> because it is a more similar process like creating release images
>>>>>>>> and code
>>>>>>>> can be simplified so that make it easier to maintain.
>>>>>>>> - *docker-pull-licenses=skip* skips all license pulling related
>>>>>>>> tasks. This tag is provided for users who customize their images
>>>>>>>> with
>>>>>>>> DockerFile provided by us and would not like to deal with license
>>>>>>>> stuff.
>>>>>>>> Without this tag, the users should change the .gradle file to get
>>>>>>>> rid of
>>>>>>>> it. It also can be used for local development when developers want
>>>>>>>> to solve
>>>>>>>> license related issues later.
>>>>>>>>
>>>>>>>> I would like to see this tag is adopted by all docker images
>>>>>>>> released by Beam, including Flink and Spark job server images, and
>>>>>>>> share
>>>>>>>> the same default mode.
>>>>>>>> Does this sound good? Are there any concerns?
>>>>>>>>
>>>>>>>> Please let me know what you think.
>>>>>>>> Hannah
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 16, 2020 at 11:46 AM Hannah Jiang <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Ok, so the fianl approach is
>>>>>>>>> 1. Using multi threading with 16 threads.
>>>>>>>>> 2. Introduce docker-pull-license tag, by default it is disabled.
>>>>>>>>> 3. By default, only check if urls return 200, not actually pull
>>>>>>>>> the files to reduce image size. Licenses add 85MB of image size as of
>>>>>>>>> today.
>>>>>>>>> 4. Add retries to avoid flakiness. Initially it was added to
>>>>>>>>> download part only, assuming urlib is stable when validates urls, but
>>>>>>>>> it is
>>>>>>>>> not true.
>>>>>>>>>
>>>>>>>>> For caching, it is definitely useful. It will mitigate flakiness
>>>>>>>>> further, like when github is down (many urls point to github). It also
>>>>>>>>> reduces unnecessary internet traffic.
>>>>>>>>> Let's reconsider it after the current work is merged to 2.21.0.
>>>>>>>>>
>>>>>>>>> Thank you all for providing feedback and ideas.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 16, 2020 at 10:24 AM Kyle Weaver <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> > I tried with multi processing and it improved the performance a
>>>>>>>>>> lot.
>>>>>>>>>>
>>>>>>>>>> Great! Though it won't help with flakes, so as you said we should
>>>>>>>>>> still look into caching as well.
>>>>>>>>>>
>>>>>>>>>> > You could try using a threadpool
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> > However, we would like to include this work as part of 2.21.0
>>>>>>>>>>
>>>>>>>>>> I had marked the jira as a blocker for 2.21.0 because I was
>>>>>>>>>> afraid something was broken, but now it looks like the failures were
>>>>>>>>>> just
>>>>>>>>>> flakes. So BEAM-9764
>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-9764> should not be
>>>>>>>>>> a release blocker.
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 16, 2020 at 1:00 PM Thomas Weise <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Hannah,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for investigating!
>>>>>>>>>>>
>>>>>>>>>>> I think it would be great to eliminate the overhead for local
>>>>>>>>>>> builds (by default turn off the license assembly) and make it as
>>>>>>>>>>> lightweight as possible in the frequent CI path.
>>>>>>>>>>>
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 16, 2020 at 1:37 AM Hannah Jiang <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I tried to check if urls are valid instead of pulling the files
>>>>>>>>>>>> and it reduced only 1 min of running time. So, it's not an option
>>>>>>>>>>>> here.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried with multi processing and it improved the performance a
>>>>>>>>>>>> lot.
>>>>>>>>>>>> With 12 subprocesses, the running time reduced to 49 seconds,
>>>>>>>>>>>> and with 16 cores, it reduced to 18 seconds.
>>>>>>>>>>>> The number of subprocesses is defined by the number of cores,
>>>>>>>>>>>> and Jenkins machine has 16 cores.
>>>>>>>>>>>> FYI: with my local machine (12 cores) and home network, it
>>>>>>>>>>>> takes 1min 40 secs to create a Java docker image.
>>>>>>>>>>>>
>>>>>>>>>>>> The caching approach mentioned by Robert brings many benefits,
>>>>>>>>>>>> not only to this use case.
>>>>>>>>>>>> However, we would like to include this work as part of 2.21.0,
>>>>>>>>>>>> so I will move with the multi processing approach this time.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know if you have objections.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Apr 15, 2020 at 4:01 PM Robert Bradshaw <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Is the cost primarily in pulling these remote
>>>>>>>>>>>>> licenses/sources? I'd guess that 99.9% of the URLs remain the
>>>>>>>>>>>>> same from run
>>>>>>>>>>>>> to run. Would a simple cache, or caching proxy, be sufficient?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise, a tag to check that licenses can be pulled, but not
>>>>>>>>>>>>> really pull them, might be sufficient. (Making sure the default
>>>>>>>>>>>>> is cheap
>>>>>>>>>>>>> but we don't accidentally omit them when it matters is the tricky
>>>>>>>>>>>>> bit I see
>>>>>>>>>>>>> here.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 3:38 PM Hannah Jiang <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for providing feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is what happending now and I would discuss when to run
>>>>>>>>>>>>>> the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Why it takes 7-8 mins for Java?*
>>>>>>>>>>>>>> When we list dependencies both in runtime and compile
>>>>>>>>>>>>>> environment, there are almost 1400 third party dependencies and
>>>>>>>>>>>>>> we need to
>>>>>>>>>>>>>> pull licenses/notices for all of them.
>>>>>>>>>>>>>> In addition, we need to pull source code if license is CDDL,
>>>>>>>>>>>>>> MPL, GLP or LGPL. 69 of the dependencies need to pull the source
>>>>>>>>>>>>>> code as of
>>>>>>>>>>>>>> 4/14/2020.
>>>>>>>>>>>>>> Getting dependency list + pulling licenses/notices/source
>>>>>>>>>>>>>> code takes 7-8 minutes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I see there are *two patterns of failures*.
>>>>>>>>>>>>>> 1. In valid URLs. In fact, the urls are not invalid, but
>>>>>>>>>>>>>> occationally, it returns URLError. This can be resolved by
>>>>>>>>>>>>>> adding retries.
>>>>>>>>>>>>>> However, it will add runtime to the job.
>>>>>>>>>>>>>> 2. No artifacts available. Sometimes, when a new version of
>>>>>>>>>>>>>> package is released and the plugin still looks for staging
>>>>>>>>>>>>>> location. For
>>>>>>>>>>>>>> example, new zetasql packages were released on 4/14, and today I
>>>>>>>>>>>>>> saw
>>>>>>>>>>>>>> several failures with looking for staging repo. The behavior is
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>> consistent, sometimes it scans correct location, sometimes not.
>>>>>>>>>>>>>> This can be
>>>>>>>>>>>>>> resolved by running the job again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *When the job is running?*
>>>>>>>>>>>>>> generateThirdPartyLicenses is added to :sdks:java:container
>>>>>>>>>>>>>> and it is an upstream of the docker task. As such, whenever a
>>>>>>>>>>>>>> docker is
>>>>>>>>>>>>>> created, the job is triggered.
>>>>>>>>>>>>>> :sdks:java:container:docker is added to Java PreSubmit job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How to improve it?*
>>>>>>>>>>>>>> According to some ideas provided above, how about doing this?
>>>>>>>>>>>>>> Introduce a tag (ie: pull-licenses) to docker job to decide
>>>>>>>>>>>>>> if pull the files. Default tag is NOT setting pull-licenses.
>>>>>>>>>>>>>> When pull-licenses is not set, it checks if the
>>>>>>>>>>>>>> licenses/notices/source code can be pull automaticall or they
>>>>>>>>>>>>>> have urls to
>>>>>>>>>>>>>> pull from, but don't really pull.
>>>>>>>>>>>>>> When pull-license is set, files are pulled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For each PR (Presubmit): applying default option. The test
>>>>>>>>>>>>>> would fail if the files cannot be pulled, so committers still
>>>>>>>>>>>>>> need to fix
>>>>>>>>>>>>>> dependency errors. I believe it would reduce the running time.
>>>>>>>>>>>>>> For release: set the tag and pull the files and source code.
>>>>>>>>>>>>>> Since it is checked for each PR, pulling should finish without
>>>>>>>>>>>>>> problems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know what you think and if there are other
>>>>>>>>>>>>>> things can be improved.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hannah
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 2:30 PM Kyle Weaver <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like the same error as this Jira:
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-9764
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Even if/when we are able to fix this particular issue, I
>>>>>>>>>>>>>>> agree it is best not to run this job except for releases
>>>>>>>>>>>>>>> because of the
>>>>>>>>>>>>>>> inherent network cost and possible reliability issues. +Hannah
>>>>>>>>>>>>>>> Jiang <[email protected]> What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 15, 2020 at 5:20 PM Thomas Weise <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The new feature to assemble licenses is very useful but
>>>>>>>>>>>>>>>> appears to add several minutes (7-8?) build time to jobs that
>>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>>> build a container.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does it also seem to cause occasional build failures?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Phrase/131/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would it be possible to perform this task only during
>>>>>>>>>>>>>>>> release builds?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>