boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as
well. (Should this constant be put in a common location?)
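
A minimal sketch of what such a check might look like, not the actual
boot.go code. The names stageArtifacts and retrieveFunc are hypothetical, and
the token's string value is assumed for illustration; the real constant is
defined in Beam's artifact staging code.

```go
package main

import (
	"errors"
	"fmt"
)

// noArtifactsStagedToken mirrors NO_ARTIFACTS_STAGED_TOKEN.
// The value is an assumption made for this sketch.
const noArtifactsStagedToken = "__no_artifacts_staged__"

// retrieveFunc is a hypothetical stand-in for whatever call fetches the
// manifest and artifacts for a given retrieval token.
type retrieveFunc func(token string) ([]string, error)

// stageArtifacts skips retrieval entirely when the token signals that
// nothing was staged, so no staging directory or MANIFEST is needed.
func stageArtifacts(token string, retrieve retrieveFunc) ([]string, error) {
	if token == noArtifactsStagedToken {
		return nil, nil
	}
	return retrieve(token)
}

func main() {
	// Simulates a retrieval service with no MANIFEST on disk.
	failing := func(token string) ([]string, error) {
		return nil, errors.New("MANIFEST not found for token " + token)
	}

	// With the sentinel token the failing service is never called.
	files, err := stageArtifacts(noArtifactsStagedToken, failing)
	fmt.Println(len(files), err)

	// Any other token still goes through normal retrieval.
	_, err = stageArtifacts("some-other-token", failing)
	fmt.Println(err)
}
```

Keeping the comparison in one small function would also make it easy to
point at a shared constant if one is introduced.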

On Sat, Nov 23, 2019 at 9:16 AM Thomas Weise <t...@apache.org> wrote:
>
> JIRA: https://issues.apache.org/jira/browse/BEAM-8815
>
>
> On Fri, Nov 22, 2019 at 5:31 PM Thomas Weise <t...@apache.org> wrote:
>>
>> I'm running into the issue Kyle points out when I try to run a pipeline that 
>> does not use artifact staging:
>>
>> 2019-11-23 01:09:18,442 WARN  
>> org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService
>>   - GetManifest for 
>> /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST 
>> failed.
>> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: 
>> /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST 
>> (No such file or directory)
>> at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>> at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
>> at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
>> at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
>> at 
>> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
>>
>> This happens when I use /opt/apache/beam/boot to start the worker in the
>> process environment, since boot always attempts to retrieve artifacts. The
>> same would apply to the worker pool.
>>
>> Thomas
>>
>>
>> On Tue, Nov 12, 2019 at 5:07 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> FWIW, there are also discussions of adding a preparation phase for sdk
>>> harness (docker) images, such that artifacts could be staged (and
>>> installed, compiled etc.) ahead of time and shipped as part of the sdk
>>> image rather than via a side channel (and on every worker). Anyone not
>>> using these images is probably shipping dependencies in another way
>>> anyways.
>>>
>>> On Tue, Nov 12, 2019 at 5:03 PM Robert Bradshaw <rober...@google.com> wrote:
>>> >
>>> > Certainly there's a lot to be re-thought in terms of artifact staging,
>>> > especially when it comes to cross-language pipelines. I think it would
>>> > make sense to have a special retrieval token for the "empty"
>>> > manifest, which would mean a staging directory would never have to be
>>> > set up if no artifacts happened to be staged.
>>> >
>>> > The UberJar avoids any artifact staging overhead as well.
>>> >
>>> > On Tue, Nov 12, 2019 at 3:30 PM Kyle Weaver <kcwea...@google.com> wrote:
>>> > >
>>> > > Hi Beamers,
>>> > >
>>> > > We can use artifact staging to make sure SDK workers have access to a 
>>> > > pipeline's dependencies. However, artifact staging is not always 
>>> > > necessary. For example, one can make sure that the environment contains 
>>> > > all the dependencies ahead of time. However, regardless of whether or 
>>> > > not artifacts are used, my understanding is an artifact manifest will 
>>> > > be written and read anyway. For example:
>>> > >
>>> > > INFO AbstractArtifactRetrievalService: GetManifest for 
>>> > > /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts
>>> > >
>>> > > This can be a hassle, because users must set up a staging directory 
>>> > > that all workers can access, even if it isn't used aside from the 
>>> > > (empty) manifest [1]. Thomas mentioned that at Lyft they bypass 
>>> > > artifact staging altogether [2]. So I was wondering, do you all think 
>>> > > it would be reasonable or useful to create an "off switch" for artifact 
>>> > > staging?
>>> > >
>>> > > Thanks,
>>> > > Kyle
>>> > >
>>> > > [1] 
>>> > > https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
>>> > > [2] 
>>> > > https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715
