Certainly there's a lot to be re-thought in terms of artifact staging,
especially when it comes to cross-language pipelines. I think it would
make sense to have a special retrieval token for the "empty"
manifest, which would mean a staging directory would never have to be
set up if no artifacts happened to be staged.

The UberJar avoids any artifact staging overhead as well.
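
To make the empty-manifest token idea concrete, here is a rough sketch in
Python. It is a toy model, not Beam's actual staging or retrieval API: the
names NO_ARTIFACTS_TOKEN, Manifest, RetrievalService and commit_manifest are
all made up for illustration, and the JSON manifest layout is an assumption.
The point is only that the staging side hands back a sentinel token when
nothing was staged, and the retrieval side answers that token without
touching the filesystem.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical sentinel value; not an actual Beam constant.
NO_ARTIFACTS_TOKEN = "__no_artifacts_staged__"


@dataclass
class Manifest:
    artifacts: list = field(default_factory=list)


def commit_manifest(artifacts, staging_dir):
    """Return a retrieval token for the given artifact names.

    If nothing was staged, return the sentinel and never create
    staging_dir. Otherwise write a MANIFEST file and return its path.
    """
    if not artifacts:
        return NO_ARTIFACTS_TOKEN
    path = Path(staging_dir) / "MANIFEST"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"artifacts": list(artifacts)}))
    return str(path)


class RetrievalService:
    """Toy stand-in for an artifact retrieval service."""

    def get_manifest(self, retrieval_token):
        # The sentinel short-circuits: no shared directory is read.
        if retrieval_token == NO_ARTIFACTS_TOKEN:
            return Manifest()
        with open(retrieval_token) as f:
            return Manifest(artifacts=json.load(f)["artifacts"])
```

With something along these lines, a pipeline that stages nothing would get
the sentinel back from commit_manifest, and workers calling get_manifest
with it would see zero artifacts without a shared staging directory ever
existing.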

On Tue, Nov 12, 2019 at 3:30 PM Kyle Weaver <kcwea...@google.com> wrote:
>
> Hi Beamers,
>
> We can use artifact staging to make sure SDK workers have access to a 
> pipeline's dependencies. However, artifact staging is not always necessary. 
> For example, one can make sure that the environment contains all the 
> dependencies ahead of time. Yet regardless of whether artifacts are used, my
> understanding is that an artifact manifest will be written and read anyway.
> For example:
>
> INFO AbstractArtifactRetrievalService: GetManifest for 
> /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts
>
> This can be a hassle, because users must set up a staging directory that all 
> workers can access, even if it isn't used aside from the (empty) manifest 
> [1]. Thomas mentioned that at Lyft they bypass artifact staging altogether 
> [2]. So I was wondering, do you all think it would be reasonable or useful to 
> create an "off switch" for artifact staging?
>
> Thanks,
> Kyle
>
> [1] 
> https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
> [2] 
> https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715
