Thinking about this a bit more: the FilePath abstraction doesn't look like it 
would work, as it assumes a computer on the other end. What if there were an 
"External Storage Plugin" extension point that could be backed by S3 and 
leveraged by other plugins for managing large files associated with jobs? 
Ideally it would share the job lifecycle, so that when jobs are renamed or 
deleted, the related external storage area for those jobs would be managed 
as well.  Is there an extension point for something like that?
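
To make the idea concrete, here is a rough sketch of the shape such an 
extension point might take. Everything in it is hypothetical: the 
ExternalStorage interface, its method names, and the in-memory class standing 
in for an S3-backed implementation are made up for illustration and are not 
existing Jenkins APIs.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical extension point; not an existing Jenkins API. */
interface ExternalStorage {
    void put(String jobName, String path, byte[] data);
    byte[] get(String jobName, String path); // returns null when absent
    void onJobRenamed(String oldName, String newName);
    void onJobDeleted(String jobName);
}

/**
 * In-memory stand-in for an S3-backed implementation. Keys look like
 * object-store keys of the form "jobName/path".
 */
class InMemoryExternalStorage implements ExternalStorage {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    private static String prefix(String jobName) {
        return jobName + "/";
    }

    /** Live view of every key that starts with the given prefix. */
    private SortedMap<String, byte[]> view(String prefix) {
        return objects.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    @Override
    public void put(String jobName, String path, byte[] data) {
        objects.put(prefix(jobName) + path, data.clone());
    }

    @Override
    public byte[] get(String jobName, String path) {
        byte[] data = objects.get(prefix(jobName) + path);
        return data == null ? null : data.clone();
    }

    @Override
    public void onJobRenamed(String oldName, String newName) {
        String oldPrefix = prefix(oldName);
        // Copy first so the map is not modified while its view is iterated.
        Map<String, byte[]> moved = new TreeMap<>(view(oldPrefix));
        for (Map.Entry<String, byte[]> e : moved.entrySet()) {
            objects.remove(e.getKey());
            String relative = e.getKey().substring(oldPrefix.length());
            objects.put(prefix(newName) + relative, e.getValue());
        }
    }

    @Override
    public void onJobDeleted(String jobName) {
        // Clearing the view removes the job's whole storage area.
        view(prefix(jobName)).clear();
    }

    int size() {
        return objects.size();
    }
}
```

A real implementation would presumably be driven by whatever job rename and 
delete notifications core provides, and would have to cope with the object 
store being slow or unavailable, which this sketch ignores.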

On Wednesday, November 30, 2016 at 3:36:43 PM UTC-5, Peter Hayes wrote:
>
> Thanks for the insight.  I do see that this will cause a burden on the 
> master node.  Since we are using CJP-PSE, that is mitigated somewhat as we 
> will be running quite a few masters so the ratio of jobs to masters won't 
> be terribly high.  
>
> Reusing workspaces isn't an option for us due to the architecture of 
> CJP-PSE at the moment. I actually did start using an externally mounted 
> volume but as you note, we will run into concurrency issues with shared 
> caches on the host instance and there is no reliable way to separate the 
> caches while still getting the benefit of caches as there is no distinct 
> executor number (always 1). If there was some enhancement to CJP to 
> transparently manage workspaces across executors (and support parallel build 
> execution) then we could look at that.  I did raise this with the PSE team 
> in any event a while back and I imagine that this will need to be addressed 
> as it is a step back in performance from classic persistent Jenkins 
> executors.
>
> The other thought that crossed my mind since we are running in AWS is to 
> leverage a more scalable file store within AWS like S3.  Both artifact 
> archiving and dependency caching could be good candidates. It would be cool 
> if there were an S3 backing for the FilePath abstraction and plugin developers 
> could seamlessly access it via Project.getStoragePath() or something like 
> that.  Then a plugin like I am proposing could provide a more scalable 
> solution without hardwiring to S3.  I'm guessing I'm not the first to think 
> of it so there are likely challenges in doing so. 
>
> On Wednesday, November 30, 2016 at 2:04:03 PM UTC-5, Jesse Glick wrote:
>>
>> On Wed, Nov 30, 2016 at 10:18 AM, Peter Hayes <[email protected]> wrote: 
>> > each time you run a job, you start with a fresh container without any 
>> > previously cached dependencies (we use gradle generally).  This increases 
>> > the length of the build and adds network traffic to our Artifactory 
>> > instance.  I looked around for existing plugins but didn't find any so I 
>> > have started a plugin[1] based on SimpleBuildWrapper that stores a 
>> > configured set of files on the master at the end of the build and then on 
>> > the next build downloads them to master in the original location. 
>>
>> This seems like a poor approach; rather than overloading Artifactory, 
>> you will be overloading the Jenkins master. Archiving artifacts via 
>> the Remoting channel can already wreck performance; you are talking 
>> about potentially orders of magnitude more traffic than that. 
>>
>> There are two basic approaches to this kind of problem. One, which 
>> assumes that the agents reuse workspaces between builds, is to set the 
>> local repository/cache location to a workspace location. The 
>> `docker-workflow` demo does this: 
>>
>>
>> https://github.com/jenkinsci/docker-workflow-plugin/blob/46432bbe36af17dac93cfedcc93ffa51beba1343/demo/repo/flow.groovy#L20-L22
>>  
>>
>> The other approach is to mount a volume containing the cache, letting 
>> the Docker daemon handle the storage, which the 
>> `parallel-test-executor` demo does: 
>>
>>
>> https://github.com/jenkinsci/parallel-test-executor-plugin/blob/3961df3784045df1f6f285bc2b685ead4bc8593b/demo/Makefile#L3-L27
>>  
>>
>> The volume-based approach is probably the more scalable, though there 
>> are two points to beware: at least Maven’s `install:install` will dump 
>> locally built artifacts into the repository alongside downloaded 
>> releases (probably Gradle does something similar); and Maven’s Aether 
>> repository manager is by default not thread-safe (Takari fixes this). 
>> Maven 5 may allow the cache to be properly separated (again I am not 
>> sure how Gradle fares here); in the meantime you may need to ensure 
>> that there is a distinct volume for every potentially concurrent 
>> build, for example keyed by `${JOB_NAME}/${EXECUTOR_NUMBER}`. 
>>
>> At any rate the exact solution chosen is going to depend on details of 
>> how agents are provisioned and workspaces managed, so at root this 
>> might simply be an RFE for CJP-PSE. 
>>
>
