keon94 commented on PR #2050:
URL: 
https://github.com/apache/incubator-devlake/pull/2050#issuecomment-1150665490

   > > > > @klesh my latest commit breaks it into 3 subtasks, but the problem
   > > > > is that there is no easy way (from my understanding) to cache the
   > > > > cloned repo so that, once cloned, it gets reused by the remaining two.
   > > > > So right now we end up cloning the repo 3 times. I think we'd need a
   > > > > broader framework enhancement to support shared state between
   > > > > subtasks, as well as the ability to close/release this shared state
   > > > > after the subtasks are complete. cc @hezyin
   > > > 
   > > > 
   > > > Why not? In `jira` and other data source plugins, we **share** one
   > > > `apiclient` across multiple subtasks; we can use the same technique here.
   > > 
   > > 
   > > @klesh Do we have a way of cleaning up after all the subtasks are done,
   > > though? Something like a shutdown hook that we can define and execute?
   > > That'll be needed in this case to remove the cloned repo after the
   > > collections are done.
   > 
   > Good question. No, we don't, though I think this is a legitimate feature:
   > we should add a method to release resources. But for this particular
   > scenario, shouldn't we just keep the repo and do `git pull` in successive
   > collections to speed things up?
   
   Well, the existing collector code was "closing" (removing) the repo after it
   was done with all 3 collections, so we'd want to do the same here too. If we
   don't, we'd be leaking files on the filesystem. I've added an enhancement to
   support this in my latest commit: an extension of the PluginTask API that
   lets the user close any shared resources if they want to. I ran it on one
   repo and it seems OK. Let me know what you think.

