On Wed, 28 Nov 2018 at 11:33, Bogdan Costescu <bcoste...@gmail.com> wrote:
> On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <beowulf@beowulf.org> wrote:
>
>> I have come across this question in a few locations. Being specific, I am
>> a fan of the Julia language. On the Julia forum a respected developer
>> recently asked what the options were for keeping code developed on a laptop
>> in sync with code being deployed on an HPC system.
>>
>> I think out loud that many HPC codes depend crucially on a $HOME directory
>> being present on the compute nodes, as the codes look for dot files etc. in
>> $HOME. I guess this can be dealt with by fake $HOMEs which again sync back
>> to the Repo.
>>
>
> I don't follow you here... $HOME, dot files, repo, syncing back? And why
> "Repo" with a capital letter, is it supposed to be a name or something
> special?
>

I think John is talking here about putting whole HOME directories under version control, while being mindful of dot files such as .bashrc and others, which can be application- or system-specific. The first thing that comes to mind is to use a separate branch for each cluster system. However, this also touches on backup (another important topic, since HOME directories are not necessarily backed up).

A working solution could make use of recursive repos (git submodules) and Git LFS support, but pruning old history could still be desirable. Git would keep the amount of storage down because it is hash based: identical file contents are stored only once. While this could make it possible to replicate your environment "wherever you go", a/ you would drag a lot of history around and b/ a significantly different mindset is required to manage the whole thing. A typical HPC user may know git clone but is generally not a git expert. Developers are different and, who knows John, maybe someone will pick up your idea. Is gitfs at all popular?

> In my HPC universe, people actually need not only code, but also data -
> usually LOTS of data. Replicating the code (for scripting languages) or the
> binaries (for compiled stuff) would be trivial; replicating the data would
> not. Also, pulling the data in or pushing it out (e.g. to/from AWS) on the
> fly whenever the instance is brought up would be slow and costly. And by
> the way, this is in no way a new idea: queueing systems have long had the
> concept of "pre" and "post" job stages, which could be used to pull in code
> and/or data to the node(s) on which the job would be running and to clean
> up afterwards.
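To make the "pre" and "post" job stages a bit more concrete, here is a minimal Python sketch of what such a stage pair might do: copy input files into a fresh scratch directory, let the job run there, copy the results back and remove the scratch directory. The function name `staged_job`, its parameters and the `in`/`out` layout are hypothetical, made up for illustration; real queueing systems hook this in through their own mechanisms (e.g. Slurm's Prolog/Epilog scripts), which are not shown here.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_job(inputs, results_dir, scratch_root=None):
    """Hypothetical pre/post job wrapper (not any scheduler's real API).

    Pre stage:  copy each file in `inputs` into <scratch>/in.
    Job body:   runs inside the `with` block and writes to <scratch>/out.
    Post stage: copy the files in <scratch>/out to `results_dir` (on success
                only), then delete the scratch directory in every case.
    """
    # Fresh per-job scratch directory, e.g. on node-local disk.
    scratch = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        (scratch / "in").mkdir()
        (scratch / "out").mkdir()
        for src in map(Path, inputs):
            shutil.copy2(src, scratch / "in" / src.name)

        yield scratch  # the job body runs here

        # Stage out: reached only if the job body did not raise.
        dest = Path(results_dir)
        dest.mkdir(parents=True, exist_ok=True)
        for f in (scratch / "out").iterdir():
            if f.is_file():
                shutil.copy2(f, dest / f.name)
    finally:
        # Clean up afterwards, whether or not the job succeeded.
        shutil.rmtree(scratch, ignore_errors=True)
```

A job would then read from `scratch / "in"` and write to `scratch / "out"` inside `with staged_job(["input.dat"], "results") as scratch:`. Keeping the clean-up in a `finally` frees the scratch space even when the job fails, while results are only copied back after a successful run.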
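On the point that Git keeps storage down because it is hash based: Git names each file's contents by a hash of that content, so identical files (say, the same .bashrc kept on two cluster branches) map to a single stored object. The minimal Python sketch below mimics this for a directory tree, assuming the default SHA-1 object format. `git_blob_id` and `dedup_stats` are illustrative helpers, not part of any Git API, and the sketch ignores the further savings real repositories get from zlib compression and delta-compressed packfiles.

```python
import hashlib
from pathlib import Path


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to file contents: SHA-1 of b"blob <size>\\0" + data."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def dedup_stats(root):
    """Return (number of files under root, number of distinct blob IDs)."""
    ids = set()
    files = 0
    for path in Path(root).rglob("*"):  # pathlib's "*" also matches dot files
        if path.is_file():
            files += 1
            ids.add(git_blob_id(path.read_bytes()))
    return files, len(ids)
```

For an empty file, `git_blob_id(b"")` returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID `git hash-object` reports for an empty file. Pointing `dedup_stats` at a directory holding copies of several clusters' dot files gives a rough feel for how many of them would collapse into shared objects.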
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing