Thanks for pulling together these directions, Luca! I did a little clean-up and will try to remember to do so more routinely.
Adding to what Diego said, I also started using stat1007 because it has the most access to resources (dumps, Hadoop, MariaDB). My virtual environments, config files, etc. are there now, so I tend to do all of my work on stat1007 even when the other stat machines would work for a given project. Putting the GPU on stat1005 helped me diversify a little, but I'm very excited to hear that the stat machines will be more standardized, so that it matters less which machine I choose. I have no desire to be spread out across the machines (a few projects on stat1004, a few on stat1005, etc.), because then I'll certainly lose track of where different projects are, but I would be open to choosing another host as my "main" workspace.

Best,
Isaac

On Tue, Feb 18, 2020 at 10:53 AM Andrew Otto <[email protected]> wrote:
> I added a 'GPU?' column too. :) THANKS LUCA!
>
> On Tue, Feb 18, 2020 at 11:51 AM Luca Toscano <[email protected]> wrote:
>
>> Hey Diego,
>>
>> I added a section at the end of the page with the info requested, let me
>> know if anything is missing :)
>>
>> Luca
>>
>> On Tue, Feb 18, 2020 at 17:37, Diego Saez-Trumper <[email protected]> wrote:
>>
>>> Thanks for this Luca.
>>>
>>> I tend to use stat1007 because I know that machine has a lot of RAM/CPU
>>> and HDFS access. For the other stat machines, I'm not sure which of them
>>> have what resources (I know at least one of them doesn't have HDFS
>>> access). Is there a table where I can look at a summary of resources per
>>> machine?
>>>
>>> Thanks again.
>>>
>>> On Tue, Feb 18, 2020 at 8:53 AM Luca Toscano <[email protected]> wrote:
>>>
>>>> Hi everybody!
>>>>
>>>> I created the following doc:
>>>> https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Analytics_Client_Nodes
>>>>
>>>> It contains two FAQs:
>>>> - How do I ensure that there is enough space on disk before storing big
>>>>   datasets/files?
>>>> - How do I check the space used by my files/data on stat/notebook hosts?
>>>>
>>>> Please read them and let me know if anything is not clear or missing.
>>>> We have plenty of space on stat100X hosts, but we tend to cluster on
>>>> single machines like stat1007 for some reason, and end up fighting for
>>>> resources.
>>>>
>>>> On a related note, we are going to work on unifying stat/notebook
>>>> puppet configs in https://phabricator.wikimedia.org/T243934, so
>>>> eventually all Analytics clients will be exactly the same.
>>>>
>>>> Thanks!
>>>>
>>>> Luca (on behalf of the Analytics team)
>>>>
>>>> _______________________________________________
>>>> Research-Internal mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/research-internal

--
Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
