On Fri, Oct 21, 2016 at 12:14 PM, Martin Domdey <[email protected]> wrote:
> "(so tools that rely solely on the database / API *should* be fine, if
> they aren't using their home directories for any writes)"
>
> Do you mean that I can use my own tool server instance and work with
> the API, the database and /shared/tcl/bin/tclsh8.7?
> So can I get a tool server from you?
>

I'm not sure I understand, but any existing tools that talk to the
database/APIs should be fine. If you would like to make a new tool for a
new project, you can request it as usual using the Tools Access request page.

>
> Martin ...
>
>
> *Sent:* Friday, 21 October 2016 at 20:56
> *From:* "Madhumitha Viswanathan" <[email protected]>
> *To:* "Wikimedia Labs" <[email protected]>
> *Subject:* Re: [Labs-l] Disruptive Tools NFS maintenance on 11/2/2016
> Hi,
>
> On Fri, Oct 21, 2016 at 11:29 AM, Martin Domdey <[email protected]> wrote:
>>
>> Why do you need 48 hours for that?
>>
>> I submit a great many cron jobs every day to deliver content and
>> services to a lot of users on dewiki and other wikis. An outage window of
>> 48 hours (!) is simply not possible.
>> Please suggest how I can keep working during the outage window, or
>> at least a crontab that can handle the data and files on tools.taxonbot.
>> Perhaps you could set up NFS redundancy for at least that period.
>>
>
> As mentioned, it may take up to 48 hours for the data migration to
> complete (hopefully less), but we are dealing with a complex system with
> a nontrivial amount of data. The transition *is* to a redundant NFS server
> setup; we need a long maintenance window to make that happen. A full copy
> of tools data to a new server takes many days (~4-20!) depending on various
> factors, and we're doing successively smaller syncs to make the final
> migration period as short as possible.
> However, it's still not something we
> can entirely control: the maps project was migrated earlier this week, and
> the final sync still took about a day (even though maps has less data). So
> the 48h is a conservative estimate that allows us to do the migration in an
> orderly fashion.
>
> To be more explicit, here is a (non-exhaustive) list of things we expect
> not to work for the duration of the transition (which is up to 48h, but
> hopefully less):
>
> 1. Submitting new jobs to the grid
> 2. Restarting failing jobs on the grid
> 3. Deploying new code / writing anything to your tool / home
> directories
> 4. Any bots / webservices that require write access to their home
> directories to work (so tools that rely solely on the database / API
> *should* be fine, if they aren't using their home directories for any
> writes)
> 5. New cron jobs (because of #1)
> 6. New tool creation
>
> Any previously submitted jobs that aren't writing to NFS will continue to
> run (provided they don't die). Crons submit jobs to the grid, and
> without read-write NFS, job scheduling will not work. We apologize for the
> service interruption, but it is required to make tools stable and
> reliable in the long term.
>
> We're working on a detailed checklist for the transition, and will email
> it to the list once we have it available.
>
>
>> Thank you
>> Martin ...
>>
>>
>>
>> *Sent:* Friday, 21 October 2016 at 20:00
>> *From:* "Madhumitha Viswanathan" <[email protected]>
>> *To:* "Wikimedia Labs" <[email protected]>,
>> [email protected]
>> *Subject:* [Labs-l] Disruptive Tools NFS maintenance on 11/2/2016
>> As the next step in our storage redundancy and reliability efforts for
>> Labs, we have a significant migration coming up on 11/2 starting 08:00
>> PDT (15:00 UTC) involving the tools NFS share. The maintenance window can be
>> up to 48h long, and will affect most running tools.
>> At the end of the
>> migration, everything (except transient jobs) should ideally work the
>> same way as before the migration, but better.
>>
>> Here's what to expect during the maintenance window:
>>
>> * The tools NFS share (/data/project and /home) will be read-only for the
>> duration of the maintenance, so no new data or logs will be written to it.
>> * New jobs cannot be submitted for the whole maintenance window. This
>> means submitting jobs through cron or tools-mail will not work,
>> although tools-mail can continue to send emails.
>> * Current jobs might keep running, but won't be rescheduled if they die.
>> If they do not die and aren't writing to NFS, they should be fine.
>> * All exec nodes will be depooled, rebooted and repooled; jobs that
>> aren't rescheduled automatically will have died and need manual restarts.
>>
>> Do let us know if you have any questions or concerns on the lists or on
>> #wikimedia-labs.
>>
>> --
>> Madhumitha Viswanathan
>> Operations Engineer, Wikimedia Labs
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
>
> --
> --Madhu :)

--
--Madhu :)
