Well said.

Sent from Maximilian's iPhone.

> On May 24, 2014, at 4:08, Petr Bena <[email protected]> wrote:
>
> I don't say we should do this instead of the currently scheduled
> improvement, but just to prevent future issues.
>
> There will never be a server that is "good enough": no matter how fast
> the hardware is, I can create a tool that will kill the server anyway
> just by using too many resources (in this case, I could just create a
> bunch of separate tools that would each run 15 tasks reading and
> writing NFS as much as they can).
>
> In the case of one person this would be a violation of the TOS. Now
> imagine a number of "uneducated" Tool Labs users who are doing EXACTLY
> this, each with just one task with a few jobs. All together they would
> be just like that one "evil" user, and of course they wouldn't really
> violate any rule, which is logical.
>
> Right now we do:
> * limit the number of tasks per tool
> * ensure there is enough RAM for each tool using SGE scheduling
>
> What we don't do:
> * monitor network usage per tool
> * monitor I/O usage per tool
>
> This is not about being some kind of evil admin who slaps users whose
> tools use too much, nor about setting up more stupid restrictions; I
> myself hate restrictions of all kinds. I am aware that popular tools
> that are accessed heavily will produce a lot of traffic even if
> optimized, so these would be fine even if they were at the top of the
> list. But what I think would be useful and important is to let people
> who operate tools that seem under-optimized know about it, and
> eventually help them optimize those tools so that they eat fewer
> resources.
>
> I believe that, in the long term, this would save a lot of resources
> and money. There /are/ tools that need optimization right now, and
> they are one of the reasons why other tools are dying. They are dying
> because the systems are overloaded, and the systems are overloaded
> because they are not being used effectively.
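To make the "monitor I/O usage per tool" idea a bit more concrete, a very rough sketch of what such accounting could look like on an exec node (purely illustrative; nothing like this exists on Labs today, and the grouping by process owner simply assumes each tool runs under its own account). It sums the rchar/wchar counters the kernel already keeps in /proc/<pid>/io, which is only a crude proxy: it does not separate NFS traffic from local disk, and reading other users' io files generally requires root.

import os
import pwd
from collections import defaultdict

def io_by_tool():
    """Sum rchar/wchar from /proc/<pid>/io per process owner."""
    totals = defaultdict(lambda: [0, 0])   # user -> [bytes read, bytes written]
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            user = pwd.getpwuid(os.stat("/proc/" + pid).st_uid).pw_name
            with open("/proc/" + pid + "/io") as f:
                for line in f:
                    key, _, value = line.partition(":")
                    if key == "rchar":
                        totals[user][0] += int(value)
                    elif key == "wchar":
                        totals[user][1] += int(value)
        except (OSError, IOError, KeyError):
            continue   # process exited, or we are not allowed to read its io file
    return totals

if __name__ == "__main__":
    # Heaviest consumers first.
    for user, (r, w) in sorted(io_by_tool().items(),
                               key=lambda item: -sum(item[1])):
        print("%-24s read %14d B   written %14d B" % (user, r, w))

Run periodically (e.g. from cron) this would at least show which accounts dominate, which is all the "report to tool developers" idea really needs.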
> On Sat, May 24, 2014 at 9:56 AM, Gerard Meijssen
> <[email protected]> wrote:
>> Hoi,
>> Nice in theory. However, tools DO die when others produce too much shit
>> for the server to handle.
>>
>> In my mind the most important thing is for Labs to be operational.
>> Worrying about dimes and cents is too expensive when it comes at the
>> cost of a diminished service to Labs users.
>>
>> Yes, even when performance is always ensured it pays to target bad
>> practices, because sure as hell some things do need improvement, and it
>> pays to make sure that software gets optimised.
>> Thanks,
>> GerardM
>>
>>
>>> On 24 May 2014 09:39, Petr Bena <[email protected]> wrote:
>>>
>>> What about taking some steps to optimize current resource usage, so
>>> that it's not necessary to put more and more money into increasing
>>> the HW resources?
>>>
>>> For example, I believe there are a number of tools that are using the
>>> NFS servers in an insane way, for example generating tons of
>>> temporary data that could be stored in /tmp instead of /data/project.
>>> Also, the static binaries that are in /data/project could probably be
>>> cached in memory somehow, so that they don't need to be loaded over
>>> the network every time the task restarts.
>>>
>>> Perhaps installing a sar-like monitoring tool on the NFS server would
>>> help to discover which tools use NFS the most, and such a report
>>> could help the developers of those tools to figure out where there is
>>> a need for optimization.
>>>
>>> I myself have some idea of how Labs works, so my own tools are
>>> usually well optimized to use these network resources (and even disk
>>> storage) as little as possible, but others might not be aware of this
>>> and may need some help optimizing their tools.
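As a concrete illustration of the /tmp point (the tool name and paths below are made up, and this is only a sketch of the pattern, not anyone's actual code): intermediate files can live on the node-local /tmp via Python's tempfile module, and only the finished result needs to be written back to the NFS-backed project directory.

import os
import tempfile

# Hypothetical tool directory on the NFS-backed project share.
NFS_HOME = "/data/project/sometool"

def build_report(rows):
    # Scratch data that no other host needs to read should not live on NFS.
    # A NamedTemporaryFile with dir="/tmp" stays on the exec node's local
    # disk and is removed automatically when it is closed.
    with tempfile.NamedTemporaryFile(mode="w", dir="/tmp",
                                     prefix="sometool-") as scratch:
        for row in rows:
            scratch.write("%s\n" % row)
        scratch.flush()
        # ... heavy intermediate processing against scratch.name here ...

    # Only the finished result crosses the network to /data/project.
    with open(os.path.join(NFS_HOME, "report.txt"), "w") as out:
        out.write("%d rows processed\n" % len(rows))

The same reasoning would apply to the static-binary point: anything read-only that is reopened on every restart is a candidate for a local copy rather than a fetch over NFS each time.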
>>> On Fri, May 23, 2014 at 6:28 PM, Marc A. Pelletier <[email protected]>
>>> wrote:
>>>> Hello everyone,
>>>>
>>>> In the following week or two, we are planning on adding another
>>>> bonded network port to increase the NFS server's bandwidth (which is
>>>> currently saturating at regular intervals).
>>>>
>>>> This will imply a short period of downtime (on the order of 10
>>>> minutes or so) during which no NFS service will be provided. In
>>>> theory, this will result in file access simply stalling and resuming
>>>> at the end of the outage, but processes that have timeouts may be
>>>> disrupted (in particular, web service access will likely report
>>>> gateway issues during that interval).
>>>>
>>>> While this is not set in stone, I am aiming for Friday, May 30 at
>>>> 18:00 UTC for the downtime. I will notify this list with a
>>>> confirmation or a new schedule at least three days in advance.
>>>>
>>>> Thanks for your patience,
>>>>
>>>> -- Marc

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
