RE: Question about deployment of math computing [EXT]

James Smith Wed, 05 Aug 2020 01:25:38 -0700

Wesley,

You will have seen my posts elsewhere - we work on large terabyte/petabyte-scale 
datasets (not a large number of large records, but a very, very large number 
of small records), so the memory footprint and response times are both large. 
Compute is less of an issue in some cases, but not in others.

The services which use Apache/mod_perl work reliably and return data for these. 
The Dancer/Starman ones sometimes fail or hang, either because there are no 
backends free to serve the requests or because requests to those backends time 
out at the nginx/proxy (while the backends still continue using resources). The 
team running the backends fails to notice this because there is no 
easy-to-see reporting on these boxes.

We also have other services which return large amounts of data computed on the 
fly, and the response time for these can be multiple hours - but by carefully 
streaming the data in Apache we can get the data to return. A similar option 
isn't available in Dancer (or wasn't at the time) to handle these sorts of 
requests, so writing similar code there was impossible.

In most cases Starman hasn't really been the answer, and Apache works 
sufficiently well. Even where people were using nginx, we are now often using 
one of the alternative Apache MPMs (mpm_event), which seems to be better/more 
reliable than nginx and means we don't need completely different 
configuration setups for our proxies, static servers and dynamic content 
servers.

The good thing about Apache is its dynamic rescaling, which isn't as easy 
with Starman. If you have a large code base, the spin-up time for Starman can 
be quite long, as it appears (to make it efficient) to load in every bit of 
code that the application needs, even code only used in one of those small 
edge cases.

So yes, use Starman for simple apps if you need to, but for complex stuff I 
find a mod_perl setup more reliable.

James

-----Original Message-----
From: Wesley Peng <[email protected]> 
Sent: 05 August 2020 04:31
To: [email protected]; [email protected]
Subject: Re: Question about deployment of math computing [EXT]

Hi

[email protected] wrote:
> That's interesting. After re-reading your earlier email, I think that I 
> misunderstood what you were saying.
> 
> Since this is a mod_perl listserv, I imagine that the advice will always be 
> to use mod_perl rather than starman?
> 
> Personally, I'd say either option would be fine. In my experience, the key 
> advantage of mod_perl or starman (say over CGI) is that you can pre-load 
> libraries into memory at web server startup time, and that processes are 
> persistent (although they do have limited lifetimes of course).
> 
> You could use a framework like Catalyst or Mojolicious (note Dancer is 
> another framework, but I haven't worked with it) which can support different 
> web servers, and then try the different options to see what suits you best.
> 
> One thing to note would be that usually people put a reverse proxy in front 
> of starman like Apache or Nginx (partially for serving static assets but 
> other reasons as well). Your stack could be less complicated if you just went 
> the mod_perl/Apache route.
> 
> That said, what OS are you planning to use? It's worth checking if mod_perl 
> is easily available in your target OS's package repositories. I think Red Hat 
> dropped mod_perl starting with RHEL 8, although EPEL 8 now has mod_perl in 
> it. Something to think about.

We use Ubuntu 16.04 and 18.04.

We do use Dancer/Starman in our production environment, but that service only 
handles lightweight API requests, for example a RESTful API for data validation.

Our math computing, however, is a heavyweight service: each request takes a 
long time to finish, so I wonder whether it should be deployed in Dancer at all.

The web server behind Dancer is Starman by default. As I understand it, Starman 
is event driven, uses very few processes, and cannot scale the number of 
processes up or down automatically.

We deploy Starman with 5 processes by default. When 5 requests come in, all 5 
Starman processes are busy computing them, so the next request will be 
blocked. Is that right?
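
To make the question concrete, here is a minimal sketch of a fixed-size worker 
pool, written in Python purely for illustration. It is not Starman code; the 
function name completion_times and the 60-second service time are assumptions 
made up for the example. It assumes all requests arrive at the same moment and 
are served first come, first served.

```python
import heapq


def completion_times(num_workers, service_times):
    """Return the finish time of each request, in arrival order.

    Assumes every request arrives at t=0 and is handed, in order, to
    whichever worker in a fixed-size pool becomes free first.
    """
    # Each heap entry is the time at which one worker next becomes free.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)

    finished = []
    for service in service_times:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + service
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


if __name__ == "__main__":
    # Hypothetical load: six requests of 60 seconds each, five workers.
    print(completion_times(5, [60.0] * 6))
```

Under these assumptions the sixth request cannot start until one of the first 
five has finished, so it waits a full service time before any work is done on 
it. With a sixth worker available it would start immediately.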

But Apache with mod_perl works in the prefork way: generally it can have as 
many as thousands of processes if resources permit, and its process management 
scales the number of children up and down automatically.

So my real question is: for a CPU-intensive service, an event-driven server 
like Starman has no advantage over a preforked server like Apache.

Am I right?

Thanks.

-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.
