So here's one discovery... I turned on the JVM's sampling profiler (the --sample flag to JRuby) while running "rake test" and discovered that it launches *four* JVM processes. Seriously?
If they're all booting Rails, it's no wonder "rake test" takes forever. I'm looking into it now.

- Charlie

On Tue, Oct 25, 2011 at 10:36 AM, Charles Oliver Nutter <head...@headius.com> wrote:
> That's a big unknown for us. It does not seem to be heavily IO-driven,
> since using nailgun does help "pure load" scenarios speed up
> significantly. For example:
>
> INIT OF JRUBY ALONE
>
> system ~/projects/jruby $ jruby bench/bench_jruby_init.rb 5
>                            user     system      total        real
> in-process `jruby `    0.043000   0.000000   0.043000 (  0.027000)
> in-process `jruby `    0.045000   0.000000   0.045000 (  0.045000)
> in-process `jruby `    0.018000   0.000000   0.018000 (  0.018000)
> in-process `jruby `    0.014000   0.000000   0.014000 (  0.014000)
> in-process `jruby `    0.014000   0.000000   0.014000 (  0.014000)
>
> INIT OF JRUBY PLUS -rubygems
>
> system ~/projects/jruby $ jruby bench/bench_jruby_init.rb 5 -rubygems
>                                     user     system      total        real
> in-process `jruby -rubygems`    0.193000   0.000000   0.193000 (  0.177000)
> in-process `jruby -rubygems`    0.085000   0.000000   0.085000 (  0.085000)
> in-process `jruby -rubygems`    0.085000   0.000000   0.085000 (  0.085000)
> in-process `jruby -rubygems`    0.071000   0.000000   0.071000 (  0.071000)
> in-process `jruby -rubygems`    0.076000   0.000000   0.076000 (  0.076000)
>
> ...PLUS require 'activerecord'
>
> system ~/projects/jruby $ jruby bench/bench_jruby_init.rb 5 "-rubygems -e \"require 'activerecord'\""
>                                                                  user     system      total        real
> in-process `jruby -rubygems -e "require 'activerecord'"`    0.192000   0.000000   0.192000 (  0.176000)
> in-process `jruby -rubygems -e "require 'activerecord'"`    0.087000   0.000000   0.087000 (  0.087000)
> in-process `jruby -rubygems -e "require 'activerecord'"`    0.087000   0.000000   0.087000 (  0.087000)
> in-process `jruby -rubygems -e "require 'activerecord'"`    0.069000   0.000000   0.069000 (  0.069000)
> in-process `jruby -rubygems -e "require 'activerecord'"`    0.078000   0.000000   0.078000 (  0.078000)
>
> Note how much startup improves for subsequent runs in the -rubygems
> and -r activerecord cases. If it were solely IO-bound, we wouldn't see
> that much improvement.
>
> Startup time issues are a combination of factors:
>
> * IO, including filesystem searching and the actual read of the file
> * Parsing and AST building
> * The JVM being cold; our parser, interpreter, and core classes are all
>   running at their slowest
> * Internal caches getting vigorously flushed at boot, since there are so
>   many methods and constants being created
>
> My parallelizing patch helps the first three but didn't make a big
> difference in actual execution of commands like "rake test" in a Rails
> app. I'm going to poke at startup a bit more today and see if I can
> figure out how much time in "rake test" is *actually* booting versus
> execution.
>
> - Charlie
>
> On Mon, Oct 24, 2011 at 11:56 PM, Andrew Cholakian <and...@andrewvc.com> wrote:
>> I'm wondering how much of the issue is IO and how much is CPU time required
>> to parse. Would it be easiest to just do a quick scan for module
>> dependencies and cache all the files ASAP, then parse serially? I'm not sure
>> if it'd be possible to do a quick parse for just 'require'.
>>
>> On Mon, Oct 24, 2011 at 9:47 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>>>
>>> I was thinking about the case below, and I think that this is an
>>> interesting idea, but I'm wondering how you would resolve certain
>>> difficulties.
>>> Imagine:
>>>
>>>   require 'ALib'
>>>   a = 10 + 2
>>>   require 'BLib'
>>>   b = a / 2
>>>
>>> where ALib is a lot of random stuff, then:
>>>
>>>   class Fixnum
>>>     def +(other)
>>>       self * other
>>>     end
>>>   end
>>>
>>> and BLib is a lot of random stuff, then:
>>>
>>>   class Fixnum
>>>     def /(other)
>>>       self * other * other
>>>     end
>>>   end
>>>
>>> How would you know how to resolve these various pieces? I guess you
>>> mention eager interpreting and then a cache, but given that any module
>>> can change any other module's functionality, you would have to keep
>>> track of everything that you eagerly interpreted, and possibly go back
>>> depending on what your module declares. How else would you know that a
>>> module that doesn't depend on any other modules is going to actually
>>> execute in a radically different way because of another module that you
>>> have included? The only way I can think of would be if the thread
>>> executing any given piece of code kept track of the calls that it made
>>> and where, and then went back to the earliest piece it had to in the
>>> case that anything was rewritten... but then you could imagine an even
>>> more convoluted case where module A changes an earlier piece of module B
>>> such that it changes how a later piece of itself works... and so on.
>>>
>>> Perhaps this is incoherent, but I think the question is how you deal
>>> with the fact that separately running pieces of code can change the
>>> fundamental underlying state of the world.
>>>
>>> 2011/10/24 Charles Oliver Nutter <head...@headius.com>
>>>>
>>>> Nahi planted an interesting seed on Twitter... what if we could
>>>> parallelize parsing of Ruby files when loading a large application?
>>>>
>>>> At a naive level, parallelizing the parse of an individual file is
>>>> tricky to impossible; the parser state is very much straight-line. But
>>>> perhaps it's possible to parallelize loading of many files?
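Jonathan's ALib/BLib scenario above can be made concrete. This sketch uses a hypothetical stand-in class `Num` instead of reopening `Fixnum` (reopening `Fixnum`/`Integer` would break arithmetic for the whole interpreter), but the ordering hazard is the same: if the `require 'BLib'` line were eagerly *interpreted* (not just parsed) before the assignment between the two requires ran, `a` would be computed with the wrong operator definition.

```ruby
# Stand-in numeric class; the names Num, load_alib, load_blib are
# illustrative, not from the thread's actual code.
class Num
  attr_reader :v
  def initialize(v); @v = v; end
  def +(o); Num.new(v + o.v); end   # ordinary addition
  def /(o); Num.new(v / o.v); end   # ordinary division
end

# Stand-in for "require 'ALib'": redefines + as multiplication
def load_alib
  Num.class_eval do
    def +(o); Num.new(v * o.v); end
  end
end

# Stand-in for "require 'BLib'": redefines / as self*other*other
def load_blib
  Num.class_eval do
    def /(o); Num.new(v * o.v * o.v); end
  end
end

load_alib                      # require 'ALib'
a = Num.new(10) + Num.new(2)   # 10 * 2 = 20 under ALib's +
load_blib                      # require 'BLib'
b = a / Num.new(2)             # 20 * 2 * 2 = 80 under BLib's /
puts a.v   # => 20
puts b.v   # => 80
```

Run `load_blib` one line earlier and `b` changes, which is why the proposal in this thread parallelizes only the load+parse work and leaves interpretation order alone.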
>>>>
>>>> I started playing with parallelizing calls to the parser, but that
>>>> doesn't really help anything; every call to the parser blocks waiting
>>>> for it to complete, and the contents are not interpreted until after
>>>> that point. That means that "require" lines remain totally opaque,
>>>> preventing us from proactively starting threaded parses of additional
>>>> files. But there lies the opportunity: what if load/require requests
>>>> were done as Futures, require/load lines were eagerly interpreted by
>>>> submitting load/require requests to a thread pool, and child requires
>>>> could be loading and parsing at the same time as the parent
>>>> file... without conflicting?
>>>>
>>>> In order to do this, I think we would need to make the following
>>>> modifications:
>>>>
>>>> * LoadService would need to expose Future-based versions of "load"
>>>>   and "require". The initial file loaded as the "main" script would be
>>>>   synchronous, but subsequent requires and loads could be shunted to a
>>>>   thread pool.
>>>> * The parser would need to initiate eager load+parse of files
>>>>   encountered in require-like and load-like lines. This load+parse
>>>>   would encompass filesystem searching plus content parsing, so all
>>>>   the heavy lifting of booting a file would be pushed into the thread
>>>>   pool.
>>>> * Somewhere (perhaps in LoadService) we would maintain an LRU cache
>>>>   mapping from file paths to ASTs. The cache would contain Futures;
>>>>   getting the actual parsed library would then simply be a matter of
>>>>   Future.get, allowing many of the load+parses to be done
>>>>   asynchronously.
>>>>
>>>> For a system like Rails, where there might be hundreds of files
>>>> loaded, this could definitely improve startup performance.
>>>>
>>>> Thoughts?
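The cache-of-Futures idea in the proposal above can be sketched in plain Ruby. All names here (`ParseCache`, `eager_parse`, `fetch`) are hypothetical; JRuby's real LoadService and parser are Java, and this stands in the filesystem search plus AST build with a trivial placeholder. Threads play the role of Futures, and `Thread#value` plays the role of `Future.get`.

```ruby
# A sketch of an LRU-style cache mapping file paths to "futures" that
# hold parse results. Eviction here is FIFO on insertion order, a crude
# approximation of LRU, which is fine for illustration.
class ParseCache
  def initialize(capacity = 128)
    @capacity = capacity
    @futures  = {}      # path => Thread acting as a future
    @lock     = Mutex.new
  end

  # Kick off an asynchronous load+parse; this is what the parser would
  # call when it sees a require-like line. Returns immediately.
  def eager_parse(path, &parser)
    @lock.synchronize do
      return if @futures.key?(path)
      @futures.delete(@futures.keys.first) if @futures.size >= @capacity
      @futures[path] = Thread.new { parser.call(path) }
    end
  end

  # Synchronous get, the moral equivalent of Future.get: blocks until
  # the background parse finishes, starting one inline if none exists.
  def fetch(path, &parser)
    future = @lock.synchronize do
      @futures[path] ||= Thread.new { parser.call(path) }
    end
    future.value
  end
end

# Toy "parser": in JRuby this would be search + read + AST construction.
parse = ->(path) { "AST(#{path})" }

cache = ParseCache.new
cache.eager_parse("blib.rb", &parse)   # parsing starts in the background
cache.fetch("alib.rb", &parse)         # no eager parse was started; done on demand
puts cache.fetch("blib.rb", &parse)    # => AST(blib.rb), likely already finished
```

The key property, matching the proposal: only parsing runs concurrently. The results are inert ASTs, so interpretation still happens in require order and Jonathan's open-class concern doesn't arise.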
>>>>
>>>> - Charlie

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email