Re: [jruby-dev] Improving load time by parallelizing load/parse?

Charles Oliver Nutter Tue, 25 Oct 2011 08:24:32 -0700

2011/10/24 Hiroshi Nakamura <nakah...@gmail.com>:
>> At a naive level, parallelizing the parse of an individual file is
>> tricky to impossible; the parser state is very much straight-line.
>> But perhaps it's possible to parallelize loading of many files?
>>
>> I started playing with parallelizing calls to the parser, but that
>> doesn't really help anything; every call to the parser blocks
>> waiting for it to complete, and the contents are not interpreted
>> until after that point. That means that "require" lines remain
>> totally opaque, preventing us from proactively starting threaded
>> parses of additional files. But there lies the opportunity: what if
>> load/require requests were done as Futures, require/load lines were
>> eagerly interpreted by submitting load/require requests to a thread
>> pool, and child requires could be loading and parsing at the same
>> time as the parent file...without conflicting.
>
> I might be misunderstanding something.  But you're discussing only
> 'load' (read in source as a stream) and 'parse' (create AST), not
> evaluating it, right? Or does it include the evaluation phase?


It does not include the evaluation phase. I separated the evaluation from
the load+parse logic in LoadService and allowed the parser to trigger
load+parse jobs directly (without evaluating the file).

Good news: I managed to get it working! Bad news: it did not improve
startup time as much as I hoped. Either startup time is not as heavily
dependent on load+parse as I believed, or the overhead of submitting
many small load+parse jobs to a thread pool cancelled out most of the
gain.

The code is pushed to the parallel_load branch on
github.com/headius/jruby and works like this:

* As files are parsed, calls to "require" that have a single string
argument are submitted to LoadService as load+parse jobs.
* load+parse jobs are held in a map from the require string to a
Future<SearchState>.
* As actual requires come in, they first look in the map for the
string they're attempting to require.
** If it exists in the map, they simply call #get on the future and
wait for it to complete. Then they evaluate the file, which triggers
other require calls, which find other Futures, etc.
** If it does not exist in the map, they simply do a normal
synchronous require. That required file may trigger asynchronous
load+parse jobs, though.
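
The steps above can be sketched roughly as follows. This is only an
illustration, not the code on the parallel_load branch: the class
ParallelLoadSketch, the Parsed type (standing in for SearchState), and
the prefetch/require method names are all hypothetical, and evaluation
of the parsed file is left to the caller.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the scheme described above; every name here is
// invented for illustration and is not taken from the parallel_load branch.
final class ParallelLoadSketch {

    // Stand-in for the load+parse result (the post uses SearchState).
    static final class Parsed {
        final String name;
        final String ast;
        Parsed(String name, String ast) { this.name = name; this.ast = ast; }
    }

    // Map from the require string to the in-flight load+parse job.
    private final Map<String, Future<Parsed>> pending = new ConcurrentHashMap<>();
    private final ExecutorService pool;
    private final Function<String, Parsed> loadAndParse;

    ParallelLoadSketch(ExecutorService pool, Function<String, Parsed> loadAndParse) {
        this.pool = pool;
        this.loadAndParse = loadAndParse;
    }

    // Called when the parser sees a require with a single string argument:
    // queue at most one load+parse job per require string.
    void prefetch(String name) {
        pending.computeIfAbsent(name, n -> {
            Callable<Parsed> job = () -> loadAndParse.apply(n);
            return pool.submit(job);
        });
    }

    // Called when a require is actually executed. Returns the parsed file;
    // evaluating it (which may trigger further requires) is the caller's job.
    Parsed require(String name) {
        Future<Parsed> future = pending.get(name);
        if (future == null) {
            // Nothing was prefetched: fall back to a synchronous load+parse.
            return loadAndParse.apply(name);
        }
        try {
            return future.get(); // wait for the background job to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + name, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("load+parse failed for " + name, e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            ParallelLoadSketch loader =
                new ParallelLoadSketch(pool, n -> new Parsed(n, "ast:" + n));
            loader.prefetch("set");                         // seen while parsing
            System.out.println(loader.require("set").ast);  // waits on the Future
            System.out.println(loader.require("json").ast); // synchronous fallback
        } finally {
            pool.shutdown();
        }
    }
}
```

Even in this toy form, each prefetch costs a map operation plus a task
submission, which is the kind of per-job overhead that could eat into
the gains for small files.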

And it works...I was able to install gems and run basic Rails
operations. But the best improvement (testing jruby -e "require
'config/application.rb'") was only about 10-15%, and running
"rake test" in an empty Rails app improved from around 13s to around
12s. Not a lot.

Other findings and thoughts:

* An initial patch that only parallelized load path searching did not
appear to improve perf in any case, or the improvement was so small I
could not see it.
* It may not be safe to parallelize load path searching, since
evaluated code can modify the load path. I think it's still possible to
parallelize parsing alone if the LoadService Future map keys off the
canonical location of the file. If the load path changes, then, it
won't matter.
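
To make the canonical-location idea concrete, here is a small sketch of
keying parsed results by canonical file path instead of by require
string. It is an assumption about how that might look, not something
from the branch: CanonicalKeySketch, canonicalKey, and parseOnce are
hypothetical names, and the parse step is just a function passed in.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache parse results by canonical path so that the
// cache stays valid no matter how the load path is spelled or reordered.
final class CanonicalKeySketch {

    private final Map<String, String> parsedByPath = new ConcurrentHashMap<>();
    private final Function<String, String> parse;

    CanonicalKeySketch(Function<String, String> parse) {
        this.parse = parse;
    }

    // Resolve a load path entry plus a relative file name to one canonical key.
    static String canonicalKey(String loadPathEntry, String relativeFile) {
        try {
            return new File(loadPathEntry, relativeFile).getCanonicalPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parse the file at most once per canonical location.
    String parseOnce(String loadPathEntry, String relativeFile) {
        return parsedByPath.computeIfAbsent(
            canonicalKey(loadPathEntry, relativeFile), parse);
    }

    int cachedFiles() {
        return parsedByPath.size();
    }

    public static void main(String[] args) {
        CanonicalKeySketch cache = new CanonicalKeySketch(path -> "ast for " + path);
        // Two spellings of the same directory map to the same cache entry.
        cache.parseOnce("lib", "foo.rb");
        cache.parseOnce("./lib", "foo.rb");
        System.out.println(cache.cachedFiles());
    }
}
```

The search for which load path entry contains the file would still
happen synchronously at require time; only the parse result is shared.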

If anyone wants to play with my implementation and see if they can
make it faster, go right ahead. I thought it was pretty promising.

- Charlie
