On Thu, Apr 1, 2010 at 12:03 AM, Shai Erera <ser...@gmail.com> wrote:

> Hi
>
> I'd like to summarize a discussion I had w/ Robert and Mike last night on
> IRC, about the parallelism of tasks in Benchmark:
>
> For some reason, ever since parallel tasks were introduced, when I run 'ant
> test' from the contrib/benchmark folder (or the root), the tests just hang
> at some point, after WriteLineDocTaskTest finishes. What's very weird is
> that it seems I'm the only one experiencing this, and so for a long time I
> thought it's just a problem w/ my environment ... until yesterday when I did
> a fresh checkout of trunk, to a fresh folder and project, and the tests
> still got stuck.
>
> Thread dump does not show anything relevant to Lucene code, but rather to
> Ant. The main thread is waiting on
> org/apache/tools/ant/taskdefs/Parallel.spinThreads, another on
> org/apache/tools/ant/taskdefs/Execute.waitFor and two others on
> java/io/FileInputStream.read. But nothing is related to Lucene code,
> directly. Also annoyingly, but conveniently for debugging that issue, it
> happens very consistently on my machine - sometimes the test passes, but
> about 90% of the time it hangs.
> Running w/ -Drunsequential=1 consistently succeeds.
>
> We've explored different ways to understand the cause of the problem, and
> came up with several improvements and a workaround, but unfortunately no
> definite resolution:
>
> * As a last resort, we can add runsequential property to benchmark
> build.xml, which forces Benchmark tests to run sequentially. Since that's a
> tiny package which takes a few seconds to run anyway, and parallelism
> doesn't improve much (it actually runs slower, when it passes, on my
> machine: parallel=15 sec, seq=11 sec), this might be acceptable.
>
> * Move the junit temp files (such as the flag file) into the temp
> directory each test uses. This is actually a good thing to do anyway (thanks
> Robert for spotting that), because it avoids accidental commits of such
> files :) and doesn't clutter the main environment. We did that because,
> when I hit Ctrl+C to stop one of the runs which hung, we received a FNFE on
> a junit flag file, "file is being accessed by another process" (something
> like that), and thought this might be related to the hangs I'm seeing.
> Anyway, multiple JVMs attempt to access this file concurrently, which seems
> bad.
>
> * Explore the JUnit Formatter code under src/test, since it uses file
> locking. I've disabled locks (using NoLockFactory), however the test still
> hung.
>
> * Change common-build.xml threadsPerProcessor to '1' instead of '2'. We
> think that might be a good thing to do anyway - if people run on machines
> with just one CPU, threading is not expected to help much, as opposed to
> running on multiple CPUs. But we don't want to enforce it on anyone, so we
> propose changing the default to '1' and introducing a property
> 'threadsPerProcessor' which users will be able to set explicitly.
> ** Surprisingly, when I set it to '1' or '10' (I run on dual-core Thinkpad
> W500), the test consistently passes - it just doesn't like the value '2'. At
> least it passed as long as I ran it, maybe a thread hang is lurking for me
> around the corner somewhere.
>
> * We made sure the benchmark tests indeed read/write the test data files
> from/to unique directories. But like I said - there is no hang in Lucene
> code reported in the thread dump.
>
> It was very late last night when we stopped, and my eyes were tired, so I
> didn't summarize it right away. Robert, I hope I've captured everything we
> did, if not please add.
>
> Anyone's got any suggestions? It's unfortunate that I'm the only one
> running into this problem, because whatever the suggestions are, you'll
> probably need me to confirm them :). And I'm going away for 3 days (camping
> - no internet ... well at least no laptop :)), so unless someone has a
> suggestion within the coming few hours, we can continue that when I get
> back.
>
> Shai
>

I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and
listed the things we can do for sure now, such as lowering
threadsPerProcessor (and allowing someone to use a system property to
override this) and fixing junit temp files to be in the temp directory.
Additionally I would like to fix the ant library problem as mentioned there.
It works great from the command line, but we should improve this for
IDE-users, so they do not see a compile error.

I am personally for the idea of adding the runsequential property to
benchmark's build.xml, to force it to run serially. While I am unable to
reproduce your problem, it does not surprise me, as I had a tough time
trying to prevent benchmark tests from stepping on each other's toes.
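
To make the arithmetic behind these two settings concrete, here is a minimal
Java sketch. It is not the build code: the class and method names are
hypothetical, and it only assumes that the number of test threads is the CPU
count multiplied by threadsPerProcessor, that the proposed default is '1', and
that setting runsequential forces a single thread.

```java
// Hypothetical illustration only; the real logic lives in the Ant build files.
final class ThreadCountSketch {

    /** Parses the threadsPerProcessor property; falls back to the proposed default of 1. */
    static int parsePerProcessor(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return 1; // assumed default, as proposed in this thread
        }
        return Integer.parseInt(propertyValue.trim());
    }

    /** Threads = processors * threadsPerProcessor, or 1 when running sequentially. */
    static int threadCount(int processors, int threadsPerProcessor, boolean runSequential) {
        if (runSequential) {
            return 1;
        }
        // Never go below one thread, even for nonsensical inputs.
        return Math.max(1, processors * Math.max(1, threadsPerProcessor));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int perCpu = parsePerProcessor(System.getProperty("threadsPerProcessor"));
        // Assumption: any value for -Drunsequential switches to sequential mode.
        boolean sequential = System.getProperty("runsequential") != null;
        System.out.println("test threads: " + threadCount(cpus, perCpu, sequential));
    }
}
```

Under these assumptions, a dual-core machine gets 4 threads with the value
'2', 2 threads with '1', and 20 threads with '10'; with -Drunsequential=1 it
always gets one.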


-- 
Robert Muir
rcm...@gmail.com
