OK, let's do that (add runsequential to benchmark and all the rest). If
I run into this elsewhere as well, I will report it and we can talk
then about finding a solution. If it's just benchmark, then I think
we'll be OK.
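
A minimal sketch of what that could look like in benchmark's build.xml. It
assumes the shared build guards its sequential test target with
if="runsequential" and its parallel one with unless="runsequential" (those
target names are hypothetical), and the import path below is also an
assumption. A target's if/unless only checks whether the named property is
set, so any value works. Because the property is defined in benchmark's own
build.xml, it should only affect that package's test run:

```xml
<!-- contrib/benchmark/build.xml (sketch; the import path is an assumption) -->
<project name="benchmark">
  <!-- Any value works: if/unless on a target only tests whether the named
       property is set. Assumes shared targets such as a hypothetical
       junit-sequential (if="runsequential") and junit-parallel
       (unless="runsequential"), so benchmark always runs sequentially. -->
  <property name="runsequential" value="true"/>

  <import file="../contrib-build.xml"/>
</project>
```

This is the same switch as passing -Drunsequential=1 on the command line, just
made permanent for benchmark.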

Shai

On Thursday, April 1, 2010, Robert Muir <rcm...@gmail.com> wrote:
> On Thu, Apr 1, 2010 at 12:03 AM, Shai Erera <ser...@gmail.com> wrote:
>
>
> Hi
>
> I'd like to summarize a discussion I had w/ Robert and Mike last night on 
> IRC, about the parallelism of tasks in Benchmark:
>
> For some reason, ever since parallel tasks were introduced, when I run 'ant
> test' from the contrib/benchmark folder (or the root), the tests just hang at
> some point, after WriteLineDocTaskTest finishes. What's very weird is that it
> seems I'm the only one experiencing this, and so for a long time I thought
> it was just a problem w/ my environment ... until yesterday, when I did a
> fresh checkout of trunk into a fresh folder and project, and the tests still
> got stuck.
>
> The thread dump does not show anything relevant to Lucene code, only to
> Ant. The main thread is waiting in
> org/apache/tools/ant/taskdefs/Parallel.spinThreads, another in
> org/apache/tools/ant/taskdefs/Execute.waitFor, and two others in
> java/io/FileInputStream.read. None of them is directly related to Lucene
> code. Annoyingly, but conveniently for debugging this issue, it happens very
> consistently on my machine: sometimes the test passes, but 90% of the time
> it hangs.
> Running w/ -Drunsequential=1 consistently succeeds.
>
> We explored different ways to understand the cause of the problem and came
> across several improvements and a workaround, but unfortunately not a
> definite resolution:
>
> * As a last resort, we can add a runsequential property to benchmark's
> build.xml, which forces the Benchmark tests to run sequentially. Since it's a
> tiny package that takes only a few seconds to run anyway, and parallelism
> doesn't help much (on my machine the parallel run is actually slower when it
> passes: parallel=15 sec, seq=11 sec), this might be acceptable.
>
> * Moving the junit temp files that get created (such as the junit flag file)
> into the temp directory each test uses. This is a good thing to do anyway
> (thanks Robert for spotting that), because it avoids accidental commits of
> such files :) and doesn't clutter the main environment. We did that because
> when I hit CTRL+C to stop one of the runs that hung, we got an FNFE
> (FileNotFoundException) on a junit flag file, "file is being accessed by
> another process" (something like that), and thought it was related to the
> hangs I'm seeing. In any case, multiple JVMs try to access this file
> concurrently, which seems bad.
>
> * Explore the JUnit Formatter code under src/test, since it uses file
> locking. I disabled locking (using NoLockFactory), but the test still hung.
>
> * Change threadsPerProcessor in common-build.xml to '1' instead of '2'. We
> think that might be a good thing to do anyway: on a machine with just one
> CPU, extra threads are not expected to help much, unlike on a machine with
> multiple CPUs. But we don't want to force it on anyone, so we're thinking of
> changing the default to '1' while introducing a 'threadsPerProcessor'
> property that users can set explicitly.
> ** Surprisingly, when I set it to '1' or '10' (I run on a dual-core ThinkPad
> W500), the test consistently passes; it just doesn't like the value '2'. At
> least it passed every time I ran it; maybe a hang is still lurking for me
> around the corner somewhere.
>
> * We made sure the benchmark tests indeed read/write the test data files
> from/to unique directories. But like I said, the thread dump shows no hang
> in Lucene code.
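>
> A minimal sketch of the threadsPerProcessor change, assuming the parallel
> test target wraps its junit runs in Ant's <parallel> task, which does accept
> a threadsPerProcessor attribute (the target name below is hypothetical).
> Since an Ant property cannot be redefined once set, and -D values from the
> command line are set first, declaring the default with <property> keeps it
> overridable, e.g. ant test -DthreadsPerProcessor=2:
>
> ```xml
> <!-- Default of 1 thread per CPU. A -DthreadsPerProcessor=N on the command
>      line wins, because Ant properties cannot be redefined once set. -->
> <property name="threadsPerProcessor" value="1"/>
>
> <!-- Hypothetical target name; skipped entirely when runsequential is set. -->
> <target name="junit-parallel" unless="runsequential">
>   <!-- Ant runs at most (available processors * threadsPerProcessor)
>        of the nested tasks at the same time. -->
>   <parallel threadsPerProcessor="${threadsPerProcessor}">
>     <!-- nested junit invocations go here -->
>   </parallel>
> </target>
> ```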
>
> It was very late last night when we stopped, and my eyes were tired, so I
> didn't summarize it right away. Robert, I hope I've captured everything we
> did; if not, please add.
>
> Does anyone have any suggestions? It's unfortunate that I'm the only one
> running into this problem, because whatever the suggestions are, you'll
> probably need me to confirm them :). And I'm going away for 3 days (camping,
> no internet ... well, at least no laptop :)), so unless someone has a
> suggestion within the next few hours, we can continue when I get back.
>
> Shai
>
>
> I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and
> listed the things we can do for sure now, such as lowering
> threadsPerProcessor (and allowing someone to use a system property to
> override it) and moving the junit temp files into the temp directory.
> Additionally, I would like to fix the Ant library problem mentioned there.
> It works great from the command line, but we should improve this for IDE
> users, so they do not see a compile error.
>
> I am personally for the idea of adding the runsequential property to
> benchmark's build.xml, to force it to run serially. While I am unable to
> reproduce your problem, it does not surprise me, as I had a tough time trying
> to prevent benchmark tests from stepping on each other's toes.
>
> --
> Robert Muir
> rcm...@gmail.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
