OK, let's do that (add runsequential to benchmark, and all the rest). If I run into this elsewhere as well, I will report it, and we can talk then about finding a solution. If it's just benchmark, then I think we'll be OK.
Shai

On Thursday, April 1, 2010, Robert Muir <rcm...@gmail.com> wrote:
> On Thu, Apr 1, 2010 at 12:03 AM, Shai Erera <ser...@gmail.com> wrote:
>
> > Hi
> >
> > I'd like to summarize a discussion I had w/ Robert and Mike last night on IRC, about the parallelism of tasks in Benchmark:
> >
> > For some reason, ever since parallel tasks were introduced, when I run 'ant test' from the contrib/benchmark folder (or the root), the tests just hang at some point, after WriteLineDocTaskTest finishes. What's very weird is that it seems I'm the only one experiencing this, and so for a long time I thought it was just a problem w/ my environment ... until yesterday, when I did a fresh checkout of trunk, to a fresh folder and project, and the tests still got stuck.
> >
> > The thread dump does not show anything relevant to Lucene code, but rather to Ant. The main thread is waiting on org/apache/tools/ant/taskdefs/Parallel.spinThreads, another on org/apache/tools/ant/taskdefs/Execute.waitFor and two others on java/io/FileInputStream.read. But nothing is related to Lucene code, directly. Also annoyingly, but conveniently for debugging the issue, it happens very consistently on my machine - sometimes the test passes, but 90% of the time it hangs.
> >
> > Running w/ -Drunsequential=1 consistently succeeds.
> >
> > We've explored different ways to understand the cause of the problem, and came across several improvements and a workaround, but unfortunately did not reach a definite resolution:
> >
> > * As a last resort, we can add a runsequential property to benchmark's build.xml, which forces Benchmark tests to run sequentially. Since that's a tiny package which takes a few seconds to run anyway, and parallelism doesn't improve much (it actually runs slower, when it passes, on my machine: parallel=15 sec, seq=11 sec), this might be acceptable.
> >
> > * Move the junit temp files (such as that flag file) to the temp directory each test uses. This is actually a good thing to do anyway (thanks Robert for spotting that), because it avoids accidental commits of such files :), and doesn't clutter the main environment. We've done that because when I hit Ctrl+C to stop one of the runs which hung, we received an FNFE on a junit flag file, "file is being accessed by another process" (something like that), and thought this was related to the hangs I'm seeing. Anyway, multiple JVMs attempt to access this file concurrently, which seems bad.
> >
> > * Explore the JUnit Formatter code under src/test, since it uses file locking. I've disabled locks (using NoLockFactory), however the test still hung.
> >
> > * Change common-build.xml threadsPerProcessor to '1' instead of '2'. We think that might be a good thing to do anyway - if people run on machines with just one CPU, threading is not expected to help much, as opposed to running on multiple CPUs. But we don't want to enforce it on anyone, so we're thinking of changing the default to '1', but introducing a property 'threadsPerProcessor' which users will be able to set explicitly.
> > ** Surprisingly, when I set it to '1' or '10' (I run on a dual-core Thinkpad W500), the test consistently passes - it just doesn't like the value '2'. At least it passed for as long as I ran it; maybe a thread hang is lurking for me around the corner somewhere.
> >
> > * We made sure the benchmark tests indeed read/write the test data files from/to unique directories. But like I said - there is no hang in Lucene code reported in the thread dump.
> >
> > It was very late last night when we stopped, and my eyes were tired, so I didn't summarize it right away. Robert, I hope I've captured everything we did; if not, please add.
> >
> > Anyone's got any suggestions? It's unfortunate that I'm the only one running into this problem, because whatever the suggestions are, you'll probably need me to confirm them :). And I'm going away for 3 days (camping - no internet ... well at least no laptop :)), so unless someone has a suggestion within the coming few hours, we can continue that when I get back.
> >
> > Shai
>
> I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and listed the things we can do for sure now, such as lowering threadsPerProcessor (and allowing someone to use a system property to override this) and fixing junit temp files to be in the temp directory. Additionally, I would like to fix the ant library problem as mentioned there. It works great from the command line, but we should improve this for IDE users, so they do not see a compile error.
>
> I am personally for the idea of adding the runsequential property to benchmark's build.xml, to force it to run serially. While I am unable to reproduce your problem, it does not surprise me, as I had a tough time trying to prevent benchmark tests from stepping on each other's toes.
>
> --
> Robert Muir
> rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
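
For anyone following along, two of the ideas in this thread (a 'threadsPerProcessor' property that defaults to '1', and a 'runsequential' switch) can be sketched in Ant roughly as below. This is only an illustration under assumptions: the project name, target names and echo placeholders are hypothetical and are not taken from Lucene's common-build.xml or benchmark's build.xml. What it relies on is standard Ant behavior: the threadsPerProcessor attribute of the parallel task, the if/unless attributes on targets, and the rule that a property given with -D on the command line takes precedence over a property element in the build file.

```xml
<project name="parallel-tests-sketch" default="test">

  <!-- Default of 1 thread per processor. A user can still override it
       with -DthreadsPerProcessor=N, because command-line properties win. -->
  <property name="threadsPerProcessor" value="1"/>

  <!-- Exactly one of the two targets below does real work,
       depending on whether runsequential is set. -->
  <target name="test" depends="test-sequential,test-parallel"/>

  <!-- Taken when runsequential is set to any value, e.g. -Drunsequential=1. -->
  <target name="test-sequential" if="runsequential">
    <echo message="hypothetical placeholder: run all tests in one batch"/>
  </target>

  <!-- Taken when runsequential is not set. -->
  <target name="test-parallel" unless="runsequential">
    <parallel threadsPerProcessor="${threadsPerProcessor}">
      <echo message="hypothetical placeholder: test batch 1"/>
      <echo message="hypothetical placeholder: test batch 2"/>
    </parallel>
  </target>

</project>
```

With a layout like this, 'ant test -DthreadsPerProcessor=2' would bring back the old value for a single run, and 'ant test -Drunsequential=1' would skip the parallel path entirely. A small package could also define runsequential in its own build file so that it always takes the sequential path.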