[ https://issues.apache.org/jira/browse/LUCENE-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2143:
---------------------------------------

    Attachment: SearchTest.java

So, good news / bad news...

The good news is I got a more mainstream test env (CentOS 5.4) online.

The bad news is the strange anomalies when testing NRT still occur, and flushing every 100 docs does not work around them.

But then the good news is, I managed to isolate the problem to the hotspot compiler: somehow, it consistently compiles Lucene's search code less efficiently (20-30% slower) depending on which test is being run, which basically makes it impossible to really test the performance tradeoffs of NRT.

I've attached a simple SearchTest.java that shows the hotspot issue. Run it like this:

{code}
java SearchTest /path/to/index <warmMethod>
{code}

I'm testing against a 5M doc Wikipedia index. The <warmMethod> can be:

  * "writer": open a writer, index docs, then rollback
  * "nrt": same as "writer", but periodically get an NRT reader
  * "reader": just open an IndexReader N times, then close it
  * "searcher": same as "reader", but do some searching against each opened reader
  * "none": do no warming

After the warming, the test just runs a set of searches (TermQuery for terms 0, 1, 2 ... 9) 10 times, then prints the min time.

On nearly all JRE versions I've tested, on OpenSolaris 2009.06 & CentOS 5.4, warming with NRT causes a "permanent" loss of search performance of somewhere between 20-30%. EG here are my results on OpenSolaris:

{code}
nrt... 5718 msec
searcher... 4664 msec
reader... 4771 msec
writer... 4785 msec
none... 4839 msec
{code}

On CentOS:

{code}
nrt... 4550 msec
searcher... 3760 msec
reader... 4730 msec
writer... 3780 msec
none... 3766 msec
{code}

(In this case the "reader" warming also kicked hotspot into the slow mode... it seems to be intermittent, because sometimes "reader" is fast.)

I run java as "java -server -Xms1g -Xmx1g".

It's very odd...
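For readers without the attachment, the "run the searches 10 times, then print the min time" measurement described above can be sketched roughly as follows. This is not the attached SearchTest.java: the class name MinTimeSketch, the helper minMillis, and the arithmetic stand-in workload are all hypothetical, and a real run would execute the ten TermQuery searches against the index instead.

```java
public class MinTimeSketch {

    /**
     * Runs the workload the given number of rounds and returns the
     * fastest wall-clock time observed, in milliseconds.
     * (Hypothetical helper; not taken from SearchTest.java.)
     */
    public static long minMillis(int rounds, Runnable workload) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            workload.run();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            best = Math.min(best, elapsedMillis);
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): ten passes of plain arithmetic,
        // one per "term" 0..9. The real test would search the index here.
        long[] sink = new long[1];
        Runnable workload = () -> {
            long sum = 0;
            for (int term = 0; term < 10; term++) {
                for (int i = 0; i < 1_000_000; i++) {
                    sum += (i ^ term);
                }
            }
            sink[0] = sum; // keep the result live so the loop is not optimized away
        };

        // Same output shape as the results above, e.g. "none... <N> msec".
        System.out.println("none... " + minMillis(10, workload) + " msec");
    }
}
```

Reporting the minimum rather than the mean is presumably meant to filter out one-off pauses (GC, other processes), so that what remains mostly reflects the quality of the compiled code.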
I suspect something buggy in hotspot, but I'm not sure how to isolate it. It seems to somehow kick itself into a state where it produces less optimal code for searching. And we're not talking about that many methods in the hot spots for running TermQuery...

I even printed out the assembly code (-XX:+PrintOptoAssembly) and it was very strange -- eg even IndexInput.readVInt was compiled differently if you warmed with "nrt" vs the others. I don't get it.

I'm trying to find a workaround that makes hotspot more manageable so I can run real tests....

> Understand why NRT performance is affected by flush frequency
> -------------------------------------------------------------
>
>                 Key: LUCENE-2143
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2143
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: SearchTest.java
>
>
> In LUCENE-2061 (perf tests for NRT), I test NRT performance by first
> getting a baseline QPS with only searching, using enough threads to
> saturate.
> Then, I pick an indexing rate (I used 100 docs/sec), and index docs at
> that rate, and I also reopen an NRT reader at different frequencies
> (10/sec, 1/sec, every 5 seconds, etc.), and then again test QPS
> (saturated).
> I think this is a good approach for testing NRT -- apps can see, as a
> function of "freshness" and at a fixed indexing rate, what the cost is
> to QPS. You'd expect as index rate goes up, and freshness goes up,
> QPS will go down.
> But I found something very strange: the low frequency reopen rates
> often caused a highish hit to QPS. When I forced IW to flush every
> 100 docs (= once per second), the performance was generally much
> better.
> I actually would've expected the reverse -- flushing in batch ought to
> use fewer resources.
> One theory is something odd about my test env (based on OpenSolaris),
> so I'd like to retest on a more mainstream OS.
> I'm opening this issue to get to the bottom of it...

--
This message is automatically generated by JIRA.