[ https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-753:
--------------------------------------

    Attachment: FileReadTest.java

Carrying forward from this thread:

    http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200806.mbox/[EMAIL PROTECTED]

Jason Rutherglen <[EMAIL PROTECTED]> wrote:

{quote}
After thinking more about the pool of RandomAccessFiles I think LUCENE-753 is the best solution. I am not sure how much work nor if pool of RandomAccessFiles creates more synchronization problems and if it is only to benefit windows, does not seem worthwhile.
{quote}

It wasn't clear to me that pread would in fact perform better than letting each thread use its own private RandomAccessFile.

So I modified (attached) FileReadTest.java to add a new SeparateFile implementation, which opens a private RandomAccessFile per thread and then just does "classic" seeks & reads on that file. Then I ran the test on 3 platforms (results below), using 4 threads.

The results are very interesting: SeparateFile is always the fastest. The gap is largest on WinXP Pro (115% faster than the next fastest, ClassicFile), but it is also surprisingly large on Linux (44% faster than the next fastest, ChannelPread). On Mac OS X it was 5% faster than ChannelPread. So on all three platforms tested it's faster, when using multiple threads, to use separate files.

I don't have a Windows server-class machine readily accessible, so if someone could run the test on such a machine, and on other platforms (Solaris), to see if these results are reproducible, that'd be great.

This is a strong argument for some sort of pooling of RandomAccessFiles under FSDirectory, though the counterbalance is clearly added complexity. I think if we combined the two approaches (separate RandomAccessFile objects per thread, managed by a pool, each read using the best mode for the platform: classic on Windows and channel pread on all others) we'd likely get the best performance yet.
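For readers without the attachment, the two access patterns compared above can be sketched roughly as follows. This is not the attached FileReadTest.java: the class name ReadStrategies and all method names are hypothetical, it uses modern Java APIs (java.nio.file, lambdas) that the JREs listed in the results did not have, and it only checks that both patterns read the same bytes rather than measuring throughput.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; names are not taken from FileReadTest.java.
public class ReadStrategies {

    // "ChannelPread" style: positional read on a shared channel. No seek is
    // issued, so callers do not need to lock around a shared file position.
    static int preadFully(FileChannel ch, long pos, byte[] dst) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(dst);
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + total);
            if (n < 0) break; // end of file
            total += n;
        }
        return total;
    }

    // "SeparateFile" style: classic seek + read on a handle that only the
    // calling thread uses, so again no locking is needed.
    static int seekAndRead(RandomAccessFile raf, long pos, byte[] dst) throws IOException {
        raf.seek(pos);
        int total = 0;
        while (total < dst.length) {
            int n = raf.read(dst, total, dst.length - total);
            if (n < 0) break; // end of file
            total += n;
        }
        return total;
    }

    // Each thread opens its own RandomAccessFile and sums every byte of the file.
    static long sumWithSeparateFiles(Path path, int nThreads, int bufSize) throws Exception {
        long len = Files.size(path);
        AtomicLong sum = new AtomicLong();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                byte[] buf = new byte[bufSize];
                long local = 0;
                try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
                    for (long pos = 0; pos < len; pos += bufSize) {
                        int n = seekAndRead(raf, pos, buf);
                        for (int i = 0; i < n; i++) local += buf[i];
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                sum.addAndGet(local);
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        return sum.get();
    }

    // All threads share one FileChannel and use positional reads.
    static long sumWithPread(Path path, int nThreads, int bufSize) throws Exception {
        long len = Files.size(path);
        AtomicLong sum = new AtomicLong();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            Thread[] threads = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                threads[t] = new Thread(() -> {
                    byte[] buf = new byte[bufSize];
                    long local = 0;
                    try {
                        for (long pos = 0; pos < len; pos += bufSize) {
                            int n = preadFully(ch, pos, buf);
                            for (int i = 0; i < n; i++) local += buf[i];
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    sum.addAndGet(local);
                });
                threads[t].start();
            }
            for (Thread th : threads) th.join();
        }
        return sum.get();
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("readtest", ".bin");
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(path, data);
        System.out.println("separate files: " + sumWithSeparateFiles(path, 4, 1024));
        System.out.println("shared pread:   " + sumWithPread(path, 4, 1024));
        Files.delete(path);
    }
}
```

A pooled variant would hand out the per-thread RandomAccessFile from a pool instead of opening it inside each thread; that is the added complexity mentioned above and is not shown here.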
Mac OS X 10.5.3, single WD Velociraptor hard drive, Sun JRE 1.6.0_05

{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=151884, MB/sec=176.73715203708093

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=97820, MB/sec=274.4177632386015

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=103059, MB/sec=260.4677476008888

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=176250, MB/sec=152.30380482269504

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=365904, MB/sec=73.36226332589969
{code}

Linux 2.6.22.1, 6-drive RAID 5 array, Sun JRE 1.6.0_06

{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=75592, MB/sec=355.1109323737962

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=35505, MB/sec=756.0497282072947

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=51075, MB/sec=525.5711326480665

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=95640, MB/sec=280.6727896277708

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=93711, MB/sec=286.45031639828835
{code}

WIN XP PRO, laptop, Sun JRE 1.4.2_15:

{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=135349, MB/sec=198.32836297275932

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=62970, MB/sec=426.2910211211688

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=174606, MB/sec=153.73781886074937

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=152171, MB/sec=176.4038193873997

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=275603, MB/sec=97.39932293915524
{code}

> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
>                 Key: LUCENE-753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-753
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Yonik Seeley
>         Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java,
>                      FileReadTest.java, FSIndexInput.patch, FSIndexInput.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of
> files in the index format.