[ https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-753:
--------------------------------------

    Attachment: FileReadTest.java


Carrying forward from this thread:

  http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200806.mbox/[EMAIL PROTECTED]

Jason Rutherglen <[EMAIL PROTECTED]> wrote:

{quote}
After thinking more about the pool of RandomAccessFiles, I think
LUCENE-753 is the best solution.  I am not sure how much work a pool
of RandomAccessFiles would be, nor whether it creates more
synchronization problems; if it only benefits Windows, it does not
seem worthwhile.
{quote}

It wasn't clear to me that pread would in fact perform better than
letting each thread use its own private RandomAccessFile.
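
For reference, the core difference between those two approaches is
roughly the following (a simplified sketch, not the actual code in
FileReadTest.java; the class and method names here are just
illustrative):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Simplified sketch of the two read styles being compared (names illustrative).
class ReadStyles {

  // "Classic" style: one shared RandomAccessFile, so seek+read must run
  // under a lock or another thread's seek can move the file pointer.
  static synchronized int classicRead(RandomAccessFile raf, long pos, byte[] buf)
      throws IOException {
    raf.seek(pos);
    return raf.read(buf, 0, buf.length);
  }

  // "ChannelPread" style: FileChannel.read(ByteBuffer, position) carries the
  // position in the call itself, so no lock is needed around the read.
  static int preadRead(FileChannel channel, long pos, byte[] buf)
      throws IOException {
    return channel.read(ByteBuffer.wrap(buf), pos);
  }
}
{code}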

So I modified (attached) FileReadTest.java to add a new SeparateFile
implementation, which opens a private RandomAccessFile per thread and
then just does "classic" seeks & reads on that file (sketched below).
Then I ran the test on 3 platforms (results below), using 4 threads.
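
The SeparateFile idea boils down to something like this (again just a
sketch in my own words, not the attached code):

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the SeparateFile idea: each thread gets its own handle, so
// plain seek+read needs no cross-thread locking at all.
class SeparateFileReader {

  private final File path;

  // Lazily opens one private RandomAccessFile per calling thread.
  private final ThreadLocal<RandomAccessFile> perThread =
      new ThreadLocal<RandomAccessFile>() {
        @Override
        protected RandomAccessFile initialValue() {
          try {
            return new RandomAccessFile(path, "r");
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };

  SeparateFileReader(File path) {
    this.path = path;
  }

  int read(long pos, byte[] buf) throws IOException {
    RandomAccessFile raf = perThread.get();  // private to this thread
    raf.seek(pos);
    return raf.read(buf, 0, buf.length);
  }
}
{code}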

The results are very interesting -- using SeparateFile is always
faster, especially so on WinXP Pro (115% faster than the next fastest,
ClassicFile) but also surprisingly so on Linux (44% faster than the
next fastest, ChannelPread).  On Mac OS X it was 5% faster than
ChannelPread.  So on all platforms it's faster, when using multiple
threads, to use separate files.

I don't have a Windows server class machine readily accessible so if
someone could run on such a machine, and run on other machines
(Solaris) to see if these results are reproducible, that'd be great.

This is a strong argument for some sort of pooling of
RandomAccessFiles under FSDirectory, though the counterbalance is
clearly added complexity.  I think if we combined the two approaches
(separate RandomAccessFile objects per thread, managed by a pool, and
then the best mode for the platform: classic on Windows & channel
pread on all others) we'd likely get the best performance yet.
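
Very roughly, such a pool might look like the sketch below (all names
hypothetical; this is not a patch, just the shape of the idea):

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;

// Hypothetical handle pool: check a RandomAccessFile out, do a classic
// seek+read on it, return it; the pool grows up to one handle per
// concurrent reader.
class FileHandlePool {

  private final File path;
  private final ArrayDeque<RandomAccessFile> free =
      new ArrayDeque<RandomAccessFile>();

  FileHandlePool(File path) {
    this.path = path;
  }

  private synchronized RandomAccessFile checkOut() throws IOException {
    RandomAccessFile raf = free.poll();
    return raf != null ? raf : new RandomAccessFile(path, "r");  // grow on demand
  }

  private synchronized void checkIn(RandomAccessFile raf) {
    free.push(raf);
  }

  int read(long pos, byte[] buf) throws IOException {
    RandomAccessFile raf = checkOut();
    try {
      raf.seek(pos);  // no pool lock held while seeking/reading
      return raf.read(buf, 0, buf.length);
    } finally {
      checkIn(raf);
    }
  }
}
{code}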

Mac OS X 10.5.3, single WD Velociraptor hard drive, Sun JRE 1.6.0_05

{code}

config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=151884, MB/sec=176.73715203708093

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=97820, MB/sec=274.4177632386015

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=103059, MB/sec=260.4677476008888

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=176250, MB/sec=152.30380482269504

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=365904, MB/sec=73.36226332589969

{code}


Linux 2.6.22.1, 6-drive RAID 5 array, Sun JRE 1.6.0_06

{code}

config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=75592, MB/sec=355.1109323737962

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=35505, MB/sec=756.0497282072947

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=51075, MB/sec=525.5711326480665

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=95640, MB/sec=280.6727896277708

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=93711, MB/sec=286.45031639828835

{code}



Windows XP Pro, laptop, Sun JRE 1.4.2_15

{code}

config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=135349, MB/sec=198.32836297275932

config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=62970, MB/sec=426.2910211211688

config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=174606, MB/sec=153.73781886074937

config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=152171, MB/sec=176.4038193873997

config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=275603, MB/sec=97.39932293915524

{code}


> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
>                 Key: LUCENE-753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-753
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Yonik Seeley
>         Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java, FSIndexInput.patch, FSIndexInput.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the underlying file.
> This could mitigate any MT performance drop caused by reducing the number of files in the index format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

