[ 
https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614290#action_12614290
 ] 

Michael McCandless commented on LUCENE-753:
-------------------------------------------


{quote}
In our implementation the synchronization/lack of concurrency has been a big 
issue for us. On several occasions we've had to remove new features that 
perform searches from frequently hit pages, because threads build up waiting 
for synchronized access to the underlying files.  It is possible that I would 
still have issues even with my patch, considering from my tests that I'm only 
increasing throughput by 300%, but it would be easier for me to tune and scale 
my application since resource utilization and contention would be visible from 
the OS level. 
{quote}

Can you describe your test -- OS, JRE version, size/type of your index, number 
of cores, amount of RAM, type of IO system, etc?  It's awesome that you see 
300% gain in search throughput.  Is your index largely cached in the OS's IO 
cache, or not?

{quote}
My vote is that the benefits outweigh the complexity, especially considering it's 
an out-of-the-box solution that works well for all platforms and single-
threaded as well as multi-threaded environments. If it's helpful, I can spend 
the time to implement some of the missing feature(s) of the pool that will be 
needed for it to be an acceptable solution (i.e, shared access once a file has 
been deleted, and perhaps a time-based closing mechanism).
{quote}

If we can see sizable concurrency gains, reliably & across platforms, I agree we 
should pursue this approach.  One particular frustration is: if you optimize 
your index, thinking this gains you better search performance, you're actually 
making things far worse as far as concurrency is concerned because now you are 
down to a single immense file.  I think we do need to fix this situation.

On your patch, I think in addition to shared-access on a now-deleted file, we 
should add a global control on the "budget" of number of open files (right now 
I think your patch has a fixed cap per-filename).  Probably the budget should 
be expressed as a multiplier off the minimum number of open files, rather than 
a fixed cap, so that an index with many segments is allowed to use more.  
Ideally, over time the pool settles so that small files in the index (small 
segments) hold only 1 descriptor open, since they see very little contention, 
while large files get many descriptors.
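
To make the budget idea concrete, here is a rough sketch of a global cap 
expressed as a multiplier of the minimum number of open files (one per file). 
The class name DescriptorBudget and its methods are hypothetical and are not 
taken from the patch; it also uses generics for brevity, so it would need 
adapting for a 1.4 code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a global descriptor budget shared by all index files. */
final class DescriptorBudget {
    private final int maxOpen;  // global cap = multiplier * number of files
    private int open;           // descriptors currently open across all files
    private final Map<String, Integer> perFile = new LinkedHashMap<>();

    DescriptorBudget(Iterable<String> fileNames, double multiplier) {
        int n = 0;
        for (String name : fileNames) {
            perFile.put(name, 1);  // every file always holds one descriptor
            n++;
        }
        this.open = n;
        // Never go below the minimum of one descriptor per file.
        this.maxOpen = Math.max(n, (int) Math.floor(n * multiplier));
    }

    /** Called when a reader found every descriptor for this file busy. */
    synchronized boolean tryOpenExtra(String fileName) {
        Integer count = perFile.get(fileName);
        if (count == null || open >= maxOpen) {
            return false;  // unknown file, or global budget exhausted
        }
        perFile.put(fileName, count + 1);
        open++;
        return true;
    }

    synchronized int openFor(String fileName) {
        Integer count = perFile.get(fileName);
        return count == null ? 0 : count;
    }

    synchronized int totalOpen() {
        return open;
    }
}
```

Because extra descriptors are only requested under contention, heavily read 
files end up with most of the budget while small segments stay at one. An 
index with more segments gets a proportionally larger cap.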

I created a separate test (will post a patch & details to this issue) to 
explore using SeparateFile inside FSDirectory, but unfortunately I see mixed 
results on both the cached & uncached cases.  I'll post details separately.

One issue with your patch is it's using Java 5-only classes (Lucene is still on 
1.4); once you downgrade to 1.4 I wonder if the added synchronization will 
become costly.

I like how your approach is to pull a RandomAccessFile from the pool only when 
a read is taking place -- this automatically takes care of creating new 
descriptors when there truly is contention.  But one concern I have is that 
this defeats the OS's IO system's read-ahead optimization since from the OS's 
perspective the file descriptors are getting shuffled.  I'm not sure if this 
really matters much in Lucene, because many things (reading stored fields & 
term vectors) are likely not helped much by read-ahead, but for example a 
simple TermQuery on a large term should in theory benefit from read-ahead.  You 
could gain this back with a simple thread affinity, such that the same thread 
gets the same file descriptor it got last time, if it's available.  But that 
added complexity may offset any gains.
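
As an illustration of the thread-affinity idea, a minimal sketch follows. The 
AffinityPool class is hypothetical and not part of the patch; it is generic 
so it can be exercised without real files, with T standing in for 
RandomAccessFile. It uses Java 5+ and java.util.function.Supplier purely for 
brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch: a pool that prefers the item a thread used last time. */
final class AffinityPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final ThreadLocal<T> last = new ThreadLocal<>();
    private final Supplier<T> factory;  // e.g. opens a new descriptor

    AffinityPool(Supplier<T> factory) {
        this.factory = factory;
    }

    T acquire() {
        T preferred = last.get();
        T item;
        synchronized (this) {
            if (preferred != null && free.remove(preferred)) {
                item = preferred;         // same item as last time, if it is free
            } else {
                item = free.pollFirst();  // otherwise any free item (may be null)
            }
        }
        if (item == null) {
            item = factory.get();         // real contention: create a new one
        }
        last.set(item);                   // remember for this thread's next read
        return item;
    }

    synchronized void release(T item) {
        free.addLast(item);
    }
}
```

Note that the lookup in the free list is linear and the ThreadLocal keeps a 
reference to the last item, so a real pool would have to deal with closed 
descriptors and with the extra bookkeeping cost.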


> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
>                 Key: LUCENE-753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-753
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Yonik Seeley
>         Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java, 
> FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java, 
> FSIndexInput.patch, FSIndexInput.patch, lucene-753.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the 
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of 
> files in the index format.
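
For reference, the positional read named in the issue title can be sketched 
with FileChannel.read(ByteBuffer, long), which reads at an explicit offset 
without changing the channel's own position, so callers need no shared 
seek-then-read lock. The PositionalReader wrapper is a hypothetical name for 
illustration and is not the code in the attached patches; it uses the Java 7+ 
java.nio.file API for brevity.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: reads at absolute offsets with no shared file position. */
final class PositionalReader implements AutoCloseable {
    private final FileChannel channel;

    PositionalReader(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Fills dst[offset, offset + len) starting at file position pos. */
    void readFully(long pos, byte[] dst, int offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(dst, offset, len);
        while (buf.hasRemaining()) {
            // Positional read: does not modify the channel's position.
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("read past EOF at position " + pos);
            }
            pos += n;
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```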
