I do think the idea of an abstract class (or interface) SegmentWriter
is compelling.

Each DWPT would be a [single-threaded] SegmentWriter.
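
Something like this is what I'm picturing -- just a sketch, none of
these names or signatures exist today:

  // Hypothetical sketch only -- illustrative names/signatures, not a
  // real Lucene API.
  import java.io.IOException;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;

  public abstract class SegmentWriter {
    public abstract void addDocument(Document doc) throws IOException;
    public abstract void deleteDocuments(Term term) throws IOException;
    public abstract long ramBytesUsed();
    // Write whatever is buffered to one or more new segments, freeing RAM.
    public abstract void flush() throws IOException;
  }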

And then we'd make a MultiThreadedSegmentWriterWrapper (manages a
collection of SegmentWriters, delegating to them, aggregating RAM used
across all of them, picking which ones to flush, etc.).
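
Again just a sketch (it glosses over coordinating a flush with
concurrent adds, which is the hard part), but roughly:

  // Hypothetical multi-threaded wrapper over the SegmentWriter sketch
  // above: per-thread adds need no locking, deletes fan out to every
  // writer, and when we exceed the RAM budget we flush the writer
  // using the most RAM.
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;

  public abstract class MultiThreadedSegmentWriterWrapper extends SegmentWriter {

    private final List<SegmentWriter> writers = new ArrayList<SegmentWriter>();
    private final long ramBudgetBytes;
    private final ThreadLocal<SegmentWriter> perThread = new ThreadLocal<SegmentWriter>() {
      @Override
      protected SegmentWriter initialValue() {
        SegmentWriter w = newSegmentWriter();          // e.g. a new DWPT
        synchronized (writers) { writers.add(w); }
        return w;
      }
    };

    protected MultiThreadedSegmentWriterWrapper(long ramBudgetBytes) {
      this.ramBudgetBytes = ramBudgetBytes;
    }

    // How to create the underlying single-threaded writer (a DWPT).
    protected abstract SegmentWriter newSegmentWriter();

    @Override
    public void addDocument(Document doc) throws IOException {
      perThread.get().addDocument(doc);                // no global lock on the hot path
      if (ramBytesUsed() > ramBudgetBytes) {
        flushLargest();                                // simplified; see caveat above
      }
    }

    @Override
    public void deleteDocuments(Term term) throws IOException {
      synchronized (writers) {
        for (SegmentWriter w : writers) w.deleteDocuments(term);
      }
    }

    @Override
    public long ramBytesUsed() {
      long sum = 0;
      synchronized (writers) {
        for (SegmentWriter w : writers) sum += w.ramBytesUsed();
      }
      return sum;
    }

    @Override
    public void flush() throws IOException {
      synchronized (writers) {
        for (SegmentWriter w : writers) w.flush();
      }
    }

    private void flushLargest() throws IOException {
      SegmentWriter largest = null;
      synchronized (writers) {
        for (SegmentWriter w : writers) {
          if (largest == null || w.ramBytesUsed() > largest.ramBytesUsed()) {
            largest = w;
          }
        }
      }
      if (largest != null) largest.flush();
    }
  }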

Then, a SlicedSegmentWriter (say) would write to separate slices,
single-threaded, and then you could make it multi-threaded by wrapping
it w/ the above class.
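
E.g. (hypothetical, building on the SegmentWriter sketch above; the
real trick in a parallel index is keeping doc ids aligned across the
slices, which is why every slice gets a -- possibly empty -- document):

  // Hypothetical: a single-threaded writer that routes each field of an
  // incoming document to the slice that owns it; unmapped fields go to a
  // default slice.
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.LinkedHashSet;
  import java.util.Map;
  import java.util.Set;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Fieldable;
  import org.apache.lucene.index.Term;

  public class SlicedSegmentWriter extends SegmentWriter {

    private final Map<String, SegmentWriter> sliceByField;  // field name -> slice writer
    private final SegmentWriter defaultSlice;
    private final Set<SegmentWriter> allSlices;

    public SlicedSegmentWriter(Map<String, SegmentWriter> sliceByField, SegmentWriter defaultSlice) {
      this.sliceByField = sliceByField;
      this.defaultSlice = defaultSlice;
      this.allSlices = new LinkedHashSet<SegmentWriter>(sliceByField.values());
      this.allSlices.add(defaultSlice);
    }

    @Override
    public void addDocument(Document doc) throws IOException {
      // Split the document by field; every slice sees a document so the
      // doc ids stay aligned across slices.
      Map<SegmentWriter, Document> perSlice = new HashMap<SegmentWriter, Document>();
      for (SegmentWriter slice : allSlices) perSlice.put(slice, new Document());
      for (Fieldable f : doc.getFields()) {
        SegmentWriter slice = sliceByField.containsKey(f.name()) ? sliceByField.get(f.name()) : defaultSlice;
        perSlice.get(slice).add(f);
      }
      for (Map.Entry<SegmentWriter, Document> e : perSlice.entrySet()) {
        e.getKey().addDocument(e.getValue());
      }
    }

    @Override
    public void deleteDocuments(Term term) throws IOException {
      for (SegmentWriter slice : allSlices) slice.deleteDocuments(term);
    }

    @Override
    public long ramBytesUsed() {
      long sum = 0;
      for (SegmentWriter slice : allSlices) sum += slice.ramBytesUsed();
      return sum;
    }

    @Override
    public void flush() throws IOException {
      for (SegmentWriter slice : allSlices) slice.flush();
    }
  }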

Though SegmentWriter isn't a great name since it would in general
write to multiple segments.  Indexer is a little too broad though :)

Something like that maybe?

Also, allowing an app to directly control the underlying
SegmentWriters inside IndexWriter (instead of letting the
multi-threaded wrapper decide for you) is compelling for way advanced
apps, I think.  EG your app may know it's done indexing from source A
for a while, so it should go and flush that writer right now (whereas
the default "flush the one using the most RAM" could leave that source
unflushed for quite a while, tying up RAM, unless we do some kind of
LRU flushing policy or something).
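
I.e., something like this from the app side (hypothetical; how the app
gets hold of the right SegmentWriter is exactly the API question):

  import java.io.IOException;

  // Hypothetical usage of the SegmentWriter sketch above -- the app
  // tracks which writer it used for source A and flushes it explicitly.
  public class SourceAwareFlush {
    public static void doneIndexingSourceA(SegmentWriter sourceAWriter) throws IOException {
      // Source A won't index again for a while: flush its writer now
      // instead of waiting for the "flush the one using the most RAM"
      // policy to get around to it.
      sourceAWriter.flush();
    }
  }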

Mike

On Wed, Apr 21, 2010 at 2:27 AM, Shai Erera <[email protected]> wrote:
> I'm not sure that a Parallel DW would work for PI because DW is too internal
> to IW. Currently, the approach I've been thinking about for PI is to tackle
> it from a high level, e.g. allow the application to pass a Directory, or
> even an IW instance, and PI will play the coordinator role, ensuring that
> segment merges happen across all the slices in a coordinated way, implementing
> two-phase operations, etc. A Parallel DW then does not fit nicely w/ that
> approach (unless we want to refactor how IW works completely) because DW is
> not aware of the Directory, and if PI indeed works over IW instances, then
> each will have its own DW.
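>
> Just to make the coordinator role concrete, something like this is
> roughly what I have in mind (ParallelIndex is a made-up name and the
> merge coordination is glossed over, but prepareCommit/commit/rollback
> on IW are real):
>
>   import java.io.IOException;
>   import java.util.List;
>
>   import org.apache.lucene.index.IndexWriter;
>
>   // Hypothetical coordinator: one IW per slice; commit is two-phased so
>   // either all slices advance together or none of them do.
>   public class ParallelIndex {
>
>     private final List<IndexWriter> slices;
>
>     public ParallelIndex(List<IndexWriter> slices) {
>       this.slices = slices;
>     }
>
>     public void commit() throws IOException {
>       boolean success = false;
>       try {
>         for (IndexWriter w : slices) w.prepareCommit();  // phase 1
>         for (IndexWriter w : slices) w.commit();         // phase 2
>         success = true;
>       } finally {
>         if (!success) {
>           // discard pending changes everywhere (note: rollback() also
>           // closes the writers)
>           for (IndexWriter w : slices) w.rollback();
>         }
>       }
>     }
>   }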
>
> So there are two basic approaches we can take for PI (following the current
> architecture) - either let PI manage IW, or have PI be a sort of IW itself,
> which handles events at a much lower level. While the latter is more robust
> (and based on the current limitations I'm running into, might even be easier
> to do), it lacks the flexibility of allowing the app to plug in any IW it
> wants. That requirement is also important if the application wants to use PI in
> scenarios where it keeps some slices in RAM and some on disk, or it wants to
> control more closely which fields go to which slice, so that it can at some
> point in time "rebuild" a certain slice outside PI and replace the existing
> slice in PI w/ the new one ...
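>
> E.g. (hypothetical usage, reusing the ParallelIndex sketch above; the
> IW constructors are the current 3.0 API):
>
>   import java.io.File;
>   import java.io.IOException;
>   import java.util.Arrays;
>
>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>   import org.apache.lucene.index.IndexWriter;
>   import org.apache.lucene.store.FSDirectory;
>   import org.apache.lucene.store.RAMDirectory;
>   import org.apache.lucene.util.Version;
>
>   // Hypothetical: the app builds its own IW per slice -- one in RAM for
>   // the slice it rebuilds often, one on disk -- and hands them to PI.
>   public class TwoSliceExample {
>     public static ParallelIndex open(File indexDir) throws IOException {
>       IndexWriter ramSlice = new IndexWriter(new RAMDirectory(),
>           new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
>       IndexWriter diskSlice = new IndexWriter(FSDirectory.open(indexDir),
>           new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
>       return new ParallelIndex(Arrays.asList(ramSlice, diskSlice));
>     }
>   }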
>
> We should probably continue the discussion on PI, so I suggest we either
> move it to another thread or on the issue directly.
>
> Mike - I agree w/ you that we should keep application developers' lives
> easy and that having IW itself support concurrency is beneficial.
> Like I said ... it was just a thought aimed at making our (Lucene
> developers') lives easier, but that probably comes second to app-devs'
> lives :). I'm also not at all sure that it would have made our lives
> easier ...
>
> So I'm good if you want to drop the discussion.
>
> Shai
>
> On Tue, Apr 20, 2010 at 8:16 PM, Michael Busch <[email protected]> wrote:
>>
>> On 4/19/10 10:25 PM, Shai Erera wrote:
>>>
>>> It will definitely simplify multi-threaded handling for IW extensions
>>> like Parallel Index …
>>>
>>
>> I'm keeping parallel indexing in mind.  After we have separate DWPTs I'd
>> like to introduce parallel DWPTs that write different slices.
>> Synchronization should not be a big worry then, because writing is
>> single-threaded.
>>
>> We could introduce a new abstract class SegmentWriter, which DWPT would
>> implement.  An extension would be ParallelSegmentWriter, which would manage
>> multiple SegmentWriters.   Or maybe SegmentSliceWriter would be a better
>> name.
>>
>>  Michael
>>
>
>
