Roman Rokytskyy wrote:

>>Yes, I forgot about that one. It's even more interesting than that! The
>>stream objects that Doug coded are not java.io streams. They are
>>wrappers on top of those. Each clone maintains its own seek offset.
>>Essentially, they share the same OS file handle but present an
>>abstraction of multiple independent streams into the same file.
>>
>
>Sorry, but isn't file handle sharing something specific to FSInputStream?
>Why do we force that on our abstract class level?
>

I'm sorry, I should have been more specific. The file handle is only in
the picture when FSInputStream is cloned. From what I can tell after a
quick look, InputStream is responsible for buffering and delegates to
subclasses (via a call to readInternal) to refill the buffer from the
underlying data store. When cloned, the InputStream clones the buffer
(in the hope that the next read will still hit the buffered data, I
suppose), but after that it has its own seek position and its own
buffer. In the case of FSInputStream, a Descriptor object is shared
between the clones. In the case of RAMInputStream, RAMFile is the
shared object.
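To illustrate, here is a stripped-down sketch of that scheme. This is a
paraphrase from memory, not the actual Lucene source, and the names are
simplified:

    import java.io.IOException;

    // Sketch only: the base class owns the buffer and the seek state,
    // and delegates refills to readInternal(). clone() copies the
    // buffer, so each clone reads independently while the subclasses
    // share the underlying store (Descriptor / RAMFile).
    abstract class BufferedInput implements Cloneable {
        static final int BUFFER_SIZE = 1024;

        private byte[] buffer;           // lazily allocated
        private long bufferStart = 0;    // file offset of buffer[0]
        private int bufferLength = 0;    // valid bytes in the buffer
        private int bufferPosition = 0;  // next byte to return

        final byte readByte() throws IOException {
            if (bufferPosition >= bufferLength)
                refill();
            return buffer[bufferPosition++];
        }

        private void refill() throws IOException {
            long start = bufferStart + bufferPosition;
            if (buffer == null)
                buffer = new byte[BUFFER_SIZE];
            bufferLength = readInternal(buffer, 0, BUFFER_SIZE, start);
            bufferStart = start;
            bufferPosition = 0;
        }

        // subclasses refill from the shared underlying data store
        protected abstract int readInternal(byte[] b, int offset,
                                            int length, long fileOffset)
            throws IOException;

        public Object clone() {
            try {
                BufferedInput c = (BufferedInput) super.clone();
                if (buffer != null)
                    c.buffer = (byte[]) buffer.clone(); // private copy
                return c;
            } catch (CloneNotSupportedException e) {
                throw new RuntimeException(e.toString());
            }
        }
    }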
>I would suggest a factory pattern, where input stream is created for a file,
>and how this is handled is up to the implementation. FSDirectory will share
>handles, RAMDirectory will have references to the same RAMFile object, my
>JDataStoreDirectory will rely on JDataStore to manage it effectively.
>

Perhaps a factory pattern would be more flexible, but the existing code
seems to do a pretty good job for the RAM and FS cases. Would the factory
pattern allow a better database implementation? (I've put a rough sketch
of what I understand you to mean in a P.S. below.)

>Should I try to rewrite it? (I also would appreciate your opinion if I
>should try to touch that code at all.)
>

I don't know; I have not heard many complaints about that code recently.
There is activity in terms of creating a crawler / content-handler
framework. There is also a need to handle "update" better, I think. For
example, I think it would be great to have deletes go through IndexWriter
and get "cached" in the new segment, to be applied to the prior segments
later, during optimization. This would make deletes and adds
transactional.

Another thing on my wish / todo list is to reduce the number of OS files
that must be open. Once you get a lot of indexes, each with a number of
stored fields, and keep re-indexing them, the number of open files grows
rather quickly. And if Lucene is part of another program that already has
its own file I/O needs, you quickly run into the OS limit on open files.
The idea I have for this one is to implement a different kind of segment -
one composed of a single file. Once a segment is created by IndexWriter,
it never changes (aside from deletes), so it could easily be stored as a
single file. (Rough sketches of both of these ideas are in the P.S. below
as well.)

These are just a few areas that are my favorites... But then again, if you
see another problem that's in your way, chances are that there are other
people out there with the same issue.

In any case, good luck!
Dmitry.
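P.S. Roman, to make sure I understand your factory suggestion, here is
roughly the shape I picture. This is purely hypothetical - the names
(Dir, Input, openInput, readAt) are made up for illustration and are not
the current Lucene API:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    // The Directory acts as the factory, so each implementation picks
    // its own sharing strategy instead of the abstract stream class
    // forcing a clone-based one on everybody.
    abstract class Dir {
        abstract Input openInput(String name) throws IOException;
    }

    abstract class Input {
        private long pointer = 0;  // per-stream seek offset

        void seek(long pos) { pointer = pos; }

        int read(byte[] b, int len) throws IOException {
            int n = readAt(pointer, b, len);
            if (n > 0) pointer += n;
            return n;
        }

        // positioned read against the shared underlying store
        abstract int readAt(long pos, byte[] b, int len) throws IOException;
    }

    class FsDir extends Dir {
        private final File root;
        FsDir(File root) { this.root = root; }

        Input openInput(String name) throws IOException {
            // one OS handle per open() here; a real implementation could
            // cache and share handles per file name, much as
            // FSInputStream's Descriptor does today
            final RandomAccessFile raf =
                new RandomAccessFile(new File(root, name), "r");
            return new Input() {
                int readAt(long pos, byte[] b, int len) throws IOException {
                    synchronized (raf) {  // shared handle, private offsets
                        raf.seek(pos);
                        return raf.read(b, 0, len);
                    }
                }
            };
        }
    }

A RAM flavor would return Inputs holding a reference to the same RAMFile,
and your JDataStoreDirectory could hand the whole problem to JDataStore.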
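P.S. #2 - the "transactional deletes" idea in sketch form, again purely
hypothetical (IndexWriter has no such methods today):

    import java.util.Vector;

    // Sketch: deletes are only buffered alongside the adds; nothing
    // touches the prior segments until optimize(), so a batch of adds
    // and deletes becomes visible together or not at all.
    class WriterSketch {
        private final Vector pendingAdds = new Vector();
        private final Vector pendingDeleteTerms = new Vector();

        void addDocument(Object doc)  { pendingAdds.add(doc); }
        void deleteByTerm(String t)   { pendingDeleteTerms.add(t); }

        void optimize() {
            // 1. flush pendingAdds into a new segment;
            // 2. while merging the prior segments, drop any document
            //    matching a term in pendingDeleteTerms;
            // 3. clear both lists in the same commit.
            pendingAdds.clear();
            pendingDeleteTerms.clear();
        }
    }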
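P.S. #3 - and the single-file segment. The layout below (a pointer to a
table of contents, the concatenated sub-files, then the TOC itself) is
something I made up on the spot to show the idea; a real format would
need more thought:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Pack the files of a finished segment into one OS file. Since a
    // segment never changes after IndexWriter creates it (deletes
    // aside), it can be written once and then read through a single
    // file handle.
    class SegmentPacker {
        static void pack(String out, String[] names, byte[][] data)
                throws IOException {
            RandomAccessFile f = new RandomAccessFile(out, "rw");
            try {
                f.writeLong(0);               // placeholder: TOC offset
                long[] offsets = new long[names.length];
                for (int i = 0; i < names.length; i++) {
                    offsets[i] = f.getFilePointer();
                    f.write(data[i]);         // concatenated sub-files
                }
                long toc = f.getFilePointer();
                f.writeInt(names.length);     // TOC: count, then entries
                for (int i = 0; i < names.length; i++) {
                    f.writeUTF(names[i]);
                    f.writeLong(offsets[i]);
                    f.writeLong(data[i].length);
                }
                f.seek(0);
                f.writeLong(toc);             // patch the pointer
            } finally {
                f.close();
            }
        }
    }

A reader would open the one file, follow the TOC, and serve each logical
file as a windowed stream over its (offset, length) - one handle instead
of a dozen per segment.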