"How do other people handle scaling their index writers? It sounds like, in every case, writing is a single access thing."
We communicate updates to our IndexUpdaters via the database and periodically service them one record at a time. I can't remember the details off the top of my head, but I picked up from "Lucene in Action" that concurrent writes to the index are a bad idea (I totally recommend this book for you, even though you're using Lucene.Net).

"My goal was going to be to scale the consumer of the write queue out across multiple machines so I could use additional processing power to update the Lucene index. Sounds like this is a terrible idea. :-)"

AFAIK, you'll have to serialize your writes.

"I guess scaling out to multiple indexes is an option... but then how do you search across all those indexes at once?"

Creating multiple indexes sounds like a nightmare, because MultiSearcher interleaves the results, meaning you'll have to filter out duplicate documents. That sounds slow and cumbersome. I'm curious what others say about this.

May I ask how big your compacted index is? And how big are you designing it for?

Vijay Santhanam
B.Eng.(Soft.)
Spectrum Wired - Software Engineer

-----Original Message-----
From: Patrick Burrows [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 26 June 2007 12:45 AM
To: [email protected]
Subject: Re: FileNotFound Exception

I don't even like the idea of keeping the IndexSearcher around. So far my app is completely stateless. I guess I will need to do that, though.

My IndexReaders don't ever delete, so that should be fine.

I am already queueing most write operations (except for the validations I mentioned before). I could just add those to the queue as well. Though... my queue reader is multi-threaded, so it can do multiple writes at a time. I guess I need to strip out all that code. No point making it multi-threaded if the core functionality has to be synchronized.

My goal was going to be to scale the consumer of the write queue out across multiple machines so I could use additional processing power to update the Lucene index. Sounds like this is a terrible idea.
:-)

How do other people handle scaling their index writers? It sounds like, in every case, writing is a single access thing.

I guess scaling out to multiple indexes is an option... but then how do you search across all those indexes at once?

On 6/25/07, Kurt Mackey <[EMAIL PROTECTED]> wrote:
>
> Heh, this is where Lucene gets annoying.
>
> Basically, if you're using an IndexReader (and, by extension, an
> IndexSearcher) purely for read access to the Lucene index, you don't
> really have to worry about synchronization. For performance reasons,
> it's best to keep a single IndexSearcher around and use it for all your
> threads.
>
> You *do* have to synchronize write operations. So you may not have more
> than one IndexWriter, or IndexReader you've used for deletes, open at a
> time. Once you've done your writes, you reopen the IndexSearcher.
>
> I'm actually working on extracting a library to handle all this from an
> existing app. I'll put it up for consumption once it's reasonably
> functional, and hopefully it will help take some of the pain out of
> using Lucene in a web application.
>
> -Kurt
>
>
> -----Original Message-----
> From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 25, 2007 9:11 AM
> To: [email protected]
> Subject: Re: FileNotFound Exception
>
> Yeah. It is already very much abstracted.
>
> So, it sounds like everyone is agreeing about the multiple access
> issues. Hm. I hadn't anticipated that at all.
>
> On thinking about it, I'm not too concerned about writing. I could sync
> that all up.
>
> I'm very concerned about actually searching, though. Only one search at
> a time? I can't see that ever working.
>
>
>
> On 6/25/07, Vijay Santhanam <[EMAIL PROTECTED]> wrote:
> >
> > Hi Patrick,
> >
> > If you intend to be dynamically updating the index (i.e. while the
> > searcher is still alive), you can't avoid synchronization, because the
> > IndexSearcher is unaware of new docs until it is refreshed or
> > reinstantiated.
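Kurt's advice (one IndexSearcher shared by all reader threads, at most one writer at a time, and a reopen once the writes are done), combined with Vijay's point that a searcher only sees new documents after it is reopened, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Kurt's library: `SearchCoordinator`, `Counted` and `openSearcher` are hypothetical names, `TSearcher` stands in for a Lucene.Net IndexSearcher (assumed to be wrapped so it implements IDisposable if your version only exposes Close()), and the `write` delegate is wherever your IndexWriter work happens. The reference count is an added design choice: it stops a writer from closing a searcher that another thread is still querying.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: readers share one searcher, writes are serialized,
// and a fresh searcher is opened after each write so new docs become visible.
// TSearcher stands in for a Lucene.Net IndexSearcher (assumed IDisposable via a wrapper).
public sealed class SearchCoordinator<TSearcher> where TSearcher : class, IDisposable
{
    // Pairs a searcher with the number of parties currently using it.
    private sealed class Counted
    {
        public readonly TSearcher Searcher;
        private int refs = 1; // the coordinator's own reference

        public Counted(TSearcher searcher) { Searcher = searcher; }

        // Fails only if the count already hit zero (searcher closed).
        public bool TryAcquire()
        {
            while (true)
            {
                int seen = Volatile.Read(ref refs);
                if (seen == 0) return false;
                if (Interlocked.CompareExchange(ref refs, seen + 1, seen) == seen) return true;
            }
        }

        // The last user out closes the searcher.
        public void Release()
        {
            if (Interlocked.Decrement(ref refs) == 0) Searcher.Dispose();
        }
    }

    private readonly Func<TSearcher> openSearcher; // assumption: opens a new searcher on the index
    private readonly object writeLock = new object();
    private volatile Counted current;

    public SearchCoordinator(Func<TSearcher> openSearcher)
    {
        this.openSearcher = openSearcher;
        current = new Counted(openSearcher());
    }

    // Any number of threads may search at once; no lock is taken.
    public TResult Search<TResult>(Func<TSearcher, TResult> query)
    {
        while (true)
        {
            Counted snapshot = current;
            if (!snapshot.TryAcquire()) continue; // swapped out and closed meanwhile; retry
            try { return query(snapshot.Searcher); }
            finally { snapshot.Release(); }
        }
    }

    // Only one write at a time; afterwards readers switch to a fresh searcher.
    public void Write(Action write)
    {
        lock (writeLock)
        {
            write();
            Counted old = current;
            current = new Counted(openSearcher());
            old.Release(); // closes now, or when the last in-flight search finishes
        }
    }
}
```

Searches never block each other, which addresses the "only one search at a time?" worry; only writers queue up behind the lock. Later Java Lucene releases ship a SearcherManager class built on the same acquire/release idea.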
> >
> > Defining an interface is very important for compartmentalizing the
> > search engine. The clients of your search engine (like your ASP.NET
> > website) should be hidden from the IndexSearcher, Modifier, Writer and
> > Readers to protect them from synchronization headaches.
> >
> > Also, I found that creating wrapper objects and extending Lucene classes
> > allowed me to further exclude Lucene.Net.* classes from my search engine
> > interface.
> >
> > Decoupling Lucene.Net from the classes that wrap and consume it
> > (YourIndexBuilder, YourIndexUpdater, etc.) is a good start for scaling
> > your search engine too.
> >
> > On the last Lucene.Net project I was involved with, we put Lucene.Net
> > inside a Windows service and exposed it through a static singleton
> > remoting interface.
> >
> > Vijay Santhanam
> > B.Eng.(Soft.)
> > Spectrum Wired - Software Engineer
> >
> > T: +61 2 4925 3266
> > F: +61 2 4925 3255
> > M: +61 407 525 087
> > W: www.spectrumwired.com
> >
> >
> > -----Original Message-----
> > From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> > Sent: Monday, 25 June 2007 11:27 PM
> > To: [email protected]
> > Subject: Re: FileNotFound Exception
> >
> > Ooh... is that right?
> >
> > 'Cause I access it via a website without any sort of sync locking. The
> > site isn't live. But by the very nature of a website, it is
> > multithreaded.
> >
> > I also have separate processes which are constantly updating the index.
> >
> > And yet another process that validates the index once a week (makes
> > sure there are no dupes or missed records).
> >
> > Access to the index through all these things must be synchronized? That
> > seems... cumbersome. At best.
> >
> >
> > On 6/25/07, Torsten Rendelmann <[EMAIL PROTECTED]> wrote:
> > >
> > > We got this kind of error too. The reason was that we accessed the
> > > index from multiple threads. Judging from the call stack, I'd guess
> > > the same thing happens if two processes access the index.
> > >
> > >
> > > TorstenR
> > >
> > > > -----Original Message-----
> > > > From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> > > > Sent: Sunday, June 24, 2007 7:21 PM
> > > > To: [email protected]
> > > > Subject: Re: FileNotFound Exception
> > > >
> > > > I deleted and recreated my index and things seem to be indexing
> > > > just fine now. I went ahead and deleted it because everything Google
> > > > said was "wow, that seems bad" whenever someone else got this error.
> > > >
> > > > On 6/24/07, Patrick Burrows <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > If I call .Optimize() I get the same error...
> > > > >
> > > > >
> > > > >
> > > > > On 6/24/07, Patrick Burrows <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > I am in a tight loop adding items into my index. After running
> > > > > > for a couple of minutes in the loop just fine, I get the error
> > > > > > posted below. If I then step through (don't stop the debugger,
> > > > > > just hit F10 to keep stepping), it adds just fine. If I let it
> > > > > > run, it will get the error again immediately. If I keep stepping
> > > > > > through, though, I get no error. Only when it is running
> > > > > > continuously.
> > > > > >
> > > > > > I added a sleep statement in my attempt to "program by
> > > > > > coincidence" but it had no effect. Here is the code I am
> > > > > > executing. The error is below that.
> > > > > > The error occurs on the iw.AddDocument line:
> > > > > >
> > > > > >
> > > > > > public static void AddPostsToIndex(List<Post> posts)
> > > > > > {
> > > > > >     IndexWriter iw = GetIndexWriter();
> > > > > >
> > > > > >     foreach (Post post in posts)
> > > > > >     {
> > > > > >         DateTime loopItemStart = DateTime.Now;
> > > > > >         iw.AddDocument(post.ToDocument());
> > > > > >         System.Threading.Thread.Sleep(10);
> > > > > >         log.DebugFormat("Added post for feedItem {0} in {1}", post.FeedItemId,
> > > > > >             DateTime.Now.Subtract(loopItemStart));
> > > > > >     }
> > > > > >
> > > > > >     iw.Close();
> > > > > > }
> > > > > >
> > > > > > System.IO.FileNotFoundException was unhandled
> > > > > >   Message="Could not find file 'C:\\FeedReader\\FullTextSearch\\_oy.fnm'."
> > > > > >   Source="mscorlib"
> > > > > >   FileName="C:\\FeedReader\\FullTextSearch\\_oy.fnm"
> > > > > >   StackTrace:
> > > > > >     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
> > > > > >     at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
> > > > > >     at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
> > > > > >     at Lucene.Net.Store.FSIndexInput.Descriptor..ctor(FSIndexInput enclosingInstance, FileInfo file, FileAccess mode)
> > > > > >     at Lucene.Net.Store.FSIndexInput..ctor(FileInfo path)
> > > > > >     at Lucene.Net.Store.FSDirectory.OpenInput(String name)
> > > > > >     at Lucene.Net.Index.FieldInfos..ctor(Directory d, String name)
> > > > > >     at Lucene.Net.Index.SegmentReader.Initialize(SegmentInfo si)
> > > > > >     at Lucene.Net.Index.SegmentReader.Get(Directory dir, SegmentInfo si, SegmentInfos sis, Boolean closeDir, Boolean ownDir)
> > > > > >     at Lucene.Net.Index.SegmentReader.Get(SegmentInfo si)
> > > > > >     at Lucene.Net.Index.IndexWriter.MergeSegments(Int32 minSegment, Int32 end)
> > > > > >     at Lucene.Net.Index.IndexWriter.MergeSegments(Int32 minSegment)
> > > > > >     at Lucene.Net.Index.IndexWriter.MaybeMergeSegments()
> > > > > >     at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
> > > > > >     at Lucene.Net.Index.IndexWriter.AddDocument(Document doc)
> > > > > >     at FullTextSearch.Tasks.IndexManager.AddPostsToIndex(List`1 posts)
> > > > > >     at FullTextSearch.Tasks.IndexManager.ValidateIndex()
> > > > > >     at Indox.Program.RefreshDocsInIndex() in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 61
> > > > > >     at Indox.Program.HandleArguments(String[] args) in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 40
> > > > > >     at Indox.Program.Main(String[] args) in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 23
> > > > > >     at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args)
> > > > > >     at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
> > > > > >     at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
> > > > > >     at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
> > > > > >     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
> > > > > >     at System.Threading.ThreadHelper.ThreadStart()
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -
> > > > > > P
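The conclusion the thread reaches (serialize the writes, and drop the multi-threaded queue reader because the core work has to be synchronized anyway) can be sketched as a queue with many producers and exactly one consumer thread. This is a hypothetical illustration, not code from any of the posters: `SerialWriteQueue` and `Errors` are invented names, and each queued `Action` stands for whatever IndexWriter work you would do (for example an AddDocument call on the single shared writer). It assumes .NET 4's `BlockingCollection<T>`, which postdates this 2007 thread; a lock with Monitor.Wait/Pulse would serve on older runtimes. It also only serializes writes inside one process, so separate processes, such as the weekly validator mentioned above, would still need to send their writes through this one consumer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: producers on any thread enqueue index writes;
// exactly one background thread executes them, so writes never overlap.
public sealed class SerialWriteQueue : IDisposable
{
    private readonly BlockingCollection<Action> pending = new BlockingCollection<Action>();
    private readonly Thread consumer;

    // Failures are recorded instead of killing the consumer thread.
    public ConcurrentQueue<Exception> Errors { get; } = new ConcurrentQueue<Exception>();

    public SerialWriteQueue()
    {
        consumer = new Thread(Drain) { IsBackground = true, Name = "index-writer" };
        consumer.Start();
    }

    // Safe to call from any thread; the work runs later on the single consumer thread.
    public void Enqueue(Action indexWrite)
    {
        pending.Add(indexWrite);
    }

    private void Drain()
    {
        // Blocks while the queue is empty; ends once CompleteAdding() was called and the queue is drained.
        foreach (Action write in pending.GetConsumingEnumerable())
        {
            try { write(); } // e.g. AddDocument on the one shared IndexWriter (assumption)
            catch (Exception ex) { Errors.Enqueue(ex); }
        }
    }

    // Stop accepting work, wait for the queued writes to finish, then clean up.
    public void Dispose()
    {
        pending.CompleteAdding();
        consumer.Join();
        pending.Dispose();
    }
}
```

Producers (web requests, the validator, a multi-threaded queue reader) can stay concurrent; only the step that touches the index is funneled through one thread.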
