" How do other people handle scaling their index writers? It sounds like,
in every case, writing is a single access thing."
We communicate updates to our IndexUpdaters via the database, and
periodically service the updates one record at a time.
I can't remember the details off the top of my head, but "Lucene in
Action" says concurrent writes to the index are a bad idea (I totally
recommend this book, even if you're using Lucene.Net).
" My goal was going to be to scale the consumer of the write queue out
across multiple machines so I could use additional processing power to update the
lucene index. Sounds like this is a terrible idea. :-)"
AFAIK, you'll have to serialize your writes.
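One way to picture serialized writes: many threads enqueue work, and a single consumer thread is the only one that ever touches the index. A rough sketch follows, not something we run in production. `SingleWriterQueue` and the `applyWrite` delegate are made-up names for illustration, and `BlockingCollection<T>` assumes .NET 4 or later. The delegate is where the real IndexWriter call would go.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: funnels writes from many threads through one consumer thread.
public sealed class SingleWriterQueue<T> : IDisposable
{
    private readonly BlockingCollection<T> _pending = new BlockingCollection<T>();
    private readonly Thread _consumer;

    // applyWrite is the only code that touches the index (e.g. an AddDocument call).
    public SingleWriterQueue(Action<T> applyWrite)
    {
        if (applyWrite == null) throw new ArgumentNullException("applyWrite");
        _consumer = new Thread(() =>
        {
            // Blocks until items arrive; the loop ends after CompleteAdding()
            // has been called and the queue has been drained.
            foreach (T item in _pending.GetConsumingEnumerable())
            {
                applyWrite(item);
            }
        });
        _consumer.IsBackground = true;
        _consumer.Start();
    }

    // Safe to call from any number of threads.
    public void Enqueue(T item)
    {
        _pending.Add(item);
    }

    // Stops accepting work, drains what is queued, then waits for the consumer to exit.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _consumer.Join();
        _pending.Dispose();
    }
}
```

Any thread can call `Enqueue`, while the single `applyWrite` callback owns the writer. Error handling is left out: an exception thrown by `applyWrite` is not caught in this sketch.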
" I guess scaling out to multiple indexes is an option... but then how do
you search across all those indexes at once?"
Creating multiple indexes sounds like a nightmare because MultiSearcher
interleaves the results, meaning you'll have to filter out duplicate
documents. This sounds slow and cumbersome. I'm curious what others say
about this.
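For what it's worth, the filtering step itself is not much code. A sketch, assuming each hit carries an application-level unique key (the `Hit` type with `Key` and `Score` is hypothetical, not a Lucene.Net type) and assuming scores from different indexes are comparable, which may not hold in practice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type; not part of Lucene.Net.
public sealed class Hit
{
    public string Key { get; private set; }
    public float Score { get; private set; }
    public Hit(string key, float score) { Key = key; Score = score; }
}

public static class HitMerger
{
    // Merges hits from several indexes, keeps the best-scoring hit per key,
    // and returns the survivors ordered by descending score.
    public static List<Hit> MergeDistinct(IEnumerable<IEnumerable<Hit>> perIndexHits)
    {
        var best = new Dictionary<string, Hit>();
        foreach (Hit hit in perIndexHits.SelectMany(hits => hits))
        {
            Hit existing;
            if (!best.TryGetValue(hit.Key, out existing) || hit.Score > existing.Score)
            {
                best[hit.Key] = hit;
            }
        }
        return best.Values.OrderByDescending(h => h.Score).ToList();
    }
}
```

Whether this is fast enough depends on how many hits you pull back from each index, so the concern about it being slow still stands for large result sets.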
May I ask how big your compacted index is? And how big are you designing
it for?
Vijay Santhanam
B.Eng.(Soft.)
Spectrum Wired - Software Engineer
-----Original Message-----
From: Patrick Burrows [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 26 June 2007 12:45 AM
To: [email protected]
Subject: Re: FileNotFound Exception
I don't even like the idea of keeping the IndexSearcher around. So far my
app is completely stateless. I guess I will need to do that, though.
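Keeping one searcher around and swapping it after writes could look roughly like this. A sketch only: `SharedHolder` is a made-up name, the `open` delegate stands in for whatever constructs the real IndexSearcher, and `Volatile.Read` assumes .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Hypothetical holder for a shared, read-only resource such as a searcher.
public sealed class SharedHolder<T> where T : class
{
    private T _current;
    private readonly Func<T> _open;

    public SharedHolder(Func<T> open)
    {
        if (open == null) throw new ArgumentNullException("open");
        _open = open;
        _current = open();
    }

    // Readers grab whatever instance is current; no lock is taken.
    public T Current
    {
        get { return Volatile.Read(ref _current); }
    }

    // Call after a batch of writes: opens a fresh instance and publishes it.
    // Returns the previous instance so the caller can close it once
    // in-flight searches that still use it have finished.
    public T Reopen()
    {
        T fresh = _open();
        return Interlocked.Exchange(ref _current, fresh);
    }
}
```

The holder deliberately does not close the old instance itself, since a search on another thread may still be using it.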
My IndexReaders don't ever delete, so that should be fine. I am already
queueing most write operations (except for the validations I mentioned
before). I could just add those to the queue as well. Though...my queue
reader is multi-threaded, so it can do multiple writes at a time. I guess
I need to strip out all that code. No point making it multi-threaded if
the core functionality has to be synchronized.
My goal was going to be to scale the consumer of the write queue out
across multiple machines so I could use additional processing power to
update the lucene index. Sounds like this is a terrible idea. :-)
How do other people handle scaling their index writers? It sounds like, in
every case, writing is a single access thing.
I guess scaling out to multiple indexes is an option... but then how do
you search across all those indexes at once?
On 6/25/07, Kurt Mackey <[EMAIL PROTECTED]> wrote:
>
> Heh, this is where Lucene gets annoying.
>
> Basically, if you're using an IndexReader (and by extension, an
> IndexSearcher) purely for read access to the Lucene index, you don't
> really have to worry about synchronization. For performance reasons,
> it's best to keep a single IndexSearcher around and use it for all your
> threads.
>
> You *do* have to synchronize write operations. So you may not have more
> than one IndexWriter or IndexReader you've used for deletes open at a
> time. Once you've done your writes, you reopen the IndexSearcher.
>
> I'm actually working on extracting a library to handle all this from an
> existing app. I'll put it up for consumption once it's reasonably
> functional, and hopefully it will help take some of the pain out of
> using Lucene on a web application.
>
> -Kurt
>
>
> -----Original Message-----
> From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 25, 2007 9:11 AM
> To: [email protected]
> Subject: Re: FileNotFound Exception
>
> Yeah. It is already very much abstracted.
>
> So, it sounds like everyone is agreeing about the multiple access
> issues. Hm. I hadn't anticipated that at all.
>
> On thinking about it, I'm not too concerned about writing. I could sync
> that all up.
>
> I'm very concerned about actually searching, though. Only one search at
> a time? I can't see that ever working.
>
>
>
> On 6/25/07, Vijay Santhanam <[EMAIL PROTECTED]> wrote:
> >
> > Hi Patrick,
> >
> > If you intend to be dynamically updating the index (i.e. while the
> > searcher is still alive) you can't avoid synchronization because the
> > IndexSearcher is unaware of new docs until it is
> > refreshed/reinstantiated.
> >
> > Defining an interface is very important for compartmentalizing the
> > search engine. The IndexSearcher, Modifier, Writer and Readers should
> > be hidden from the clients of your search engine (like your ASP.NET
> > website) to protect those clients from synchronization headaches.
> >
> > Also, I found that creating wrapper objects and extending Lucene
> > classes allowed me to further exclude Lucene.Net.* classes from my
> > search engine interface.
> >
> > Decoupling Lucene.Net from the classes that wrap and consume it
> > (YourIndexBuilder, YourIndexUpdater, etc.) is a good start for scaling
> > your search engine too.
> >
> > During the last Lucene.Net project I was involved with, we put
> > Lucene.Net inside a Windows service, and exposed it with a static
> > singleton remoting interface.
> >
> > Vijay Santhanam
> > B.Eng.(Soft.)
> > Spectrum Wired - Software Engineer
> >
> > T: +61 2 4925 3266
> > F: +61 2 4925 3255
> > M: +61 407 525 087
> > W: www.spectrumwired.com
> >
> > -----Original Message-----
> > From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> > Sent: Monday, 25 June 2007 11:27 PM
> > To: [email protected]
> > Subject: Re: FileNotFound Exception
> >
> > Ooh... is that right?
> >
> > Cause I access it via a website without any sort of sync locking. The
> > site isn't live. But, by the very nature of a website, it is
> > multithreaded.
> >
> > I also have separate processes which are constantly updating the
> > index.
> >
> > And yet another process that validates the index once a week (makes
> > sure there are no dupes or missed records).
> >
> > Access to the index through all these things must be synchronized?
> > That seems... cumbersome. At best.
> >
> >
> > On 6/25/07, Torsten Rendelmann <[EMAIL PROTECTED]> wrote:
> > >
> > > We also got this kind of error. The reason was that we accessed
> > > the index from multiple threads. I think the same happens if you
> > > access the index from two processes, which is what it looks like
> > > from the call stack (just a guess).
> > >
> > >
> > > TorstenR
> > >
> > > > -----Original Message-----
> > > > From: Patrick Burrows [mailto:[EMAIL PROTECTED]
> > > > Sent: Sunday, June 24, 2007 7:21 PM
> > > > To: [email protected]
> > > > Subject: Re: FileNotFound Exception
> > > >
> > > > I deleted and recreated my index and things seem to be indexing
> > > > now just fine. I went ahead and deleted it because everything
> > > > Google said was "wow, that seems bad" whenever someone else got
> > > > this error.
> > > >
> > > > On 6/24/07, Patrick Burrows <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > If I call .Optimize() I get the same error...
> > > > >
> > > > >
> > > > >
> > > > > On 6/24/07, Patrick Burrows <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > I am in a tight loop adding items into my index. After
> > > > > > > running for a couple minutes in the loop just fine, I get
> > > > > > > the error posted below. If I then step through (don't stop
> > > > > > > the debugger, just hit F10 to keep stepping), it adds just
> > > > > > > fine. If I let it run, it will get the error again
> > > > > > > immediately. If I keep stepping through, though, I get no
> > > > > > > error. Only when it is running continuously.
> > > > > > >
> > > > > > > I added a sleep statement in my attempt to "program by
> > > > > > > coincidence" but it had no effect. Here is the code I am
> > > > > > > executing. The error is below that. The error occurs on the
> > > > > > > iw.AddDocument line:
> > > > > >
> > > > > >
> > > > > > > public static void AddPostsToIndex(List<Post> posts)
> > > > > > > {
> > > > > > >     IndexWriter iw = GetIndexWriter();
> > > > > > >     foreach (Post post in posts)
> > > > > > >     {
> > > > > > >         DateTime loopItemStart = DateTime.Now;
> > > > > > >         // The FileNotFoundException is thrown on this line.
> > > > > > >         iw.AddDocument(post.ToDocument());
> > > > > > >         System.Threading.Thread.Sleep(10);
> > > > > > >         log.DebugFormat("Added post for feedItem {0} in {1}",
> > > > > > >             post.FeedItemId, DateTime.Now.Subtract(loopItemStart));
> > > > > > >     }
> > > > > > >     iw.Close();
> > > > > > > }
> > > > > >
> > > > > > > System.IO.FileNotFoundException was unhandled
> > > > > > >   Message="Could not find file 'C:\\FeedReader\\FullTextSearch\\_oy.fnm'."
> > > > > > >   Source="mscorlib"
> > > > > > >   FileName="C:\\FeedReader\\FullTextSearch\\_oy.fnm"
> > > > > > >   StackTrace:
> > > > > > >     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
> > > > > > >     at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
> > > > > > >     at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
> > > > > > >     at Lucene.Net.Store.FSIndexInput.Descriptor..ctor(FSIndexInput enclosingInstance, FileInfo file, FileAccess mode)
> > > > > > >     at Lucene.Net.Store.FSIndexInput..ctor(FileInfo path)
> > > > > > >     at Lucene.Net.Store.FSDirectory.OpenInput(String name)
> > > > > > >     at Lucene.Net.Index.FieldInfos..ctor(Directory d, String name)
> > > > > > >     at Lucene.Net.Index.SegmentReader.Initialize(SegmentInfo si)
> > > > > > >     at Lucene.Net.Index.SegmentReader.Get(Directory dir, SegmentInfo si, SegmentInfos sis, Boolean closeDir, Boolean ownDir)
> > > > > > >     at Lucene.Net.Index.SegmentReader.Get(SegmentInfo si)
> > > > > > >     at Lucene.Net.Index.IndexWriter.MergeSegments(Int32 minSegment, Int32 end)
> > > > > > >     at Lucene.Net.Index.IndexWriter.MergeSegments(Int32 minSegment)
> > > > > > >     at Lucene.Net.Index.IndexWriter.MaybeMergeSegments()
> > > > > > >     at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
> > > > > > >     at Lucene.Net.Index.IndexWriter.AddDocument(Document doc)
> > > > > > >     at FullTextSearch.Tasks.IndexManager.AddPostsToIndex(List`1 posts)
> > > > > > >     at FullTextSearch.Tasks.IndexManager.ValidateIndex()
> > > > > > >     at Indox.Program.RefreshDocsInIndex() in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 61
> > > > > > >     at Indox.Program.HandleArguments(String[] args) in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 40
> > > > > > >     at Indox.Program.Main(String[] args) in C:\Dev\WebSites\FeedReader\FullTextSearch\System\Indox\Program.cs:line 23
> > > > > > >     at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args)
> > > > > > >     at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
> > > > > > >     at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
> > > > > > >     at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
> > > > > > >     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
> > > > > > >     at System.Threading.ThreadHelper.ThreadStart()
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -
> > > > > > P
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -
> > > > > P
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -
> > > > P
> > > >
> > >
> > >
> >
> >
> > --
> > -
> > P
> >
> >
>
>
> --
> -
> P
>
--
-
P