IndexWriter can't add the 10,000th document to the index
I finally reran the program and it stopped at exactly the same place. This time an exception came out: the writer can't add the 10,000th document to the index.

Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491891.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491893.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491896.xml
java.lang.Exception: cannot add document to index

And this is the code:

public static void addDocToIndex(Document doc) throws Exception {
    try {
        writer.addDocument(doc);
        counter++;
    } catch (Exception e) {
        throw new Exception("cannot add document to index");
    }
}

I already put -Xmx512m in the Java VM arguments, since previously it threw: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Maureen

Chris Hostetter [EMAIL PROTECTED] wrote:

did you try triggering a thread dump to see what it was doing at that point? depending on your merge factors and other IndexWriter settings it could just be doing a really big merge.

: Date: Sat, 27 Jan 2007 09:40:47 -0800 (PST)
: From: maureen tanuwidjaja
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: My program stops indexing after 10000th documents is indexed
:
: Hi all,
:
: Is there any limitation on the number of files that Lucene can handle?
: I indexed a total of 30,000 XML documents; however, it stops at the 10,000th document.
: No warning, no error, no exception either.
:
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491876.xml
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491891.xml
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491893.xml
: Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491896.xml -- 10,000th doc
: -- it idles here --
:
: At first I thought the 10,000th document was so big that it took quite a long time to put into the index. Then I found out that the 10,000th document is only 6 KB. The indexing process stalled for about an hour, so I decided to terminate it.
:
: Does it have anything to do with something like setUseCompoundFile etc.? I don't call any of those in my program...
:
: Any suggestions please?
:
: Thanks and best regards,
:
: Maureen

-Hoss

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
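Rethrowing a bare new Exception like the code above discards the original stack trace, which is why the real failure stayed hidden. A hedged sketch of wrapping the cause instead, with a simulated IOException standing in for the real writer.addDocument call:

```java
public class WrapDemo {
    // Simulated version of Maureen's addDocToIndex: the inner failure is a
    // stand-in for whatever writer.addDocument(doc) actually threw.
    static void addDocToIndex(Object doc) throws Exception {
        try {
            // simulate the underlying failure
            throw new java.io.IOException("There is not enough space on the disk");
        } catch (Exception e) {
            // pass e as the cause so the original stack trace survives
            throw new Exception("cannot add document to index", e);
        }
    }

    public static void main(String[] args) {
        try {
            addDocToIndex(null);
        } catch (Exception e) {
            e.printStackTrace();  // now shows "Caused by: java.io.IOException: ..."
        }
    }
}
```

With the cause attached, printStackTrace shows the underlying IOException rather than only the wrapper message.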
search on colon : ending words
Is there a simple way to turn off field-search syntax in the Lucene parser, and have Lucene recognize words ending in a colon (:) as search terms instead? Such words are very common in our documents (or any plain text), but Lucene does not seem to find them. :-( Thank you, Felix
Re: search on colon : ending words
I've got to ask why you'd want to search on colons. Why not just index the words without colons and search without them too? Let's say you index the word "work:". Do you really want a search on "work" to fail? By and large, you're better off indexing and searching without punctuation.

Best
Erick

On 1/28/07, Felix Litman [EMAIL PROTECTED] wrote:

Is there a simple way to turn off field-search syntax in the Lucene parser, and have Lucene recognize words ending in a colon (:) as search terms instead? Such words are very common in our documents (or any plain text), but Lucene does not seem to find them. :-(

Thank you, Felix
Re: My program stops indexing after 10000th documents is indexed
Maureen:

I lost the e-mail where you re-throw the exception. But you'd get a *lot* more information if you'd print the stack trace via:

catch (Exception e) {
    e.printStackTrace();
    throw e;
}

And that would allow the folks who understand Lucene to give you a LOT more help <G>...

Best
Erick

On 1/27/07, Chris Hostetter [EMAIL PROTECTED] wrote:

did you try triggering a thread dump to see what it was doing at that point? depending on your merge factors and other IndexWriter settings it could just be doing a really big merge.

: Subject: My program stops indexing after 10000th documents is indexed
: [quoted text trimmed]

-Hoss
Sorry, it is the 190,000th document
Hi...

I'm sorry, I just found out that it is NOT the 10,000th document that raises the exception when IndexWriter.addDocument(Document) is called, but the 180,000 + 10,000th, so the 190,000th document. Now I am running the program again with code added to print the stack trace if an exception happens (thanks for the advice, Erick).

OK. Basically, what I am going to index is a set of XML documents in 22 folders, where each folder contains 30,000 XML documents. Hence the total is 660,000 XML documents. I was reading the Lucene book and spotted the mergeFactor setting. I would like to know whether the mergeFactor plays an important part in indexing these files, and perhaps has a strong correlation with the exception? I run my program with the default value of mergeFactor, which is 10.

In case it's needed, the PC used has the following spec: Intel Pentium 4, 2.40 GHz CPU, 512 MB of RAM.

Is there any suggestion about the mergeFactor and maxMergeDocs values that I should use for my case?

Thanks and Regards,
Maureen

Erick Erickson [EMAIL PROTECTED] wrote:

Maureen: I lost the e-mail where you re-throw the exception. But you'd get a *lot* more information if you'd print the stack trace... [quoted text trimmed]
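Whether mergeFactor matters here can be seen from the merge pattern itself. Below is a toy simulation of Lucene's logarithmic merge scheme (an illustrative sketch, not Lucene's actual code; maxBuffered = 1000 is an assumed example value, not the default): every maxBuffered documents a small segment is flushed, and whenever mergeFactor segments of the same size level pile up, they are merged into one segment of the next level. The cascading merges at round-number boundaries rewrite a large slice of the index in one go, and during such a merge the old and new segments coexist on disk, so peak disk usage is well above the final index size.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSim {
    // Return the size levels of the on-disk segments after indexing `docs`
    // documents: level 0 = one flush of maxBuffered docs, level n+1 = the
    // result of merging mergeFactor level-n segments.
    static List<Integer> segmentLevels(int docs, int maxBuffered, int mergeFactor) {
        List<Integer> levels = new ArrayList<>();
        for (int flushed = 0; flushed + maxBuffered <= docs; flushed += maxBuffered) {
            levels.add(0);  // flush a new small segment
            boolean merged = true;
            while (merged) {            // cascade merges of equal-level segments
                merged = false;
                for (int lvl = 0; lvl <= 10; lvl++) {
                    int count = 0;
                    for (int l : levels) if (l == lvl) count++;
                    if (count >= mergeFactor) {
                        final int target = lvl;
                        levels.removeIf(l -> l == target);
                        levels.add(lvl + 1);
                        merged = true;
                    }
                }
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        // 190 flushes of 1000 docs: one level-2 segment + nine level-1 segments
        System.out.println(MergeSim.segmentLevels(190_000, 1000, 10));
    }
}
```

The segment count ends up being the sum of the base-mergeFactor digits of the flush count, so a larger mergeFactor means fewer, bigger merges (and bigger temporary disk spikes), while a smaller one merges more often in smaller steps.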
Re: search on colon : ending words
Yes, thank you. That would be a good solution. But we are using Lucene's StandardAnalyzer. It seems to index words with colons (:) and other punctuation by default. Is there a simple way to have the analyzer not index colons specifically, and punctuation in general?

Erick Erickson [EMAIL PROTECTED] wrote:

I've got to ask why you'd want to search on colons. Why not just index the words without colons and search without them too? [quoted text trimmed]
Re: IndexWriter.docCount
On 28 Jan 2007, at 05:54, Doron Cohen wrote:

karl wettin [EMAIL PROTECTED] wrote on 27/01/2007 13:49:24:

In essence, should I return

index.getDocumentsByNumber().size() - index.getDeletedDocuments().size() + unflushedDocuments.size();

or

index.getDocumentsByNumber().size() + unflushedDocuments.size();

?

I guess it is the 2nd one - without subtracting the number of deleted docs.

That is enough for me to settle. Thanks again. (I linked to this thread from a comment.)

(But I don't know what getDocumentsByNumber() is - there is nothing like this in the trunk, nor in the current patch for LUCENE-550.)

If you still really want to find it, perhaps you were looking at the IndexWriter in the core rather than the InstantiatedIndexWriter of contrib/instantiated?

--
karl
Re: search on colon : ending words
StandardAnalyzer should not be indexing punctuation, in my experience. Instead, something like old:fart would be indexed as "old" and "fart". QueryParser will then generate a query of "old" within 1 of "fart" for the query old:fart. This is the case for all punctuation I have run into. Things like f.b.i are handled differently though: it's indexed as fbi, i.e. the dots are removed; that's part of the acronym handling. There are a couple of other special handlers as well, but in general punctuation is ignored, except that QueryParser will look for the words broken by the punctuation next to each other.

-Mark

Felix Litman wrote:

Yes, thank you. That would be a good solution. But we are using Lucene's StandardAnalyzer. It seems to index words with colons (:) and other punctuation by default. Is there a simple way to have the analyzer not index colons specifically, and punctuation in general? [quoted text trimmed]
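As a toy illustration of the behavior Mark describes (this is NOT StandardAnalyzer's actual grammar, just a crude approximation for intuition): punctuation splits a token into separate terms, while a dotted acronym collapses into one term.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyTokenizer {
    // Crude approximation of the behavior described above: a dotted acronym
    // like f.b.i becomes fbi; any other punctuation splits the token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase().split("\\s+")) {
            if (raw.matches("(\\w\\.)+\\w\\.?")) {
                tokens.add(raw.replace(".", ""));   // acronym handling: f.b.i -> fbi
            } else {
                for (String t : raw.split("\\W+")) { // punctuation splits: old:fart -> old, fart
                    if (!t.isEmpty()) tokens.add(t);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(ToyTokenizer.tokenize("old:fart visits the f.b.i"));
        // → [old, fart, visits, the, fbi]
    }
}
```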
Re: Multiword Highlighting
For what it's worth Mark (Miller), there *is* a need for "just highlight the query terms without trying to get excerpts" functionality - something a la Google cache (different colours... mmm, nice).

FWIW, the existing highlighter doesn't *have* to fragment - just pass a NullFragmenter to the highlighter. Ideally we'd have one implementation that tackles phrase support and preserves (optional) support for selecting fragments. I can see that to achieve this the existing highlighter design would need to change. Currently the highlighter identifies fragments first (typically using an implementation which arbitrarily chops text after 'n' words) and then selects which of these fragments have the highest density of high-scoring query terms. This logic would need to change to:

1) Use QuerySpansExtractor to identify all the *spans* in the document.
2) Use a sliding window to select fragments, taking care to select fragments that wholly contain spans, rather than selecting only part of a span.
3) Mark up the hits.

Clearly, for people uninterested in selecting fragments, step 2 can be skipped.

Cheers
Mark
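Step 3 above (marking up the hits) can be sketched independently of Lucene: given character spans for the matches, merge any overlaps and wrap each merged span in a tag. A minimal illustration (the span extraction itself is assumed to have happened already):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanMarkup {
    // Merge overlapping [start, end) character spans, then wrap each merged
    // span in <b>...</b>. A toy sketch of the "mark up the hits" step.
    static String markup(String text, int[][] spans) {
        int[][] s = spans.clone();
        Arrays.sort(s, (a, b) -> a[0] - b[0]);
        List<int[]> merged = new ArrayList<>();
        for (int[] sp : s) {
            if (!merged.isEmpty() && sp[0] <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);   // overlaps: extend
                last[1] = Math.max(last[1], sp[1]);
            } else {
                merged.add(new int[]{sp[0], sp[1]});
            }
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] sp : merged) {
            out.append(text, pos, sp[0]).append("<b>")
               .append(text, sp[0], sp[1]).append("</b>");
            pos = sp[1];
        }
        return out.append(text.substring(pos)).toString();
    }

    public static void main(String[] args) {
        System.out.println(SpanMarkup.markup("quick brown fox",
                new int[][]{{0, 5}, {6, 11}}));
        // → <b>quick</b> <b>brown</b> fox
    }
}
```

Merging overlaps first is what keeps a phrase span from being tagged twice when its terms also match individually.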
Re: Announcement: Lucene powering Monster job search index (Beta)
Correction: we only do the Euclidean computation during sorting. For filtering, a simple bounding box is computed to approximate the radius, and 2 range comparisons are made to exclude documents. Because these comparisons are done outside of Lucene as integer comparisons, it is pretty fast. With 13,000 results, the search time with distance sort is about 200 msec (compared to 30 ms for a simple non-radius, date-sorted keyword search).

Peter

On 1/27/07, no spam [EMAIL PROTECTED] wrote:

Isn't this extremely inefficient, doing the Euclidean distance twice? Perhaps not a huge deal with a small search result set. I at times have 13,000 results that match my search terms in an index with 1.2 million docs. Can't you do some simple radian math first to ensure it's way out of bounds, then do the Euclidean distance for the subset within bounds? I'm currently only doing the distance calc once (post hit collector). I don't have any performance numbers for the double vs. single distance calc. I'm still working out the sort by radius myself.

Mark

On 11/3/06, Peter Keegan [EMAIL PROTECTED] wrote:

Daniel, yes, this is correct if you happen to be doing a radius search and sorting by mileage.

Peter
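The two-stage scheme Peter describes can be sketched as follows (a minimal illustration on a flat x/y plane with made-up units; real geo code would work in lat/long and account for projection): a cheap bounding-box test excludes most documents with two comparisons per axis, and the exact distance math runs only for the survivors.

```java
public class RadiusFilter {
    // Stage 1: cheap rectangle test - two range comparisons per axis.
    static boolean inBox(double x, double y, double cx, double cy, double r) {
        return Math.abs(x - cx) <= r && Math.abs(y - cy) <= r;
    }

    // Stage 2: exact circle test, only for points that passed the box.
    // Comparing squared distances avoids sqrt in the hot path.
    static boolean inRadius(double x, double y, double cx, double cy, double r) {
        double dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }

    public static void main(String[] args) {
        double cx = 0, cy = 0, r = 10;
        // A corner point passes the box test but fails the exact radius test:
        System.out.println(RadiusFilter.inBox(9, 9, cx, cy, r));     // true
        System.out.println(RadiusFilter.inRadius(9, 9, cx, cy, r));  // false
    }
}
```

The box admits some false positives near the corners (up to roughly 21% of its area lies outside the circle), which is exactly why the exact test is still needed for sorting-by-distance.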
Re: search on colon : ending words
We want to be able to return a result regardless of whether users use a colon in the query or not. So a 'work:' query and a 'work' query should still return the same result. With the current parser, if a user enters 'work:' with a colon, Lucene does not return anything :-(.

It seems to me it's a Lucene parser issue. We are wondering if there is any simple way to make the Lucene parser ignore the ':' in the query? Any thoughts?

Erick Erickson [EMAIL PROTECTED] wrote:

I've got to ask why you'd want to search on colons. Why not just index the words without colons and search without them too? [quoted text trimmed]
Re: search on colon : ending words
On Jan 28, 2007, at 3:47 PM, Felix Litman wrote:

We want to be able to return a result regardless of whether users use a colon in the query or not. So a 'work:' query and a 'work' query should still return the same result. [quoted text trimmed]

What about preprocessing the query string and replacing colons with a space? Or perhaps escaping colons with a backslash (I believe that works, but haven't confirmed it lately). Would users ever need to use fielded selectors? Or QueryParser syntax in general? If not, then bypass QueryParser altogether, analyze the string yourself, and build the query clauses up into a BooleanQuery.

Erik
Re: search on colon : ending words
Felix Litman wrote:

We want to be able to return a result regardless of whether users use a colon in the query or not. So a 'work:' query and a 'work' query should still return the same result. [quoted text trimmed]

The StandardAnalyzer already strips out the colons from the indexed text, so all you need to do is get rid of them in the query. Would

String newquery = query.replace(":", " ");

work? It uses a space as the new text so that two query words that happened to be separated by a colon would still be separate words ...

--MDC
Re: search on colon : ending words
Great suggestion, and Erik's earlier one too. Thank you.

Felix

Michael D. Curtin [EMAIL PROTECTED] wrote:

The StandardAnalyzer already strips out the colons from the indexed text, so all you need to do is get rid of them in the query. [quoted text trimmed]
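The preprocessing idea from this thread amounts to a one-liner; here is a minimal sketch (the helper name `stripColons` is made up for the example):

```java
public class QueryPreprocess {
    // Replace query-syntax colons with spaces before handing the string to
    // QueryParser, so "old:fart" still yields two terms and "work:" matches
    // the same documents as "work".
    static String stripColons(String query) {
        return query.replace(":", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(QueryPreprocess.stripColons("work:"));     // work
        System.out.println(QueryPreprocess.stripColons("old:fart"));  // old fart
    }
}
```

Note this deliberately disables fielded queries like title:lucene; if users need those, escaping selected colons with a backslash (as Erik suggested) is the alternative.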
Re: Multiword Highlighting
I do use the NullFragmenter now. I have no interest in the fragments at the moment, just in showing hits on the source document. It would be great if I could just show the real hits though.

The span approach seems to work fine for me. I have even tested the highlighting using the sentence and paragraph proximity search queries from my query parser. These use a modified NotSpan (I call it WithinSpan) within an unbound NearSpan. I did a few queries that combine that structure with wildcard and boolean queries... everything appeared to work grand -- I got all the correct highlights. I just have to combine the highlights (spans) and refine my code (and that color comment Otis made is something I am interested in as well -- it would be great to have the words found by a single SpanQuery be the same color, or a similar shade).

- Mark

markharw00d wrote:

For what it's worth Mark (Miller), there *is* a need for "just highlight the query terms without trying to get excerpts" functionality - something a la Google cache (different colours... mmm, nice). [quoted text trimmed]
Printout of the stack trace while failing to index the 190,000th document
OK, this is the printout of the stack trace while failing to index the 190,000th document:

Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491886.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491887.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491891.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491893.xml
Indexing C:\sweetpea\wikipedia_xmlfiles\part-18\491896.xml
java.io.IOException: There is not enough space on the disk
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(Unknown Source)
    at org.apache.lucene.store.FSIndexOutput.flushBuffer(FSDirectory.java:583)
    at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
    at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75)
    at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:212)
    at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:169)
    at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:153)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:1447)
    at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:1286)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:1232)
    at org.apache.lucene.index.IndexWriter.maybeFlushRamSegments(IndexWriter.java:1224)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:652)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:631)
    at edu.ntu.ce.maureen.index.DocumentIndexer.addDocToIndex(DocumentIndexer.java:39)
    at edu.ntu.ce.maureen.index.DOMTraversal.fileTraverse(DOMTraversal.java:123)
    at edu.ntu.ce.maureen.index.DOMTraversal.fileTraverse(DOMTraversal.java:106)
    at edu.ntu.ce.maureen.index.DOMTraversal.main(DOMTraversal.java:133)
java.io.IOException: There is not enough space on the disk

Can anyone help?

Thanks and Regards,
Maureen
Re: How many documents in the biggest Lucene index to date?
On Jan 26, 2007, at 2:30 PM, Otis Gospodnetic wrote:

It really all depends... right, Erik?

Ha! Looks like I've earned a tag line around here, eh?! :)

...on the hardware you are using, complexity of queries, query concurrency, query latency you are willing to live with, the size of the index, etc. A few million sounds small even for average/cheap hw. I have several multi-million document indices that are constantly hammered over on Simpy.com, and we use Lucene at Technorati to index the blogosphere, so you can imagine those numbers. To handle that much data things need to be heavily distributed, of course.

Admittedly I've not run indexes anywhere close to the numbers folks have already mentioned on this thread. I'm about to build my largest index to date, at ~3.7M documents.

Erik

Otis

----- Original Message -----
From: Bill Taylor [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, January 26, 2007 12:45:43 AM
Subject: How many documents in the biggest Lucene index to date?

I have used Lucene to index a small collection - only a few hundred documents. I have a potential client who wants to index a collection which will start at about a million documents and could easily grow to two million. Has anyone used Lucene with an index that large?

Thank you very much.

Bill Taylor
Re: Printout of the stack trace while failing to index the 190,000th document
On Jan 28, 2007, at 9:15 PM, maureen tanuwidjaja wrote:

OK, this is the printout of the stack trace while failing to index the 190,000th document... java.io.IOException: There is not enough space on the disk ... Can anyone help?

Ummm... get more disk space?!

Erik
Re: Printout of the stack trace while failing to index the 190,000th document
I think so... btw, may I ask an opinion: would it be useful to optimize, let's say, every 50,000-60,000 documents? I have a total of 660,000 docs...

Erik Hatcher [EMAIL PROTECTED] wrote:

java.io.IOException: There is not enough space on the disk ... Ummm... get more disk space?! [quoted text trimmed]
Re: Is the new version of the Lucene book available in any form?
On Jan 26, 2007, at 1:56 PM, Bill Taylor wrote:

I notice that the Lucene book offered by Amazon was published in 2004. I saw some mail on the subject of a new edition. Is the new edition available in any form? I promise to buy the new edition as soon as it comes out, even if I get some of the material early. I wrote a book which was published by the MIT Press; I know how long it takes to get a book out.

This is a thread more suited to the Manning forum for LIA: http://www.manning-sandbox.com/thread.jspa?forumID=152&threadID=17520

In short, LIA2 will live, that much is for sure.

Failing that, how should I learn more about the internals of Lucene?

Ask here. Delve into the source code. Study the unit tests.

My client has a large code base in C++. The system has its own index which is not all that fast. One way to improve performance would be to convert to the C version of Lucene.

Is HTTP communication viable for your situation? If so, give Solr a shot. C -> HTTP -> Solr -> Lucene and back won't be "not all that fast". In fact, it'll be very fast.

Erik
Re: Is the new version of the Lucene book available in any form?
On Jan 26, 2007, at 5:28 PM, Chris Hostetter wrote:

: LIA2 will happen, but Lucene is undergoing a lot of changes, so Erik and
: I are going to wait a little more for development to calm down
: (utopia?).

you're waiting for Lucene development to calm down? ... that could be a long wait.

We're not exactly waiting. I'm working night and day on Solr + Ruby (solrb and Flare) for various projects. A book project, especially a 2nd edition, is an incredible undertaking and commitment. It is an undertaking Otis and I plan on carving out time for in the near future, but exactly when that will be is not worth speculating about. Rest assured that this list will be kept well informed of LIA2's progress. java-user is the audience to which we most cater.

Erik
Re: Printout of the stack trace while failing to index the 190,000th document
On Jan 28, 2007, at 11:23 PM, maureen tanuwidjaja wrote:

I think so... btw, may I ask an opinion: would it be useful to optimize, let's say, every 50,000-60,000 documents? I have a total of 660,000 docs...

Lucene automatically merges segments periodically during large indexing runs. Look at the parameters available on IndexWriter, and research the best practices mentioned about those settings in this forum's archives, the Lucene wiki, and other resources (such as articles and Lucene in Action). With sufficient disk space you'll be able to tune those settings to keep the index as unsegmented as you like as you index, and then optimize after the batch is completed, again for good measure.

Erik