Re: too many files open error
Thanks for the information. I downloaded 1.3-rc2 and put an IndexReader.close() call at the end of the search routine. This seems to have cleared up the problems. I also changed the demo source code for results.jsp to keep a reference to the IndexReader so that it can be closed at the end of the search, i.e.:

searcher = new IndexSearcher(ir = IndexReader.open(indexName)); //create an indexSearcher for our page
...
ir.close();

erik 3/27/2004 4:44:28 AM

On Mar 27, 2004, at 1:28 AM, Charlie Smith wrote:
What would be the URL for the JUnit stuff?

Look in the src/test directory of where you checked out Lucene. All JUnit tests live there and below.

BTW: I was able to build a new Index.class file, with the additional line iw.setUserCompoundFile(true) after extracting the lucene-1.4-rc1-dev.jar. Then reindexed. Guess what - no worky. :(

Maybe you'd care to share some *technical* details to elaborate on no worky?!

I still get the too many files open error when invoking a modified results.jsp (the one that comes with Lucene). The index is created with a call to the IndexWriter.class file. The Index.class file calls IndexWriter, and I modified it to call setUseCompoundFile(true). Added lines 350 and 442 as suggested.

What Index.class are you talking about? The demo application?

Can I get 1.3-RC2? Could someone point me to the URL for this download please ;)

Use CVS :)

I noticed the following entry in the mail archives: http://www.mail-archive.com/[EMAIL PROTECTED]/msg06118.html along with 139 others that dealt with the too many files open problem. Looks like this is a high-priority problem that might justify a new release in and of itself?

People have been using Lucene for years and managing the file handle issue by setting ulimit and other tricks like optimizing to reduce the number of segments. So it is not as much a problem as it is a known issue that can be managed.

My ulimit is set to unlimited. From what I can tell, it is a stress test issue that seems to work under 1.3-rc2.
Does anyone understand the differences well enough to know whether it will work as well under the next stable release of Lucene?

I'm not up to speed on what the issues with 1.3 final are - I've just started hearing about it. Is there a reproducible example that demonstrates a problem?

Erik

John Brown has made his source available. Go to Google and search for docSearcher. He seems quite willing to help where needed. Use the results.jsp routine that comes with Lucene to test, with the following changes:

<snip>
< Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer
---
> Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
68,69c54,56
< query = QueryParser.parse(queryString, "contents", analyzer); //parse the
< } catch (ParseException e) { //query and construct the Query
---
> query = QueryParser.parse(queryString, "body", analyzer); //parse the
> //query = query.rewrite(reader);
> } catch (ParseException e) { //query and construct the Query
87a75
> <tr><td><font size=5>Search results for </font><font size=5 color=blue><%=queryString%></td></tr> <tr><tr>
108a96,97
> // cws: 2/25/04 added this to get format href link.
> RE r = new RE("/path/to/site/root/");
111d99
< <tr>
114,122c102,131
< String doctitle = doc.get("title");//get its title
< String url = doc.get("url"); //get its url field
< if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title
< doctitle = url;
< //then output!
< %>
< <td><a href="<%=url%>"><%=doctitle%></a></td>
< <td><%=doc.get("summary")%></td>
< </tr>
---
> String path = doc.get("path");
> String type = doc.get("type");
> String title = doc.get("title");
> // cws: 2/25/04 added this to get format href link.
> String path_part = r.subst(path, "/");
> String summary = doc.get("summary");
> String size = doc.get("size");
> String date = doc.get("mod_date");
> // date formating
> java.util.Date bd=DateField.stringToDate(date);
> Calendar nowD=Calendar.getInstance();
> nowD.setTime(bd);
> int mon=nowD.get(nowD.MONTH)+1;
> int year=nowD.get(nowD.YEAR);
> int day=nowD.get(nowD.DAY_OF_MONTH);
> date = mon+"/"+day
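The date-formatting lines at the end of the modified page rely on java.util.Calendar, whose MONTH field is zero-based (January is 0); that is why the diff adds 1 to it. The constants are static, so Calendar.MONTH is the more idiomatic spelling than nowD.MONTH. Below is a minimal standalone sketch of the same idea in plain Java. The class and helper names are hypothetical, and because the quoted diff is cut off after the day, the month/day/year ordering here is an assumption rather than what the page actually prints.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Standalone sketch of the date formatting done in the modified results.jsp.
// ResultDate and formatMonthDayYear are hypothetical names; putting the year
// last is an assumption, since the original line is truncated.
class ResultDate {
    static String formatMonthDayYear(Date d) {
        // GregorianCalendar is used directly so the year is not affected by a
        // locale that would make Calendar.getInstance() return another calendar.
        Calendar cal = new GregorianCalendar();
        cal.setTime(d);
        // Calendar.MONTH is zero-based (JANUARY == 0), hence the + 1.
        int mon = cal.get(Calendar.MONTH) + 1;
        int day = cal.get(Calendar.DAY_OF_MONTH);
        int year = cal.get(Calendar.YEAR);
        return mon + "/" + day + "/" + year;
    }

    public static void main(String[] args) {
        Date d = new GregorianCalendar(2004, Calendar.MARCH, 27).getTime();
        System.out.println(formatMonthDayYear(d)); // prints 3/27/2004
    }
}
```

A java.text.SimpleDateFormat with the pattern "M/d/yyyy" produces the same string in a single call.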
Re: too many files open error
On Mar 26, 2004, at 10:35 PM, Charlie Smith wrote:
When I built lucene with ant, it put down a jar file called
./lucene-1.3-final/build/lucene-1.4-rc1-dev.jar
Odd name for a stable release jar file.

What's in a name?! :) This is standard operating procedure. That -dev simply means you're building it locally and all bets are off as to what you built if you modified your local codebase. We remove the -dev when building releases, and adjust the rc1 to final or rc2 as appropriate.

Erik

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: too many files open error
If you are using Lucene 1.3, try using the index in compound format. You will have to rebuild (or convert) your index to this format. The handy utility Luke will convert an index easily.

Erik

On Mar 25, 2004, at 9:34 PM, Charlie Smith wrote:
I need to get a solution to the following error ASAP. Please help me with this. I'm getting the following error returned from a call to:

<snip>
try {
  searcher = new IndexSearcher(
    IndexReader.open(indexName) //create an indexSearcher for our page
  );
} catch (Exception e) { //any error that happens is probably due
                        //to a permission problem or non-existent
                        //or otherwise corrupt index
%>
<p>ERROR opening the Index - contact sysadmin!</p>
<p>While parsing query: <%=e.getMessage()%></p>
<% error = true; //don't do anything up to the footer
}

Output:
ERROR opening the Index - contact sysadmin!
While parsing query: /opt/famhistdev/fhstage/jbin/.docSearcher/indexes/fhstage_update/_3ff.f6 (Too many open files)
</snip>

Charlie 3/25/04
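The page above opens a fresh IndexReader on every request. Elsewhere in this thread Charlie reports that adding an IndexReader.close() call at the end of the search cleared up the error, which is consistent with a handle leak: a reader that is never closed can keep its index files open. The sketch below shows the general pattern of releasing a per-request resource in a finally block. It deliberately does not use Lucene; TrackedReader and SearchPage are hypothetical stand-ins that only count how many readers are open, so the difference between the leaky and the closing versions is visible.

```java
import java.io.IOException;

// Hypothetical stand-in for a per-request IndexReader. It does no I/O; it only
// counts how many instances are currently open so a leak is easy to see.
class TrackedReader {
    static int open = 0;
    private boolean closed = false;

    TrackedReader() {
        open++;
    }

    String search(String query) throws IOException {
        if (query.isEmpty()) {
            throw new IOException("empty query");
        }
        return "hits for " + query;
    }

    void close() {
        if (!closed) { // closing twice must not decrement twice
            closed = true;
            open--;
        }
    }
}

class SearchPage {
    // Leaky version: the reader is never closed, so every request leaves one behind.
    static String leaky(String query) throws IOException {
        TrackedReader ir = new TrackedReader();
        return ir.search(query);
    }

    // Fixed version: close in a finally block, so the reader is released
    // even when the search throws.
    static String closing(String query) throws IOException {
        TrackedReader ir = new TrackedReader();
        try {
            return ir.search(query);
        } finally {
            ir.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 100; i++) leaky("lucene");
        System.out.println("open after leaky requests: " + TrackedReader.open);   // 100

        TrackedReader.open = 0;
        for (int i = 0; i < 100; i++) closing("lucene");
        try {
            closing(""); // fails, but the finally block still closes the reader
        } catch (IOException expected) {
            // ignored for the demo
        }
        System.out.println("open after closing requests: " + TrackedReader.open); // 0
    }
}
```

In the JSP, the equivalent is calling ir.close() in a finally block once the results have been rendered. On Java 7 and later, try-with-resources does the same job for types that implement AutoCloseable.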
Re: too many files open error
I'm using lucene-1.2.jar as part of the build for this docSearcher application. Would these recommendations work for it, or should I upgrade to Lucene 1.3? If I do, I'm not sure whether docSearcher will need to be rewritten.

Daniel Naber wrote on 3/26/04:
Try IndexWriter.setUseCompoundFile(true) to limit the number of files.

Erik Hatcher 3/26/2004 2:32:16 AM
If you are using Lucene 1.3, try using the index in compound format. You will have to rebuild (or convert) your index to this format. The handy utility Luke will convert an index easily.
Re: too many files open error
The compound format was added to Lucene 1.3 and was not part of 1.2. I'd definitely recommend upgrading. Heck, Lucene 1.4 could be released any day now :)

Erik

On Mar 26, 2004, at 12:25 PM, Charlie Smith wrote:
I'm using lucene-1.2.jar as part of the build for this docSearcher application. Would these recommendations work for it, or should I upgrade to Lucene 1.3? If I do, I'm not sure whether docSearcher will need to be rewritten.
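To see why the compound format helps, a rough count is enough. Assuming the Lucene 1.3-era non-compound layout, each segment consists of seven fixed files (.fnm, .fdt, .fdx, .tis, .tii, .frq, .prx) plus one norms file per indexed field (.f0, .f1, ...; the _3ff.f6 in the original error looks like one of these), whereas the compound format packs each segment into a single .cfs file. The sketch below just multiplies those numbers out. It ignores .del files and the index-wide segments file, the field and segment counts are made up, and it counts files on disk rather than the handles a reader actually holds, so treat it as an order-of-magnitude estimate only.

```java
// Rough file-count estimate for a Lucene 1.3-era index. Assumption: a
// non-compound segment has 7 fixed files (.fnm, .fdt, .fdx, .tis, .tii, .frq,
// .prx) plus one norms file (.f0, .f1, ...) per indexed field; a compound
// segment is a single .cfs file. .del files and the "segments" file are ignored.
class IndexFileEstimate {
    static final int FIXED_FILES_PER_SEGMENT = 7;

    static int filesPerSegment(int indexedFields, boolean compound) {
        return compound ? 1 : FIXED_FILES_PER_SEGMENT + indexedFields;
    }

    static int estimateFiles(int segments, int indexedFields, boolean compound) {
        return segments * filesPerSegment(indexedFields, compound);
    }

    public static void main(String[] args) {
        int fields = 10; // hypothetical number of indexed fields
        System.out.println(estimateFiles(30, fields, false)); // 510 (30 segments, non-compound)
        System.out.println(estimateFiles(30, fields, true));  // 30  (30 segments, compound)
        System.out.println(estimateFiles(1, fields, false));  // 17  (optimized, non-compound)
    }
}
```

Optimizing merges the index down to one segment, which is why it is mentioned in this thread as a workaround; conversely, every reader opened per request and never closed multiplies the total again.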
RE: too many files open error
Is this :) serious? Because we have a need/interest in the new field sorting capabilities and QueryParser keyword handling of dashes (-) that would be in 1.4, I believe. It's so much easier to explain that we'll use a final release of Lucene instead of a dev build of Lucene.

If so, what would an expected release date be?

thanks,
chad.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Fri 3/26/2004 12:23 PM
To: Lucene Users List
Subject: Re: too many files open error

The compound format was added to Lucene 1.3 and was not part of 1.2. I'd definitely recommend upgrading. Heck, Lucene 1.4 could be released any day now :)

Erik
Re: too many files open error
Is there another source for PDFBox besides the SourceForge link from pdfbox.org? I'd like to get the Linux/Unix version, and I wonder whether the source there is OK to use. Couldn't this be made available at Jakarta, or has it been already?

Otis wrote on 3/24/04
Subject: Re: analyzer for word perfect?

I just finished writing a chapter for Lucene in Action that deals with that.

PDF: pdfbox.org
MS Word/Excel: jakarta.apache.org/poi
WP: http://www.google.com/search?q=java+word+perfect+parser

Note that what you need are parsers. The term Analyzer has a special meaning in the Lucene realm.

Otis

--- Charlie Smith wrote:
Is there an analyzer for WordPerfect files? I have a need to be able to index WP files as well as MS files, PDFs, etc.
Re: too many files open error
As PDFBox is an all-Java solution, there is no specific Linux/Unix version. The source that is available with the downloaded package should suit your needs. What does the SourceForge site not provide for you?

Ben

On Fri, 26 Mar 2004, Charlie Smith wrote:
Is there another source for PDFBox besides the SourceForge link from pdfbox.org? I'd like to get the Linux/Unix version, and I wonder whether the source there is OK to use. Couldn't this be made available at Jakarta, or has it been already?
Re: too many files open error
On Mar 26, 2004, at 1:33 PM, Chad Small wrote:
Is this :) serious?

This is open source. I'm only as serious as it would take for someone to push it through. I don't know what the timeline is, although lots of new features are available.

Because we have a need/interest in the new field sorting capabilities and QueryParser keyword handling of dashes (-) that would be in 1.4, I believe. It's so much easier to explain that we'll use a final release of Lucene instead of a dev build of Lucene.

Why explain it?! Just show great results and let that be the explanation :)

If so, what would an expected release date be?

*shrug* - feel free to lobby for it. I don't know what else is planned before a release.

Erik
Re: too many files open error
I don't want to get into a debate that involves slamming SourceForge. They provide a great service. However, to answer your question:

1. Reliable downloads. They don't always seem to complete.
2. An organized tree structure of the available apps.
3. A nicer presentation. The GUI look and feel is very gooey.
4. Sloppy forums. Answers to questions take a long time or never come. The look and feel is ugly. I guess the forums lack organization. Notification of answered questions is not there, or doesn't appear to be.

On the other hand, it's much better than nothing.

Back to the problem of "too many files open": I'm really having difficulty recompiling all the Java routines in the program that I have. Being new to Java, it's probably just my inexperience, but I hope to be able to get this done soon. Any help along these lines would be greatly appreciated.

Specifically, inside the Index.java file is a call to:

iw = new IndexWriter(di.indexerPath, new StandardAnalyzer(), false);
iw.setUseCompoundFilter(true);

I added the second line per the recommendation earlier on this topic. In trying to recompile the Index.java file I get the following errors:

Index.java:125: cannot resolve symbol
symbol  : method didParse ()
location: class PdfToText
else if (pp.didParse()) {
^
Index.java:352: cannot resolve symbol
symbol  : method setUseCompoundFilter (boolean)
location: class org.apache.lucene.index.IndexWriter
iw.setUseCompoundFilter(true);
^
Index.java:444: cannot resolve symbol
symbol  : method setUseCompoundFilter (boolean)
location: class org.apache.lucene.index.IndexWriter
iw.setUseCompoundFilter(true);
^
./EmailThread.java:2: package javax.mail does not exist
import javax.mail.*;
^
./EmailThread.java:3: package javax.mail.internet does not exist
import javax.mail.internet.*;
^
./WordToText.java:5: package org.apache.poi.hdf.extractor does not exist
import org.apache.poi.hdf.extractor.WordDocument;
...

and it keeps going, to about 58 errors.
Hey, it was peaking at over 100, so I'm making some improvement. Any help would be appreciated.

[EMAIL PROTECTED] 3/26/2004 12:16:13 PM
As PDFBox is an all-Java solution, there is no specific Linux/Unix version. The source that is available with the downloaded package should suit your needs. What does the SourceForge site not provide for you?

Ben
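Two of the errors above have a simple cause: the method is spelled setUseCompoundFile, not setUseCompoundFilter. Per this thread, the compound format (and so, presumably, that setter) only arrived in Lucene 1.3, so even the correctly spelled call will fail to compile if the old lucene-1.2.jar is the one on the compile classpath. The javax.mail and org.apache.poi errors likewise suggest that the JavaMail and POI jars are not on the classpath at all. As a diagnostic, the sketch below uses reflection to ask whether a class that is actually on the classpath exposes a given public method. ApiCheck, hasClass and hasMethod are hypothetical names; Class.forName and Class.getMethod are standard JDK calls.

```java
// Hypothetical diagnostic helper: reports whether a class on the current
// classpath exists and exposes a public method with the given signature.
class ApiCheck {
    static boolean hasClass(String className) {
        try {
            // false = do not run static initializers just to look the class up
            Class.forName(className, false, ApiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    static boolean hasMethod(String className, String methodName, Class<?>... params) {
        try {
            Class<?> c = Class.forName(className, false, ApiCheck.class.getClassLoader());
            c.getMethod(methodName, params); // throws if no such public method
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String writer = "org.apache.lucene.index.IndexWriter";
        System.out.println("IndexWriter on classpath: " + hasClass(writer));
        // The correct spelling of the setter.
        System.out.println("setUseCompoundFile(boolean): "
                + hasMethod(writer, "setUseCompoundFile", boolean.class));
        // The spelling used in Index.java; IndexWriter has no such method.
        System.out.println("setUseCompoundFilter(boolean): "
                + hasMethod(writer, "setUseCompoundFilter", boolean.class));
    }
}
```

Run it with the same classpath you give javac (for example java -cp lucene-1.4-rc1-dev.jar:. ApiCheck on Unix). If IndexWriter is found but setUseCompoundFile is not, an older Lucene jar is being picked up first.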
Re: too many files open error
Charlie Smith wrote:
/opt/famhistdev/fhstage/jbin/.docSearcher/indexes/fhstage_update/_3ff.f6 (Too many open files)

Just a suggestion... why not put a URL string in the "Too many open files" exception? Tons of people keep running into this problem and we keep wasting both our time and their time. We could just link to the FAQ entry.

Kevin

--
Please reply using PGP.
http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
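Kevin's suggestion can also be approximated on the application side. The "(Too many open files)" text in the error appears to come from the operating system by way of the JDK's file-opening code, so one option is to recognize it where the page catches the exception and append a pointer to documentation. A minimal sketch follows. OpenFilesHint, withHint and FAQ_URL are hypothetical, and the URL is a placeholder to be replaced with the real FAQ entry.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical helper: if an IOException reports the "Too many open files"
// condition, return a new IOException whose message adds a pointer to
// documentation, keeping the original exception as the cause.
class OpenFilesHint {
    // Placeholder URL, not a real FAQ location.
    static final String FAQ_URL = "http://example.org/lucene-faq#too-many-open-files";

    static IOException withHint(IOException e) {
        String msg = e.getMessage();
        if (msg != null && msg.contains("Too many open files")) {
            return new IOException(msg + " (see " + FAQ_URL + ")", e);
        }
        return e; // unrelated errors pass through unchanged
    }

    public static void main(String[] args) {
        IOException raw = new FileNotFoundException(
                "/opt/famhistdev/fhstage/jbin/.docSearcher/indexes/fhstage_update/_3ff.f6 (Too many open files)");
        IOException hinted = withHint(raw);
        System.out.println(hinted.getMessage());
        System.out.println(hinted.getCause() == raw); // prints true
    }
}
```

Wrapping loses the FileNotFoundException subtype, which is why the original is kept as the cause rather than discarded; code that needs the specific type can still reach it through getCause().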
Re: too many files open error
Chad Small wrote:
Is this :) serious? Because we have a need/interest in the new field sorting capabilities

URL to documentation for field sorting?

and QueryParser keyword handling of dashes (-) that would be in 1.4, I believe. It's so much easier to explain that we'll use a final release of Lucene instead of a dev build of Lucene.

--
Kevin A. Burton
Re: too many files open error
On Mar 26, 2004, at 7:20 PM, Kevin A. Burton wrote:
Chad Small wrote:
Is this :) serious? Because we have a need/interest in the new field sorting capabilities

URL to documentation for field sorting?

Geez, you want documentation also? :) Try the JUnit test cases for starters. That is the definitive documentation at the moment.

Erik
Re: too many files open error
When I built Lucene with Ant, it put down a jar file called
./lucene-1.3-final/build/lucene-1.4-rc1-dev.jar
Odd name for a stable release jar file.

[EMAIL PROTECTED] 03/26/04 06:06PM
Geez, you want documentation also? :) Try the JUnit test cases for starters. That is the definitive documentation at the moment.

Erik
Re: too many files open error
What would be the URL for the JUnit stuff?

BTW: I was able to build a new Index.class file, with the additional line iw.setUserCompoundFile(true) after extracting the lucene-1.4-rc1-dev.jar. Then reindexed. Guess what - no worky. :( Help!!!

Can I get 1.3-RC2? Could someone point me to the URL for this download please ;)

I noticed the following entry in the mail archives: http://www.mail-archive.com/[EMAIL PROTECTED]/msg06118.html along with 139 others that dealt with the too many files open problem. Looks like this is a high-priority problem that might justify a new release in and of itself?

Charlie

erik on 03/26/04 06:06PM
Geez, you want documentation also? :) Try the JUnit test cases for starters. That is the definitive documentation at the moment.

Erik