RE: Setting the COMMIT lock timeout.
Thanks for your prompt response! You asked about the use case. We have a series of similar intranet sites, each served by a separate Tomcat application instance that uses the same code base but different start-up parameters. All of the intranets provide a common search function based on the same underlying index. Admittedly we could have developed a single central search component, but given the way the code has evolved, our current approach is the simplest for us. With separate application instances sharing access to the same index, we get occasional COMMIT lock timeouts, even though each application uses a singleton IndexSearcher.

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: 13 March 2006 23:23
To: java-user@lucene.apache.org
Subject: Re: Setting the COMMIT lock timeout.

On Monday 13 March 2006 22:24, Bill Janssen wrote:
The default value isn't magic. The appropriate value is context-specific. I've got some people using Lucene on machines with slow disks, and we need to be able to increase the WRITE_LOCK_TIMEOUT to prevent entirely random lossage.

Here's a patch (I hope it gets through). Let me know if it's okay; I will commit it then.

Regards
Daniel

--
http://www.danielnaber.de

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
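Bill's point that the right lock timeout is context-specific comes down to how a lock is acquired: a process that finds the lock already held keeps retrying until a deadline passes, then gives up with a timeout error. The class below is a minimal, hypothetical sketch of that polling pattern using a plain lock file. TimedFileLock, its 100 ms poll interval and its method names are assumptions for illustration; this is not Lucene's Lock API.

```java
import java.io.File;
import java.io.IOException;

public class TimedFileLock {
    // Hypothetical poll interval between acquisition attempts.
    private static final long POLL_INTERVAL_MS = 100;

    private final File lockFile;

    public TimedFileLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** One attempt: atomically creates the lock file; false if it already exists. */
    public boolean tryObtain() throws IOException {
        return lockFile.createNewFile();
    }

    /** Retries until the lock is obtained or timeoutMs has elapsed. */
    public void obtain(long timeoutMs) throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!tryObtain()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new IOException("Lock obtain timed out: " + lockFile);
            }
            try {
                Thread.sleep(POLL_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for " + lockFile);
            }
        }
    }

    /** Releases the lock by deleting the lock file. */
    public void release() {
        lockFile.delete();
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"), "demo-commit.lock");
        f.delete(); // start without a stale lock file
        TimedFileLock first = new TimedFileLock(f);
        TimedFileLock second = new TimedFileLock(f);

        first.obtain(1000);
        try {
            second.obtain(300); // first still holds the lock
            System.out.println("unexpected: second obtained the lock");
        } catch (IOException e) {
            System.out.println("second gave up: " + e.getMessage());
        }
        first.release();

        second.obtain(300); // succeeds now that the lock file is gone
        second.release();
        System.out.println("second obtained the lock after release");
    }
}
```

In this sketch a longer timeout simply gives a slow holder (for example, a commit on a slow disk in another application instance) more time to finish before the waiter gives up; it does not reduce the contention itself.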
who can tell me how lucene search in the index files
I see there are seven different files with extensions such as .fnm and .tis. I'm just not sure how Lucene looks up a term in the .tis file. Does Lucene use binary search to locate the term?
Write.lock error with spellchecker
I am trying to use the spellchecker plugin with Lucene 1.2. I get the following exception when my SpellIndexer class tries to create the spell index. The new directory is created with all the correct permissions. No write.lock file is created. Has anyone run into a similar issue? Does this have to do with Lucene 1.2?

Exception in thread "main" java.io.IOException: couldn't delete write.lock
    at org.apache.lucene.store.FSDirectory.deleteFile(Unknown Source)
    at org.apache.lucene.index.IndexReader.unlock(Unknown Source)
    at org.apache.lucene.search.spell.SpellChecker.indexDictionnary(Unknown Source)
    at com.unisource.ecom.search.lucene.SpellIndexer.createSpellIndex(SpellIndexer.java:35)
    at com.unisource.ecom.search.lucene.SpellIndexer.main(SpellIndexer.java:56)

Thanks,
Veda
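The stack trace shows the failure inside FSDirectory.deleteFile, called from IndexReader.unlock. One possible reading, offered as an assumption rather than something confirmed in this thread, is that the delete is reported as failed because write.lock never existed: java.io.File.delete() returns false both when deletion is refused and when the file is missing. The sketch below contrasts a strict delete that treats any false return as an error with a tolerant variant; both helper names are hypothetical and neither is Lucene code.

```java
import java.io.File;
import java.io.IOException;

public class LockFileDelete {

    /** Strict variant: any false return from File.delete() is reported as an error. */
    static void deleteStrict(File f) throws IOException {
        if (!f.delete()) {
            throw new IOException("couldn't delete " + f.getName());
        }
    }

    /** Hypothetical tolerant variant: only a file that exists but cannot be deleted is an error. */
    static void deleteIfPresent(File f) throws IOException {
        if (f.exists() && !f.delete()) {
            throw new IOException("couldn't delete " + f.getName());
        }
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"), "demo-write.lock");
        lock.delete(); // make sure the lock file does not exist

        try {
            deleteStrict(lock);
            System.out.println("strict: no error");
        } catch (IOException e) {
            System.out.println("strict: " + e.getMessage()); // fails although nothing was locked
        }

        deleteIfPresent(lock);
        System.out.println("tolerant: no error");
    }
}
```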
IndexFiles.java
Hiya, I am a beginner with Lucene. I am trying to use IndexFiles.java to index my text file directories, but it does not work. It always gives me this error message, even when I comment it out:

Usage: java org.apache.lucene.demo.IndexFiles root_directory

What does if (args.length == 0) mean?

Thanks
Miki
Searching in paths
Hello, I have a problem with indexing and querying paths. For example, I put /home/users/apache/txt/qqq__docu.txt in a field called path. I wanted to submit a query that finds all documents provided by my user apache, so I tried querying Lucene with AND path:/home/users/* but no results were found. If I query any other field that has no / in it, results are returned, e.g. AND title:natio*. Where am I making a mistake? What can I do to query for paths (and everything below them)?

Best Regards,
Adr
Re: IndexFiles.java
It looks like you are not specifying the directory you want to index.

Otis

----- Original Message ----
From: Miki Sun [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, March 14, 2006 11:27:04 AM
Subject: IndexFiles.java

Hiya, I am a beginner with Lucene. I am trying to use IndexFiles.java to index my text file directories, but it does not work. It always gives me this error message, even when I comment it out:

Usage: java org.apache.lucene.demo.IndexFiles root_directory

What does if (args.length == 0) mean?

Thanks
Miki
Re: IndexFiles.java
I think I did. I modified this code:

// create a directory to write the indices to
static final File INDEX_DIR = new File(File.separator + "Bible_index");
// specify the directory to be indexed
final File docDir = new File(File.separator + "Bible/1/");

Where else should I change it? Thanks a lot!

On 14/03/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> It looks like you are not specifying the directory you want to index.
>
> Otis

--
Smartweb Technologies Centre
School of Computing
St Andrew Street
Aberdeen AB25 1HG
Tel: +44 (0)1224 - 262479
Web: http://athena.comp.rgu.ac.uk/staff/ms/
Re: IndexFiles.java
You need to specify it on the command line, i.e.

java org.apache.lucene.demo.IndexFiles 'type in your starting directory here'

On 3/14/06, Miki Sun [EMAIL PROTECTED] wrote:
> I think I did. I modified this code:
>
> // create a directory to write the indices to
> static final File INDEX_DIR = new File(File.separator + "Bible_index");
> // specify the directory to be indexed
> final File docDir = new File(File.separator + "Bible/1/");
>
> Where else should I change it? Thanks a lot!
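In Java, args holds the command-line arguments passed to main, so args.length == 0 is true when the program is started with no arguments at all. A check like that runs before any hard-coded path is used, which would explain why editing the paths in the source did not get past the usage message. The class below is a hypothetical, simplified mirror of such a check, not the actual IndexFiles source; IndexFilesArgsDemo and docDirFromArgs are invented names.

```java
import java.io.File;

public class IndexFilesArgsDemo {

    /** Returns the directory named by the first command-line argument, or null if none was given. */
    static File docDirFromArgs(String[] args) {
        if (args.length == 0) {
            // Started with no arguments: the caller prints the usage line and stops.
            return null;
        }
        return new File(args[0]);
    }

    public static void main(String[] args) {
        File docDir = docDirFromArgs(args);
        if (docDir == null) {
            System.err.println("Usage: java IndexFilesArgsDemo root_directory");
            System.exit(1);
        }
        System.out.println("Would index: " + docDir.getAbsolutePath());
    }
}
```

Run as `java IndexFilesArgsDemo /Bible/1`, it reports the directory it would index; run with no arguments, it prints the usage line and exits, whatever paths are hard-coded elsewhere.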
Re: IndexFiles.java
How do you do it using Kawa? I am not familiar with command-line operations.

Thanks

On 14/03/06, Joe Scanlon [EMAIL PROTECTED] wrote:
> You need to specify it on the command line, i.e.
>
> java org.apache.lucene.demo.IndexFiles 'type in your starting directory here'
RE: Searching in paths
You need to index the field as a keyword, or use an analyzer that will not strip the / from the string.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Java Programmer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 14, 2006 11:28 AM
To: java-user@lucene.apache.org
Subject: Searching in paths

Hello, I have a problem with indexing and querying paths. For example, I put /home/users/apache/txt/qqq__docu.txt in a field called path. I wanted to submit a query that finds all documents provided by my user apache, so I tried querying Lucene with AND path:/home/users/* but no results were found. If I query any other field that has no / in it, results are returned, e.g. AND title:natio*. Where am I making a mistake? What can I do to query for paths (and everything below them)?

Best Regards,
Adr
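Aviran's point can be illustrated without Lucene. A tokenizing analyzer breaks a path into word fragments, so no indexed term begins with /home/users/, whereas a keyword (untokenized) field keeps the whole path as one term that a prefix scan over the sorted term list can match. In this sketch, tokenize is a rough, hypothetical stand-in for a letter-splitting, lowercasing analyzer, and prefixMatches walks a TreeSet in sorted order. In Lucene itself the counterpart would be an untokenized path field searched with a PrefixQuery; the code below does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class PathPrefixDemo {

    /** Rough stand-in for a tokenizing analyzer: lowercases and splits on every non-letter. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Scans the sorted terms from the prefix onward and stops at the first non-match. */
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // sorted order: no later term can match
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/home/users/apache/txt/qqq__docu.txt",
                "/home/users/apache/notes.txt",
                "/home/users/bob/a.txt");

        // Keyword-style field: each path is one untokenized term.
        TreeSet<String> keywordTerms = new TreeSet<>(paths);

        // Tokenized field: the slashes are gone, only word fragments remain.
        TreeSet<String> tokenizedTerms = new TreeSet<>();
        for (String p : paths) tokenizedTerms.addAll(tokenize(p));

        // [/home/users/apache/notes.txt, /home/users/apache/txt/qqq__docu.txt]
        System.out.println(prefixMatches(keywordTerms, "/home/users/apache/"));
        // []
        System.out.println(prefixMatches(tokenizedTerms, "/home/users/"));
    }
}
```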
Re: Good MMapDirectory performance
> I read from Peter Keegan's recent postings:
> The Lucene server is using MMapDirectory. I'm running the JVM with -Xmx16000M. Peak memory usage of the JVM on Linux is about 6GB and 7.8GB on Windows.
> We don't have nearly as much memory as Peter, but I wonder whether he is gaining anything with such a large heap.

My application gets better throughput with more VM, but that is probably due to heavy use of ByteBuffers in the application, not VM for Lucene.

Peter

On 3/12/06, kent.fitch [EMAIL PROTECTED] wrote:

I thought I'd post some good news about MMapDirectory, as the comments in the release notes are quite downbeat about its performance. In some environments MMapDirectory provides a big improvement.

Our test application is an index of 11.4 million documents derived from MARC (bibliographic) catalogue records. Our aim is to build a system to demonstrate relevance ranking and result clustering for library union catalogue searching (a union catalogue accumulates and merges records from multiple libraries).

Our main index component sizes:
  fdt  17 GB
  fdx  91 MB
  tis  82 MB
  frq  45 MB
  prx  11 MB
  tii  1.2 MB

We have a separate Lucene index (not discussed further) which stores the MARC records.

Each document has many fields. We'll probably reduce the number after we decide on the best search strategies, but lots of fields give us lots of flexibility while testing search and ranking strategies.

Stored, unindexed fields, used for summary results: display title, display author, display publication details, holdingsCount (number of libraries holding).

Tokenized indices: title, author, subject, genre, keyword (all text).

Keyword (untokenized) indices: title, author, subject, genre, audience, Dewey/LC classification, language, isbn/issn, publication date (date range code), unique bibliographic id.

Wildcard tokenized indices, created by a custom stub analyzer which reduces a term to its first few characters: title, author, subject, keyword.

Field boosts are set for some fields.
For example, title, sub-title, series title and component title are all stored as title, but with different field boosts (a match on the normal title is deemed more relevant than a match on the series title). The document boost is set to the square root of holdingsCount (favouring popular resources).

The user interface supports searching and refining searches on specific fields, but the most common search is created from a single Google-style search box. Here's a typical query generated from a two-word search:

+(titleWords:"franz kafka"^4.0 authorWords:"franz kafka"^3.0 subjectWords:"franz kafka"^3.0 keywords:"franz kafka"^1.4
  title:"franz kafka"^4.0 (+titleWords:franz +titleWords:kafka^3.0)
  author:"franz kafka"^3.0 (+authorWords:franz +authorWords:kafka^2.0)
  subject:"franz kafka"^3.0 (+subjectWords:franz +subjectWords:kafka^1.5)
  (+genreWords:franz +genreWords:kafka^2.0)
  (+keywords:franz +keywords:kafka)
  (+titleWildcard:fra +titleWildcard:kaf^0.7)
  (+authorWildcard:fra +authorWildcard:kaf^0.7)
  (+subjectWildcard:fra +subjectWildcard:kaf^0.7)
  (+keywordWildcard:fra +keywordWildcard:kaf^0.2)
)

It generated 1635 hits.

We then read the first 700 documents in the hit list and extract the date, subject, author, genre, Dewey/LC classification and audience fields for each, accumulating the popularity of each value. Using this data, for each of the subject, author, genre, Dewey/LC and audience categories, we find the 30 most popular field values, and for each of these we query the index to find its frequency in the entire index.

We then render the first 100 document results (title, author, publication details, holdings) and the top 30 values for each of subject, author, genre, Dewey/LC and audience, ordering each list by the popularity of the term in the hit results (the sample of the first 700) and sizing the text according to the frequency of the term in the entire database (a bit like the Flickr tag popularity lists). We also render a graph of hit results by date range.
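The clustering step Kent describes (count field values over the first 700 hits, then keep the 30 most popular values per category) is essentially a frequency count followed by a top-N sort. Below is a minimal plain-Java sketch, assuming the field values have already been read from the stored documents; FacetCounter and topValues are hypothetical names, and the follow-up lookup of each value's frequency in the whole index is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetCounter {

    /**
     * Counts field values over the first sampleSize hits and returns the topN
     * most frequent values, highest count first (ties broken alphabetically).
     */
    static List<Map.Entry<String, Integer>> topValues(
            List<List<String>> valuesPerHit, int sampleSize, int topN) {
        Map<String, Integer> counts = new HashMap<>();
        int limit = Math.min(sampleSize, valuesPerHit.size());
        for (List<String> values : valuesPerHit.subList(0, limit)) {
            for (String v : values) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(topN, entries.size())));
    }

    public static void main(String[] args) {
        // Hypothetical subject values read from four hits, in hit-list order.
        List<List<String>> subjects = Arrays.asList(
                Arrays.asList("Kafka", "Fiction"),
                Arrays.asList("Kafka", "Criticism"),
                Arrays.asList("Fiction"),
                Arrays.asList("Kafka"));
        // Kent's figures: a 700-hit sample and the top 30 values per category.
        System.out.println(topValues(subjects, 700, 30)); // [Kafka=3, Fiction=2, Criticism=1]
    }
}
```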
The initial search is very quick, typically a few tens of milliseconds. The clustering takes much longer: reading up to 700 records, extracting all those fields, sorting to get the top 30 of each field category, and looking up the frequency of each term in the database.

The test machine was a SunFire440 with 2 x 1.593GHz UltraSPARC-IIIi processors and 8GB of memory, running Solaris 9, Java 1.5 in 64-bit mode, and Jetty. The Lucene data directory is stored on a local 10K SCSI disk.

The benchmark consisted of running 13,142 representative and unique search phrases collected from another system. The search phrases are unsorted. The client (testing) system runs on another, unloaded computer and was configured to run a varying number of threads representing different loads. The results discussed here were produced with 3
Re: who can tell me how lucene search in the index files
hu andy wrote:
> I see there are seven different files with extensions such as .fnm and .tis. I'm just not sure how Lucene looks up a term in the .tis file. Does Lucene use binary search to locate the term?

See TermInfosReader. It loads the .tii file into memory; the .tii file contains one in every N entries of the .tis file, each pointing to its real location in the .tis file. When Lucene looks for a term, it does a binary search through this reduced index to find which segment of the .tis file the term is in, and then scans through the .tis file linearly until it finds the term.

Daniel

--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Ph: +61 2 9280 0699
Fax: +61 2 9212 6902
Web: http://www.nuix.com.au/

This message is intended only for the named recipient. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this message or attachment is strictly prohibited.
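Daniel's description is a two-level lookup: a binary search over a sparse, in-memory sample of the sorted terms, followed by a short linear scan of the full term list. The class below is a simplified in-memory sketch of that strategy. TwoLevelTermIndex and the interval parameter are illustrative assumptions; the real TermInfosReader works on encoded on-disk files, not String arrays.

```java
import java.util.Arrays;

public class TwoLevelTermIndex {
    private final String[] allTerms;   // full sorted dictionary (the role of .tis)
    private final String[] indexTerms; // every interval-th term (the role of .tii)
    private final int interval;

    public TwoLevelTermIndex(String[] sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Returns the position of term in the full dictionary, or -1 if absent. */
    public int find(String term) {
        if (indexTerms.length == 0) return -1;
        int pos = Arrays.binarySearch(indexTerms, term);
        if (pos >= 0) return pos * interval;   // the term is itself an index entry
        int block = (-pos - 1) - 1;            // last index entry smaller than term
        if (block < 0) return -1;              // sorts before the first term
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.length);
        for (int i = start; i < end; i++) {    // linear scan within one block
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;                // passed its slot: not present
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "elder", "fig", "grape"};
        TwoLevelTermIndex idx = new TwoLevelTermIndex(terms, 3); // index: apple, date, grape
        System.out.println(idx.find("fig")); // 5
        System.out.println(idx.find("cat")); // -1
    }
}
```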
Add a module to the lucene
Hi, can we add a module to Lucene so that we can use our own similarity measure to calculate the similarity between documents and queries? Since Lucene defines its own measure, there is little we can do with it. If documents and queries are represented as vectors, we only need one class to read the vectors and apply our own measure to calculate their similarity. What do you think?

Regards
jason
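For reference, a measure of the kind Jason describes, cosine similarity between a document vector and a query vector, fits in a few lines of plain Java. This is a hypothetical standalone sketch with sparse vectors stored as maps from term to weight; it is not wired into Lucene, whose own scoring extension point is the org.apache.lucene.search.Similarity class.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    /** Cosine of the angle between two sparse term-weight vectors; 0 if either is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w; // only shared terms contribute
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> doc = new HashMap<>();
        doc.put("lucene", 2.0);
        doc.put("index", 1.0);
        Map<String, Double> query = new HashMap<>();
        query.put("lucene", 1.0);
        System.out.println(cosine(doc, query)); // 2 / sqrt(5), about 0.894
    }
}
```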
Add more module to the lucene
Hi, can we add more modules to Lucene so that we can easily use our own measures to calculate the similarity between documents and queries? I have read some of the original Lucene code, and I don't think it is easy to change the similarity measure it uses. But I think we could build a module that reads the document vectors from the index structure. Then we could use our own similarity measures. FYI.

Regards
jason.
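The module Jason proposes (obtain document vectors, then apply a user-chosen measure) can be sketched as a small plug-in interface. Everything here is hypothetical: VectorSimilarity, termFreqs and rank are invented names, the vectors are built from raw text with a whitespace split rather than read from a Lucene index, and Jaccard overlap is just one example of a measure that could be plugged in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PluggableRanker {

    /** Hypothetical plug-in point: any document/query similarity measure. */
    interface VectorSimilarity {
        double score(Map<String, Integer> doc, Map<String, Integer> query);
    }

    /** Builds a raw term-frequency vector from lowercased, whitespace-separated text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Example measure: Jaccard overlap of the two term sets (ignores frequencies). */
    static final VectorSimilarity JACCARD = (doc, query) -> {
        int shared = 0;
        for (String t : query.keySet()) {
            if (doc.containsKey(t)) shared++;
        }
        int union = doc.size() + query.size() - shared;
        return union == 0 ? 0.0 : (double) shared / union;
    };

    /** Returns document indices ordered by descending score under the given measure. */
    static List<Integer> rank(List<String> docs, String query, VectorSimilarity sim) {
        Map<String, Integer> q = termFreqs(query);
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = sim.score(termFreqs(docs.get(i)), q);
            order.add(i);
        }
        // Stable sort: documents with equal scores keep their original order.
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene search engine",
                "cooking recipes",
                "lucene index format");
        System.out.println(rank(docs, "lucene index", JACCARD)); // [2, 0, 1]
    }
}
```

Swapping in another measure only means passing a different VectorSimilarity to rank; the vector-building and sorting code stays the same.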
lucene query analysis
Hi, the problem I am facing is that queries are case-sensitive. If I type in capital letters I don't get any answers, but if I type in lowercase letters I do see results. Is there anything I can use to do a case conversion? I am currently using WhitespaceAnalyzer. Which analyzer should I change it to?

Rgds
Prabhu
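A likely cause of this symptom is that the analyzer never lowercases, so "BIG" and "big" become different terms. As a plain-Java sketch of the idea (not Lucene code; the class and method names are hypothetical), compare whitespace-only tokenization with whitespace tokenization followed by lowercasing. In Lucene, that extra step is what LowerCaseFilter provides; the assumption is that the same lowercasing analysis is used both at indexing time and at query time, so an index built with WhitespaceAnalyzer would presumably need to be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LowercaseAnalysisDemo {

    /** Whitespace tokenization only: case is preserved. */
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Whitespace tokenization followed by lowercasing of every token. */
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : whitespaceOnly(text)) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String document = "Big Data search";
        String query = "BIG";
        // Case-sensitive terms: "BIG" does not match "Big".
        System.out.println(whitespaceOnly(document).contains(query)); // false
        // Lowercased on both sides: both become "big".
        System.out.println(whitespaceLowercase(document)
                .containsAll(whitespaceLowercase(query))); // true
    }
}
```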