RE: Setting the COMMIT lock timeout.

2006-03-14 Thread Jim Bedford-roberts
Thanks for your prompt response! You asked about the use case. We have a series
of similar intranet sites, each represented by a separate Tomcat application
instance using the same code base but with different start-up parameters. The
intranets all provide a common search function based on the same underlying
index.

Admittedly we could have developed a single central search component, but given
the way the code has evolved, our current approach is simplest for us. With
separate application instances sharing access to the same index, we are getting
occasional COMMIT lock timeouts even while using singleton IndexSearchers in
each application.
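To see why a fixed timeout can fail when several Tomcat instances (or a slow disk) contend for one index, here is a minimal polling lock in plain Java. PollingFileLock and its methods are hypothetical and are not Lucene's Lock API; the lock is simply a file created with File.createNewFile, and obtain() retries until a caller-supplied timeout runs out.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical lock-file helper, not Lucene's Lock class. It only models
 * the polling-with-timeout idea behind a COMMIT or WRITE lock timeout.
 */
final class PollingFileLock {
    private final File lockFile;

    PollingFileLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** One attempt: createNewFile succeeds only if the file did not exist yet. */
    boolean tryObtain() {
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Retries every pollMillis until timeoutMillis has passed. If the current
     * holder (another JVM, a slow disk) needs longer than the timeout, this
     * returns false, which is where a "lock timed out" error would come from.
     */
    boolean obtain(long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!tryObtain()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
        return true;
    }

    void release() {
        lockFile.delete();
    }

    public static void main(String[] args) {
        File f = new File(System.getProperty("java.io.tmpdir"),
                "polling-demo-" + System.nanoTime() + ".lock");
        PollingFileLock first = new PollingFileLock(f);
        PollingFileLock second = new PollingFileLock(f);
        System.out.println(first.obtain(1000, 100)); // true: nobody holds it
        System.out.println(second.obtain(300, 100)); // false: gives up after about 300 ms
        first.release();
        System.out.println(second.obtain(300, 100)); // true
        second.release();
    }
}
```

Raising the timeout only trades longer waits for fewer failures. The JDK's own advisory locking facility is FileChannel.tryLock; the sketch uses createNewFile only because it maps directly onto the "lock file in the index directory" picture.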

-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED] 
Sent: 13 March 2006 23:23
To: java-user@lucene.apache.org
Subject: Re: Setting the COMMIT lock timeout.

On Monday 13 March 2006 22:24, Bill Janssen wrote:

 The default value isn't magic.  The appropriate value is
 context-specific.  I've got some people using Lucene on machines with
 slow disks, and we need to be able to increase the WRITE_LOCK_TIMEOUT
 to prevent entirely random lossage.

Here's a patch (I hope it gets through). Let me know if it's okay and I will
commit it.

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



who can tell me how lucene search in the index files

2006-03-14 Thread hu andy
I see there are seven different files with extensions .fnm, .tis, etc. I
just can't figure out how it looks up terms in the .tis file. Does Lucene use
binary search to locate the term?


Write.lock error with spellchecker

2006-03-14 Thread Madhusudan, Veda (Norcross, DAV)
I am trying to use the spellchecker plugin with Lucene 1.2. I get the
following exception when my SpellIndexer class tries to create the spell
index. The new directory is being created with all the correct
permissions. No write.lock file is being created. Has anyone run
into a similar issue? Does this have to do with Lucene 1.2?

 

Exception in thread "main" java.io.IOException: couldn't delete write.lock
    at org.apache.lucene.store.FSDirectory.deleteFile(Unknown Source)
    at org.apache.lucene.index.IndexReader.unlock(Unknown Source)
    at org.apache.lucene.search.spell.SpellChecker.indexDictionnary(Unknown Source)
    at com.unisource.ecom.search.lucene.SpellIndexer.createSpellIndex(SpellIndexer.java:35)
    at com.unisource.ecom.search.lucene.SpellIndexer.main(SpellIndexer.java:56)

 

Thanks,

Veda



IndexFiles.java

2006-03-14 Thread Miki Sun
Hiya

I am a beginner with Lucene. I am trying to use IndexFiles.java to index my
text file directories, but it does not work. It always gives me this
error message, even when I comment it out:

Usage: java org.apache.lucene.demo.IndexFiles root_directory

What does if (args.length == 0)  mean?

Thanks

Miki




Searching in paths

2006-03-14 Thread Java Programmer
Hello,
I have a problem with indexing/querying paths. E.g. I put
/home/users/apache/txt/qqq__docu.txt in a field called path. I wanted to
submit a query to find all documents provided by my user apache, so
I tried to query Lucene with AND path:/home/users/* but no results were found
for that query. If I query any other field without a /, results are returned,
e.g. AND title natio*.
Where am I making a mistake? What can I do to query for paths (and everything
below them)?

Best Regards,
Adr


Re: IndexFiles.java

2006-03-14 Thread Otis Gospodnetic
It looks like you are not specifying the directory you want to index.

Otis

- Original Message 
From: Miki Sun [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, March 14, 2006 11:27:04 AM
Subject: IndexFiles.java




Re: IndexFiles.java

2006-03-14 Thread Miki Sun
I think I did. I modified this code:

//create a directory to write the indices to
static final File INDEX_DIR = new File(File.separator + "Bible_index");

//specify the directory to be indexed
final File docDir = new File(File.separator + "Bible/1/");

Where else should I change it?

Thanks a lot!

On 14/03/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 It looks like you are not specifying the directory you want to index.

 Otis





--
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Smartweb Technologies Centre
School of Computing
St Andrew Street
Aberdeen AB25 1HG
Tel: +44 (0)1224 - 262479
Web: http://athena.comp.rgu.ac.uk/staff/ms/
* * * * * * * * * * * * * * * * * * * * * * * * * * * * *




Re: IndexFiles.java

2006-03-14 Thread Joe Scanlon
You need to specify it on the command line, i.e.:

java org.apache.lucene.demo.IndexFiles 'type in your starting directory here'
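In the demo, `if (args.length == 0)` asks whether any command-line arguments were given: `args` holds whatever is typed after the class name, so with nothing typed the program prints the usage line and exits. Because that check runs first, hard-coding INDEX_DIR or docDir in the source does not get past it. A hypothetical sketch of that kind of guard (IndexFilesArgs is an illustrative name, not the demo's actual source):

```java
import java.io.File;

/**
 * Hypothetical sketch of the kind of guard the demo's main() starts with.
 * It only shows where the directory to index comes from.
 */
final class IndexFilesArgs {
    static final String USAGE =
        "Usage: java org.apache.lucene.demo.IndexFiles root_directory";

    /** Returns the directory to index, or null when no argument was given. */
    static File docDirFrom(String[] args) {
        // args holds the words typed after the class name on the command line;
        // args.length == 0 means nothing was typed there.
        if (args.length == 0) {
            return null;
        }
        return new File(args[0]);
    }

    public static void main(String[] args) {
        File docDir = docDirFrom(args);
        if (docDir == null) {
            System.err.println(USAGE);
            System.exit(1);
        }
        System.out.println("Would index: " + docDir.getAbsolutePath());
    }
}
```

Run as `java IndexFilesArgs /path/to/docs` it prints the path; run with no argument it prints the usage line. IDEs generally offer a "program arguments" setting in the run configuration that plays the same role as the typed argument.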


On 3/14/06, Miki Sun [EMAIL PROTECTED] wrote:

 I think I did. I modified this code:

 //create a directory to write the indices to
 static final File INDEX_DIR = new File(File.separator + "Bible_index");

 //specify the directory to be indexed
 final File docDir = new File(File.separator + "Bible/1/");

 Where else should I change it?

 Thanks a lot!





Re: IndexFiles.java

2006-03-14 Thread Miki Sun
How do you do it using Kawa? I am not familiar with command-line operations.

Thanks

On 14/03/06, Joe Scanlon [EMAIL PROTECTED] wrote:
 You need to specify it on the command line, i.e.:

 java org.apache.lucene.demo.IndexFiles 'type in your starting directory here'





RE: Searching in paths

2006-03-14 Thread Mordo, Aviran (EXP N-NANNATEK)
You need to index the field as a keyword, or use an analyzer that will
not strip the / from the string.
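Aviran's point can be seen in a toy model. The sketch below uses a hypothetical letter-only tokenizer (similar in spirit to SimpleAnalyzer, but not Lucene code; the poster's actual analyzer is not stated): the path is broken into words and every '/' is discarded, so a prefix query for /home/users/ has no indexed term to match. Stored as one untokenized term (Field.Keyword in the 1.4-era API), the whole path survives and the prefix matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Lucene code) of tokenized versus untokenized path fields. */
final class PathFieldDemo {

    /** Splits on any non-letter and lowercases, roughly like a letter-only tokenizer. */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** A prefix query matches if any indexed term starts with the prefix. */
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = "/home/users/apache/txt/qqq__docu.txt";
        List<String> tokenized = letterTokens(path);
        List<String> keyword = Collections.singletonList(path); // one untokenized term
        System.out.println(tokenized); // [home, users, apache, txt, qqq, docu, txt]
        System.out.println(prefixMatches(tokenized, "/home/users/")); // false
        System.out.println(prefixMatches(keyword, "/home/users/"));   // true
    }
}
```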

Aviran
http://www.aviransplace.com 

-Original Message-
From: Java Programmer [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 14, 2006 11:28 AM
To: java-user@lucene.apache.org
Subject: Searching in paths




Re: Good MMapDirectory performance

2006-03-14 Thread Peter Keegan
- I read from Peter Keegan's recent postings:
- The Lucene server is using MMapDirectory. I'm running
-  the jvm with -Xmx16000M. Peak memory usage of the jvm
-  on Linux is about 6GB and 7.8GB on windows.
- We don't have nearly as much memory as Peter but I
- wonder whether he is gaining anything with such
- a large heap.

My application gets better throughput with more VM, but that is probably due
to heavy use of ByteBuffers in the application, not VM for Lucene.

Peter



On 3/12/06, kent.fitch [EMAIL PROTECTED] wrote:

 I thought I'd post some good news about MMapDirectory as
 the comments in the release notes are quite downbeat about
 its performance.  In some environments MMapDirectory
 provides a big improvement.
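MMapDirectory reads index files through java.nio memory mapping, so the operating system's page cache rather than the Java heap holds the file data. The JDK call underneath is FileChannel.map; the sketch below shows it on its own (MmapReadDemo is a hypothetical name and this is not Lucene's implementation). One practical limit: a single MappedByteBuffer is int-indexed and covers at most Integer.MAX_VALUE bytes, so a multi-gigabyte file such as a 17GB .fdt needs several mappings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Plain-JDK illustration of memory-mapped reading; not Lucene's MMapDirectory code. */
final class MmapReadDemo {

    /** Maps the whole file read-only and sums its unsigned byte values. */
    static long sumBytes(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            long size = channel.size();
            // map() refuses regions larger than Integer.MAX_VALUE bytes.
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("needs more than one mapping: " + size);
            }
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // the OS pages data in on first access
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes bytes to a new temp file that is deleted on JVM exit. */
    static Path writeTempFile(byte[] bytes) {
        try {
            Path tmp = Files.createTempFile("mmap-demo", ".bin");
            tmp.toFile().deleteOnExit();
            return Files.write(tmp, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeTempFile(new byte[] {1, 2, 3, (byte) 250});
        System.out.println(sumBytes(p)); // 256
    }
}
```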

 Our test application is an index of 11.4 million
 documents which are derived from MARC (bibliographic)
 catalogue records.  Our aim is to build a system
 to demonstrate relevance ranking and result clustering
 for library union catalogue searching (a union
 catalogue accumulates/merges records from multiple
 libraries).

 Our main index component sizes:
 fdt 17GB
 fdx 91MB
 tis 82MB
 frq 45MB
 prx 11MB
 tii 1.2 MB

 We have a separate Lucene index (not discussed further)
 which stores the MARC records.

 Each document has many fields.   We'll probably reduce the
 number after we decide on the best search strategies, but
 lots of fields give us lots of flexibility whilst testing
 search and ranking strategies.

 Stored and unindexed fields, used for summary results:
   display title
   display author
   display publication details
   holdingsCount (number of libraries holding)

 Tokenized indices:
   title
   author
   subject
   genre
   keyword (all text)

 Keyword (untokenized) indices:
   title
   author
   subject
   genre
   audience
   Dewey/LC classification
   language
   isbn/issn
   publication date (date range code)
   unique bibliographic id

 Wildcard Tokenized indices created by a custom stub
 analyzer which reduces a term to its first few characters:
   title
   author
   subject
   keyword
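The "stub" analyzer is only described, not shown. Judging from the generated query further down (franz becomes fra, kafka becomes kaf), it keeps the first three characters of each term; the sketch below assumes that length and simple whitespace splitting, and StubTokens is a hypothetical name. In Lucene this logic would normally live in a TokenFilter; here it is plain Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical "stub" step: keeps only the first prefixLength characters of each token. */
final class StubTokens {
    static List<String> stub(String text, int prefixLength) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces an empty first element
            }
            out.add(token.length() <= prefixLength ? token : token.substring(0, prefixLength));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(stub("Franz Kafka", 3)); // [fra, kaf]
    }
}
```

Applying the same truncation at index time and at query time is what lets a short stub act as a cheap prefix (wildcard-like) match.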

 Field boosts are set for some fields.  For example, title,
 subtitle, series title and component title are all
 stored as title but with different field boosts (as a
 match on normal title is deemed more relevant than a match
 on series title).

 The document boost is set to the sqrt of the holdingsCount
 (favouring popular resources).
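As arithmetic, the document boost is just a square root, which compresses popularity: a record held by 100 libraries gets a boost of 10, not 100. A minimal sketch (HoldingsBoost is a hypothetical name; in Lucene the value would be passed to Document.setBoost):

```java
/** Document boost as described in the post: the square root of the holdings count. */
final class HoldingsBoost {
    static float boostFor(int holdingsCount) {
        // sqrt dampens popularity: 100x more holdings gives only 10x the boost.
        return (float) Math.sqrt(holdingsCount);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 100, 400}) {
            System.out.println(n + " -> " + boostFor(n)); // 1.0, 2.0, 10.0, 20.0
        }
    }
}
```

A holdingsCount of 0 would give a boost of 0, so an implementation might clamp the count to at least 1; that is a suggestion, not something the post describes.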

 The user interface supports searching and refining searches
 on specific fields but the most common search is created
 from a single Google-style search box.  Here's a typical
 query generated from a 2 word search:

 +(titleWords:franz kafka^4.0
   authorWords:franz kafka^3.0
   subjectWords:franz kafka^3.0
   keywords:franz kafka^1.4
   title:franz kafka^4.0
   (+titleWords:franz +titleWords:kafka^3.0)
   author:franz kafka^3.0
   (+authorWords:franz +authorWords:kafka^2.0)
   subject:franz kafka^3.0
   (+subjectWords:franz +subjectWords:kafka^1.5)
   (+genreWords:franz +genreWords:kafka^2.0)
   (+keywords:franz +keywords:kafka)
   (+titleWildcard:fra +titleWildcard:kaf^0.7)
   (+authorWildcard:fra +authorWildcard:kaf^0.7)
   (+subjectWildcard:fra +subjectWildcard:kaf^0.7)
   (+keywordWildcard:fra +keywordWildcard:kaf^0.2)
 )

 It generated 1635 hits.  We then read the first 700
 documents in the hit list and extract the date, subject,
 author, genre, Dewey/LC classification and audience
 fields for each, accumulating the popularity of each.

 Using this data, for each of the subject, author, genre,
 Dewey/LC and audience categories, we find the 30 most
 popular field values and for each of these we query the
 index to find their frequency in the entire index.
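The popularity step is a count-and-rank over field values taken from the first 700 hits. A minimal sketch under that reading (TopValues is a hypothetical name, and the input strings stand in for values read from stored fields). The index-wide frequency of each winning value would then be a separate lookup, for example IndexReader.docFreq(Term).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopValues {
    /** Counts each value and returns the n most frequent, most frequent first. */
    static List<String> topN(List<String> values, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so output is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> sampled = Arrays.asList("b", "a", "b", "c", "b", "a");
        System.out.println(topN(sampled, 2)); // [b, a]
    }
}
```

With the numbers from the post this would be called once per category with n = 30.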

 We then render the first 100 document results (title,
 author, publication details, holdings) and the top 30
 for each of subject, author, genre, Dewey/LC and audience,
 ordering each list by the popularity of the term in the
 hit results (sample of the first 700) and rendering the
 size of the text based on the frequency of the term in
 the entire database (a bit like the Flickr tag popularity
 lists).  We also render a graph of hit results by date
 range.
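The post sizes each term's text by its frequency in the whole database but does not say how. A common choice for tag clouds is a logarithmic scale between a minimum and a maximum font size; the sketch below assumes that scaling, and TagSize and its parameters are hypothetical.

```java
/** Maps a term's index-wide frequency to a font size on a log scale (assumed scaling). */
final class TagSize {
    static int fontSize(int freq, int minFreq, int maxFreq, int minPx, int maxPx) {
        // Treat frequencies below 1 as 1 so the logarithms stay finite.
        double lo = Math.log(Math.max(1, minFreq));
        double hi = Math.log(Math.max(1, maxFreq));
        if (hi <= lo) {
            return minPx; // every term equally frequent: nothing to scale
        }
        double t = (Math.log(Math.max(1, freq)) - lo) / (hi - lo);
        t = Math.max(0.0, Math.min(1.0, t)); // clamp out-of-range input
        return (int) Math.round(minPx + t * (maxPx - minPx));
    }

    public static void main(String[] args) {
        System.out.println(fontSize(1, 1, 1000, 10, 28));    // 10
        System.out.println(fontSize(10, 1, 1000, 10, 28));   // 16
        System.out.println(fontSize(1000, 1, 1000, 10, 28)); // 28
    }
}
```

A frequency of 10 lands at 16 because, on a log scale, 10 is one third of the way from 1 to 1000.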

 The initial search is very quick - typically a small
 number of tens of millsecs.  The clustering takes
 much longer - reading up to 700 records, extracting
 all those fields, sorting to get the top 30 of each
 field category, looking up the frequency of each term
 in the database.

 The test machine was a SunFire440 with 2 x 1.593GHz
 UltraSPARC-IIIi processors and 8GB of memory running
 Solaris 9, Java 1.5 in 64 bit mode, Jetty. The Lucene data
 directory is stored on a local 10K SCSI disk.

 The benchmark consisted of running 13,142 representative
 and unique search phrases collected from another system.
 The search phrases are unsorted.  The client (testing)
 system is run on another unloaded computer and was
 configured to run a varying number of threads representing
 different loads.  The results discussed here were
 produced with 3 

Re: who can tell me how lucene search in the index files

2006-03-14 Thread Daniel Noll

hu andy wrote:

I see there are seven different files with extensions .fnm, .tis, etc. I
just can't figure out how it looks up terms in the .tis file. Does Lucene use
binary search to locate the term?


See TermInfosReader.

It loads the .tii file into memory, which contains one in every N 
entries of the .tis file and points into the real locations in the .tis 
file.


When Lucene looks for a term, it does a binary search through this 
reduced index to find which segment of the .tis file the term is in, and 
then scans through the .tis file linearly until it finds the term.
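Daniel's two-step lookup can be modelled in a few lines. In this toy version a sorted list stands in for the .tis file, and every interval-th entry is copied into a smaller list standing in for the .tii file. The names are hypothetical, and the real files store prefix-compressed entries with file pointers rather than list positions, which is why the second step has to be a sequential scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the .tii/.tis lookup: binary search a sample, then scan linearly. */
final class TwoLevelTermIndex {
    private final List<String> terms;                       // all terms, sorted (the ".tis")
    private final List<String> sampled = new ArrayList<>(); // every interval-th term (the ".tii")
    private final int interval;

    /** sortedTerms must already be in ascending order. */
    TwoLevelTermIndex(List<String> sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sampled.add(sortedTerms.get(i));
        }
    }

    /** Returns the position of term in the full list, or -1 if absent. */
    int find(String term) {
        // Step 1: binary search the small in-memory sample.
        int pos = Collections.binarySearch(sampled, term);
        if (pos >= 0) {
            return pos * interval; // the term is itself a sampled entry
        }
        int block = -pos - 2; // index of the last sampled term smaller than term
        if (block < 0) {
            return -1; // term sorts before every indexed term
        }
        // Step 2: scan forward from that entry, at most one interval of terms.
        int end = Math.min(terms.size(), (block + 1) * interval);
        for (int i = block * interval; i < end; i++) {
            int cmp = terms.get(i).compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed where the term would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana", "cherry", "date",
                "fig", "grape", "kiwi", "lemon", "mango");
        TwoLevelTermIndex index = new TwoLevelTermIndex(terms, 4);
        System.out.println(index.find("grape"));   // 5
        System.out.println(index.find("coconut")); // -1
    }
}
```

The in-memory part shrinks by a factor of the interval, while each lookup pays for at most one interval's worth of sequential reading.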


Daniel

--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/    Fax: +61 2 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.




Add a module to the lucene

2006-03-14 Thread jason
Hi,

Can we add a module to Lucene so that we are able to use our own similarity
measure to calculate the similarity between documents and queries? Since Lucene
defines its own measure, there is little we can do with it.

Considering documents and queries represented as vectors, we only
need one class to read the vectors and apply our own measure to
calculate their similarity.
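Lucene's scoring is already partly pluggable: a subclass of Similarity (usually of DefaultSimilarity) can be set with Searcher.setSimilarity and IndexWriter.setSimilarity, although that only changes the factors inside Lucene's own formula. For a completely different measure, per-document vectors are available from IndexReader.getTermFreqVector for fields indexed with term vectors. The sketch below is a plain-Java cosine over term-frequency maps, offered as one example of such a measure; CosineSimilarity is a hypothetical name and this is not what Lucene computes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CosineSimilarity {
    /** Builds a term-frequency vector from whitespace-separated, lowercased text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two sparse term-frequency vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector has no direction
        }
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sumSquares = 0.0;
        for (int x : v.values()) {
            sumSquares += (double) x * x;
        }
        return Math.sqrt(sumSquares);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = termFreqs("lucene index lucene");
        Map<String, Integer> query = termFreqs("lucene");
        System.out.println(cosine(doc, query)); // about 0.894, i.e. 2 / sqrt(5)
    }
}
```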

What do you think?

regards
jason


Add more module to the lucene

2006-03-14 Thread jason
Hi,

Can we add more modules to Lucene so that we can easily use our own
measures to calculate similarity between documents and queries? I have read
some of the original Lucene code, and I don't think it is easy to change the
similarity measure it uses. But I think we can build a module that reads
the document vectors from the index structure. Then we can use our own
similarity measures.


FYI.

Regards

jason.


lucene query analysis

2006-03-14 Thread Raghavendra Prabhu
Hi

The problem I am facing is that the query is case sensitive.

If I type in capital letters I don't get any results, and if I type in
lowercase letters I do.

Is there anything I can use to do a case conversion?

Right now I am using a WhitespaceAnalyzer. Which analyzer should I change it to?
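WhitespaceAnalyzer only splits on whitespace and does not lowercase, so "Report" and "REPORT" become different terms. Analyzers such as SimpleAnalyzer and StandardAnalyzer lowercase their tokens; whichever one is chosen has to be used both when indexing and when parsing queries, which means re-indexing. The toy model below (CaseFoldingDemo is a hypothetical name, plain Java rather than Lucene's analyzers) shows the difference:

```java
import java.util.Locale;

final class CaseFoldingDemo {
    /** Splits on whitespace and, if asked, lowercases first. */
    static String[] analyze(String text, boolean lowercase) {
        String input = lowercase ? text.toLowerCase(Locale.ROOT) : text;
        return input.trim().split("\\s+");
    }

    /** True if the single query word equals some term of the document. */
    static boolean matches(String docText, String queryWord, boolean lowercase) {
        String q = analyze(queryWord, lowercase)[0];
        for (String term : analyze(docText, lowercase)) {
            if (term.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Quarterly Sales Report";
        System.out.println(matches(doc, "REPORT", false)); // false: case differs
        System.out.println(matches(doc, "REPORT", true));  // true: both sides lowercased
    }
}
```

Lowercasing only the query would not be enough here, since the indexed term "Report" would still not equal "report".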


Rgds
Prabhu