RE: Can i use lucene to search the internet.

2006-03-23 Thread Babu, KameshNarayana \(GE, Research, consultant\)
Title: Can i use lucene to search the internet.



Hi, 

Can we use Nutch on Windows?

  -----Original Message-----
  From: gekkokid [mailto:[EMAIL PROTECTED]
  Sent: Thursday, March 23, 2006 11:22 AM
  To: java-user@lucene.apache.org
  Subject: Re: Can i use lucene to search the internet.

  Hi, are you asking whether it has a crawler? No, it doesn't, but Nutch does: http://lucene.apache.org/nutch/ :)

  _gk

    ----- Original Message -----
    From: Babu, KameshNarayana (GE, Research, consultant)
    To: java-user@lucene.apache.org
    Sent: Thursday, March 23, 2006 5:44 AM
    Subject: Can i use lucene to search the internet.

    Hi all,
    Can I use Lucene to search the internet? Or do we have any open source applications for this? Thanks in advance.

    GE Global Research
    Kamesh NarayanaBabu
    John F. Welch Technology Centre
    Information Technology Management, Plot 122, Export Promotion Industrial Park,
    Phase II, Hoodi Village, Whitefield Road, Bangalore, Karnataka - 560066, INDIA.
    Phone: +91 (80) 2503 0457 | GE Dial comm.: 8 * 901 0359 | Mobile: +91 9986259850 | Email:- [EMAIL PROTECTED]



RE: Can i use lucene to search the internet.

2006-03-23 Thread Babu, KameshNarayana \(GE, Research, consultant\)
Title: Can i use lucene to search the internet.



Hi all,

Can Nutch be used on Windows?




Re: Can i use lucene to search the internet.

2006-03-23 Thread Raghavendra Prabhu
Hi

It can be used if you run Cygwin (the latest version).
Please have a look at the Nutch wiki.

Also, you are mailing the wrong list.


Rgds
Prabhu





RE: Can i use lucene to search the internet.

2006-03-23 Thread Babu, KameshNarayana \(GE, Research, consultant\)
Hi,
Thanks for the reply. Can I do this without Cygwin? Which list should I use for 
these queries? Kindly help me.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: FileNotFoundException: Corrupted Index? = Use jvm ShutdownHook

2006-03-23 Thread Olivier Jaquemet

Hi Otis,

Thanks for your reply.
I will also add a shutdown hook that closes the writer for this index, as you suggested.

I had already done that in other parts of our code where we use other 
Lucene indexes, but thought it would not be needed for this particular index 
because we rarely write to it. That was a mistake: the JVM can also be 
shut down during one of those rare writes, and this corruption proves it.


I will watch whether the problem still occurs. If it does not, I'll update 
the wiki FAQ with the following code (left here for the search history 
and for other users):

   // clean writer reader and searcher correctly
   Thread shutdown = new Thread() {
     public void run() {
       if (writer != null) {
         try { writer.close(); }
         catch (Exception ex) { /*empty*/ }
         writer = null;
       }
       if (reader != null) {
         try { reader.close(); }
         catch (IOException ex) { /*empty*/ }
         reader = null;
       }
       if (searcher != null) {
         try { searcher.close(); }
         catch (IOException ex) { /*empty*/ }
         searcher = null;
       }
     }
   };
   Runtime.getRuntime().addShutdownHook(shutdown);

Otis Gospodnetic wrote:

Hi Olivier,

You have shutdown hooks for read-only operations.  They won't corrupt your 
index.  I'd add shutdown hooks for IndexWriter.
If that fixes your problem, it would be great if you could add your shutdown 
hook code to the FAQ on the Wiki, or at least post it to java-user, so somebody 
else can put it there.

Otis

- Original Message 
From: Olivier Jaquemet [EMAIL PROTECTED]
To: Lucene Java User ML java-user@lucene.apache.org
Sent: Wednesday, March 22, 2006 10:08:28 AM
Subject: FileNotFoundException: Corrupted Index?

Hi all,

We are using the latest version of Lucene (1.9.1), and sometimes we end up 
with the following error when opening one of the indexes our application uses:


java.io.FileNotFoundException: [...]/LuceneIndex/_ 46.fnm (No such file or directory)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
   at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
   at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
   at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
   at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:56)
   at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:144)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
   at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110)
   at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:154)
   at org.apache.lucene.store.Lock$With.run(Lock.java:109)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:138)

The only solution available in this case is to completely remove and 
recreate the index.

I have the corrupted index available for testing should you need it.

Apparently this corruption occurs if the JVM has crashed or was shut down 
too violently (kill -9).
I was wondering how corruption of a Lucene index can occur and how 
to prevent it, fix it on reopening or, as a last resort, detect it so that 
the index can be recreated.


Note that I already have that kind of hook in the code for shutdown:

// clean writer reader and searcher correctly
Thread shutdown = new Thread() {
  public void run() {
    if (reader != null) {
      try { reader.close(); }
      catch (IOException ex) { /*empty*/ }
      reader = null;
    }
    if (searcher != null) {
      try { searcher.close(); }
      catch (IOException ex) { /*empty*/ }
      searcher = null;
    }
  }
};
Runtime.getRuntime().addShutdownHook(shutdown);
   
Or, on opening, code such as:


  Directory indexDir = FSDirectory.getDirectory(luceneDir, !IndexReader.indexExists(luceneDir));
  IndexReader.unlock(indexDir); // unlock directory in case of improper shutdown

  if (!IndexReader.indexExists(luceneDir)) {
    writer = new IndexWriter(indexDir, analyzer, true);
    writer.close();
  }

Any suggestion or remark?

Thanks!
  



--
Olivier Jaquemet [EMAIL PROTECTED]
Ingénieur RD Jalios S.A.
Tel: 01.39.23.92.83
http://www.jalios.com/
http://support.jalios.com/







RE: java.lang.OutOfMemoryError in lucene

2006-03-23 Thread escobar5

But I have the IBM JDK 1.4.2. Do you know if this version still has the
problem?
--
View this message in context: 
http://www.nabble.com/java.lang.OutOfMemoryError-in-lucene-t1324911.html#a3551247
Sent from the Lucene - Java Users forum at Nabble.com.





Re: Speed up Indexing

2006-03-23 Thread Jeff Rodenburg
I run Lucene.Net as well, and your indexing performance depends on more
factors than whether you're using the Java or C# version.  As a basic
suggestion, learn what you can about minMergeDocs and mergeFactor as well as
the compound file format.  Try different combinations to understand which are
faster and which are slower.

As a strategy for your specific scenario, you might consider building
several indexes in parallel, then merging the indexes at the end.

Hope this helps.

-- j


On 3/22/06, hu andy [EMAIL PROTECTED] wrote:

 Hi, everyone. I have a large amount of xml files of size 1G. I use
 Lucene (the dotNet edition) to index them. There are 8 fields per document,
 with 4 keyword fields and 4 unstored fields. I have set minMergeDocs to 1
 and mergeFactor to 100. It took about 2.5 hours (main memory 3G, CPU P4).
 I also tried in-memory indexing, which also took more than 2.5 hours. Due to
 the performance requirement, I need to complete the indexing in one hour
 without using a distributed or clustered system. Is that possible? Is it
 faster to use Java Lucene than the dotNet one? Any advice will be appreciated.
 Thank you in advance.




Re: Multiple threads in Lucene

2006-03-23 Thread Nikhil Goel
Hi Otis,

Thanks for the reply, but I have one question. You said that opening
multiple IndexWriters is a big no-no. I want to clarify:
1) Do you mean multiple IndexWriters at the same time? I am not doing this.
At any time there is only one IndexWriter open.
or
2) Do you mean I can't open another IndexWriter after closing the prior
one? In my writing thread, for every file I index, I open a new IndexWriter
and close it. As soon as a second file is available for indexing, I
open an IndexWriter again and close it. The directory object is the same
across all the threads as well as when reopening IndexWriters.

If the latter is a no too, then how would a developer make sure that the
index is closed when the program is killed? Suppose a program is killed
midway and the index is not closed; the next time I run the program there
will be a write.lock in the index and it won't allow us to open the index again.

Please let me know if I am wrong in what I said.

Thanks
-Nikhil


On 3/22/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Yes, 1 IndexWriter + multiple IndexSearchers definitely work together :)
 I can't tell what you're doing wrong with the threads... it looks like you
 might be opening multiple IndexWriters on the same index/directory (big no
 no).

 Otis

 - Original Message 
 From: Nikhil Goel [EMAIL PROTECTED]
 To: java-user@lucene.apache.org
 Sent: Wednesday, March 22, 2006 6:04:41 PM
 Subject: Multiple threads in Lucene

 Hi Lucene Developers,

 According to the Lucene documentation, an IndexWriter can coexist with multiple
 IndexSearchers and this is thread safe. To verify that, I wrote a simple program
 to simulate that condition, but unfortunately I get an exception. Please let
 me know if anyone has ever tested the Lucene claim that IndexWriter and
 IndexSearcher are thread safe.


 I have a program in which I have 4 threads.
 1) One IndexWriter Thread
 2) 3 IndexSearcher Thread.

 Every time we need to index a file, we run the following code in the
 IndexWriter thread:
 function IndexFile(Document doc)
 {
    writer = new IndexWriter(directory, new StandardAnalyzer(), false);
    writer.addDocument(doc);
    writer.close();
 }

 Our IndexSearcher thread looks like this:
 function IndexSearch(String termToBeSearched)
 {
    // Note: this directory is the same reference as the one used to create the
    // IndexWriter in IndexFile. Hence this directory reference is shared
    // across all the threads.
    IndexSearcher searcher = new IndexSearcher(directory);

    Query query = QueryParser.parse(termToBeSearched, contents, new StandardAnalyzer());
    Hits hits = searcher.search(query);
 }

 If I execute these 4 threads together, then whenever a search routine
 runs while the IndexWriter is also in use, I get an error at the
 following line: writer.close();

 The stack trace looks like this:
 unable to close the writer stream
 java.io.IOException: read past EOF
 at org.apache.lucene.store.InputStream.refill(InputStream.java:192)
 at org.apache.lucene.store.InputStream.readByte(InputStream.java:81)
 at org.apache.lucene.store.InputStream.readBytes(InputStream.java:95)
 at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:375)
 at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:342)
 at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:306)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:99)
 at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:430)
 at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:383)
 at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:193)


 Thanks in advance
 -Nikhil








Re: Multiple threads in Lucene

2006-03-23 Thread Olivier Jaquemet

Hi,

To prevent this kind of problem, here is how you should open your index:
   Directory indexDir = FSDirectory.getDirectory(luceneDir, !IndexReader.indexExists(luceneDir));
   IndexReader.unlock(indexDir); // unlock directory in case of improper shutdown

   if (!IndexReader.indexExists(luceneDir)) {
     writer = new IndexWriter(indexDir, analyzer, true);
     writer.close();
   }

And to prevent problems with writer/reader/searcher not being closed 
properly on exit, here is how you should make sure they are closed 
(although it is not guaranteed to be called at all by the jvm, it's 
better than nothing)

  // clean writer reader and searcher correctly
  Thread shutdown = new Thread() {
    public void run() {
      if (writer != null) {
        try { writer.close(); }
        catch (Exception ex) { /*empty*/ }
        writer = null;
      }
      if (reader != null) {
        try { reader.close(); }
        catch (IOException ex) { /*empty*/ }
        reader = null;
      }
      if (searcher != null) {
        try { searcher.close(); }
        catch (IOException ex) { /*empty*/ }
        searcher = null;
      }
    }
  };
  Runtime.getRuntime().addShutdownHook(shutdown);

Another reminder if you are starting out with Lucene:
- Keep your reader/searcher open as long as possible, until you write to 
the index. It increases performance. You can use a class like this one 
(taken from this ML):

 /**
  * For optimized use of the searcher, we keep it open as long as possible and
  * delay its close until it has been replaced by a new one after the index
  * is modified.
  */
 public class IndexSearcherWrapper extends IndexSearcher {
   private int referenceCount;

   public IndexSearcherWrapper(Directory dir) throws IOException {
     super(dir);
     this.referenceCount = 1;
   }

   public IndexSearcherWrapper getReference() {
     referenceCount++;
     return this;
   }

   public void close() throws IOException {
     referenceCount--;
     // close the underlying searcher only once the last reference is released
     if (referenceCount <= 0) {
       super.close();
     }
   }
 }

Use it like this:

   IndexSearcher localSearcher = searcher.getReference();
   Hits hits = localSearcher.search(query);
   [...]

And use a method such as this one every time you write to the index:
 /**
  * Renew internal reader and searcher, call this method after index change.
  */
 public void renewReaderAndSeacher() throws IOException {
   // Reader
   IndexReader oldReader = reader;
   reader = IndexReader.open(index);
   if (oldReader != null) {
     oldReader.close();
   }
   // Searcher
   IndexSearcherWrapper oldSearcher = searcher;
   searcher = new IndexSearcherWrapper(index);
   if (oldSearcher != null) {
     oldSearcher.close();
   }
 }
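
The reference-counting idea behind the wrapper and the renew method can be tried in isolation. The following is only a minimal sketch, not the Lucene class itself: RefCountedResource, acquire(), release() and isClosed() are hypothetical names standing in for the wrapper, getReference() and close(), and it uses an AtomicInteger (my addition; the wrapper above uses a plain int) so that the count stays consistent across threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for IndexSearcherWrapper; no Lucene dependency.
final class RefCountedResource {
    // Starts at 1: the reference held by the owner that created the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Called by a search thread before it starts using the resource.
    RefCountedResource acquire() {
        refCount.incrementAndGet();
        return this;
    }

    // Called by every holder (owner included) when it is done.
    void release() {
        if (refCount.decrementAndGet() <= 0) {
            closed = true; // a real wrapper would call super.close() here
        }
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCountedResource current = new RefCountedResource();
        RefCountedResource inUse = current.acquire(); // a search starts
        try {
            current.release(); // the index changed: the owner drops the old searcher
            System.out.println("closed while search is running: " + inUse.isClosed());
        } finally {
            inUse.release(); // the search ends: the last reference is gone
        }
        System.out.println("closed after search finished: " + inUse.isClosed());
    }
}
```

The point of the pattern is that the owner can replace the searcher at any time, while a search that is still running keeps the old one alive until it releases its own reference. This sketch does not guard against acquire() being called after the final release.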

Hope it will help! :)


Re: Changing ranking

2006-03-23 Thread Otis Gospodnetic
The place to start would be to look at the DefaultSimilarity, and the norms 
method there.  Perhaps you want to create your own Similarity implementation 
that returns either a constant 1 or something else that will favour longer 
text.  Somebody else with more experience in this area may have better or more 
precise suggestions.

Otis

- Original Message 
From: Leon Chaddock [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, March 23, 2006 9:43:14 AM
Subject: Changing ranking

Hi,
At present Lucene seems to rank very short documents over longer documents 
in which the phrase occurs more often. For instance, with the search term 
cat

the cat went home

ranks higher than

the black cat when home past some other cats, on cat street

Is there any way I can change Lucene to rank longer documents with more 
phrase occurrences higher?

Many thanks

Leon 





Re: Changing ranking

2006-03-23 Thread Marvin Humphrey


On Mar 23, 2006, at 11:22 AM, Otis Gospodnetic wrote:

The place to start would be to look at the DefaultSimilarity, and  
the norms method there.  Perhaps you want to create your own  
Similarity implementation that returns either a constant 1 or  
something else that will favour longer text.  Somebody else with  
more experience in this area may have better or more precise  
suggestions.


Here's an implementation of lengthNorm() that stops the  
weighting at 100 tokens.


  public float lengthNorm(String fieldName, int numTerms) {
    numTerms = numTerms < 100 ? 100 : numTerms;
    return (float)(1.0 / Math.sqrt(numTerms));
  }

If you adopt it, you must boost short but important fields (e.g.  
title), or they won't contribute enough.
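
To see what the cap changes, the two formulas can be compared side by side. This is a minimal sketch in plain Java with no Lucene dependency: LengthNormDemo, defaultLengthNorm and cappedLengthNorm are names made up for this illustration, the default is assumed to be 1/sqrt(numTerms), and the condition above is read as numTerms < 100.

```java
// Illustration only: compares an uncapped length norm with the capped variant.
final class LengthNormDemo {
    // Assumed default: the norm shrinks as the field gets longer,
    // so short fields are favoured.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Capped variant: any field shorter than 100 tokens is treated
    // as if it had exactly 100 tokens.
    static float cappedLengthNorm(int numTerms) {
        int effective = numTerms < 100 ? 100 : numTerms;
        return (float) (1.0 / Math.sqrt(effective));
    }

    public static void main(String[] args) {
        int[] lengths = {4, 12, 100, 400};
        for (int n : lengths) {
            System.out.println(n + " tokens: default=" + defaultLengthNorm(n)
                    + " capped=" + cappedLengthNorm(n));
        }
    }
}
```

With the cap, a 4-token field and a 12-token field get the same norm, so the shorter one no longer wins on length alone. That is also why short fields such as a title need an explicit boost.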


KinoSearch (my loose Perl/C port of Lucene) uses this algorithm, and  
it seems to work well.


To see an earlier discussion on this subject, perform a web search for  
"proposal defaultsimilarity lengthnorm".


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/





Joins between index and database

2006-03-23 Thread Tom Hill

Hi -

I have an application where I'm using Lucene to index the contents of 
a database. That's working fine.


But I have a problem where I'd like to retrieve a subset of the 
documents that match a search, based on a join table in the database.


How do people typically handle combining the results of a Lucene 
based search with the results of a database search?


Thanks,

Tom





Re: Joins between index and database

2006-03-23 Thread Paul Elschot
On Thursday 23 March 2006 20:51, Tom Hill wrote:
 Hi -
 
 I have an application where I'm using Lucene to index the contents of 
 a database. That's working fine.
 
 But I have a problem where I'd like to retrieve a subset of the 
 documents that match a search, based on a join table in the database.
 
 How do people typically handle combining the results of a Lucene 
 based search with the results of a database search?

One way is to get the values of some key field from the database,
create a Filter using terms created from these values, and use that
Filter in a search, or in a FilteredQuery.
See RangeFilter.bits() for some example code that creates a filter
from terms. Sorting the key values beforehand helps performance
for creating the filter. CachingWrapperFilter can also be handy.
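
The approach can be sketched without Lucene. In the illustration below, the postings map is a hypothetical stand-in for the index (key value to the ids of the documents that contain it), and buildFilter plays the role of a filter's bit set. None of these names come from Lucene; a real implementation would walk the index terms through an IndexReader instead of a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: builds a document bit set from database key values.
final class KeyFilterDemo {
    // One bit per document; a bit is set if the document's key field
    // matches one of the keys returned by the database query.
    static BitSet buildFilter(Collection<String> dbKeys,
                              Map<String, int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        List<String> sorted = new ArrayList<>(dbKeys);
        Collections.sort(sorted); // sorted keys match the order of a sorted term dictionary
        for (String key : sorted) {
            int[] docs = postings.get(key);
            if (docs == null) {
                continue; // key is in the database but not in the index
            }
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keeps only the search hits whose bit is set in the filter.
    static List<Integer> applyFilter(int[] searchHits, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : searchHits) {
            if (filter.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("A1", new int[] {0, 3});
        postings.put("B2", new int[] {1});
        postings.put("C3", new int[] {2, 4});
        List<String> dbKeys = Arrays.asList("C3", "A1", "ZZ"); // e.g. rows from the join table
        BitSet filter = buildFilter(dbKeys, postings, 5);
        System.out.println(applyFilter(new int[] {1, 2, 3}, filter));
    }
}
```

Building the bit set once and reusing it across searches is where caching the filter pays off.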

In case you need a lot of filters for relatively few documents, have
a look here:
http://issues.apache.org/jira/browse/LUCENE-328

Regards,
Paul Elschot




Re: Multiple threads in Lucene

2006-03-23 Thread Doug Cutting

Olivier Jaquemet wrote:
   IndexReader.unlock(indexDir); // unlock directory in case of unproper shutdown


This should be used very carefully.  In particular, you should only call 
it when you are certain that no other applications are accessing the index.


Doug




Re: Can i use lucene to search the internet.

2006-03-23 Thread Bill Janssen
Let's stop this thread.

 Can i use lucene to search the internet.

No.

You may be able to use Lucene to *index* the internet, and then search
the resulting index.  Read the book Lucene in Action for a better idea
of what this would entail.

Bill




Re: Query question

2006-03-23 Thread Chris Hostetter
: Use Keyword (untokenized) field to index your paths.
: Consider using PerFieldAnalyzerWrapper to specify KeywordAnalyzer for your path field.
: Use the force, Luke - http://www.getopt.org/luke/ , to ensure your paths are indexed correctly.

you also don't want to use QueryParser.escape when you build the term
query explicitly -- that's only needed if you are passing the string to
QueryParser...

: Ex: Hits hits = multisearch.search(new TermQuery(new Term(key,
: QueryParser.escape(key;



-Hoss





Re: Changing ranking

2006-03-23 Thread Chris Hostetter

: Is there any way I can change Lucene to rank longer documents with more
: phrase occurrences higher

if what you care about is only the number of occurrences, and you don't
want the length to be a factor at all, then using Field.setOmitNorms(true)
on the Field for every document you add will not only accomplish this, but
will also save one byte per field per document in your index.

that can add up if you have a lot of fields whose length you don't care
about.


-Hoss





Native code compilation

2006-03-23 Thread Seeta Somagani

Hi all,

Has anyone tried to compile their Lucene applications into native code?
Mine works fine in a VM but the call to search() on IndexSearcher is
crashing the application, after I compile it into native code. There is
apparently no problem in instantiating an IndexSearcher though. I tried
this on both Linux and Windows and am getting the same problem.
Thnx
Seeta





lucene NFS support

2006-03-23 Thread Dai, Chunhe
Hi,

 

Does anyone know whether Lucene plans to support NFS in a later
release (2.0)? We are planning to integrate Lucene into our products and
cluster support is definitely needed. We want to check whether NFS
support is planned before implementing a new file locking mechanism
ourselves.

 

Thanks.

Chunhe



Re: Read past EOF error in Windows

2006-03-23 Thread Chris Cain

No, that doesn't seem to be the problem.

Anyone have any other ideas?

On Tue, 21 Mar 2006 [EMAIL PROTECTED]

I had a problem in the past with security on the folder where your index 
is located...but your error does not seem to show that ... I would check 
anyway though...


-Original Message-
From: Chris Cain cbc20[at]hermes.cam.ac.uk
To: java-user[at]lucene.apache.org
Sent: Tue, 21 Mar 2006 15:33:26 + (GMT)
Subject: Read past EOF error in Windows


Hi all,

I wrote a Lucene program which runs fine under Linux and Mac but fails on 
most Windows machines. (I have managed to get it to work on one version of 
XP, however.)


Specifically, when I open or search the index I get the following error 
message.


Any help would be appreciated,
Cheers,
Chris

caught a class java.io.IOException
with message: read past EOF
java.io.IOException: read past EOF
at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:451)
at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:45)
at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:219)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:46)
at org.apache.lucene.index.SegmentTermEnum.<init>(SegmentTermEnum.java:47)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:48)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:147)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:150)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:127)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:42)


-
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org





Re: Native code compilation

2006-03-23 Thread Otis Gospodnetic
Native code
There is a C++ port called CLucene, if that suits you more than coffee beans...

Otis




RE: java.lang.OutOfMemoryError in lucene

2006-03-23 Thread Koji Sekiguchi
 But i have the IBM JDK 1.4.2, do you know if this version still have the
 problem??

I'm sorry, I don't know. But you can try it, and if it solves the
problem, you can add your experience to the FAQ :)

Koji







Re: lucene NFS support

2006-03-23 Thread Otis Gospodnetic
Hi Chunhe,

There are no NFS-specific plans.  Out of personal curiosity - why go for NFS 
and not NAS?

Otis




Re: Joins between index and database

2006-03-23 Thread markharw00d



See RangeFilter.bits() for some example code that creates a filter
from terms. 



Also see TermsFilter in the queries module in the contrib section.







RE: lucene NFS support

2006-03-23 Thread Dai, Chunhe
Thanks, Otis.

The reason is that some of our customers definitely use NFS, and it is
hard to convince hundreds of customers not to use it. So the natural
thing for us to do is simply to support it, since we already have a
file-locking mechanism that works on NFS.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 23, 2006 5:49 PM
To: java-user@lucene.apache.org
Subject: Re: lucene NFS support

Hi Chunhe,

There are no NFS-specific plans.  Out of personal curiosity - why go for
NFS and not NAS?

Otis

----- Original Message -----
From: Dai, Chunhe [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, March 23, 2006 4:58:13 PM
Subject: lucene NFS support

Hi,

Does anyone know whether Lucene plans to support NFS in a later
release (2.0)? We are planning to integrate Lucene into our products, and
cluster support is definitely needed. We want to check whether NFS
support is planned before we implement new file locking for it
ourselves.

Thanks.

Chunhe








Re: Read past EOF error in Windows

2006-03-23 Thread Raghavendra Prabhu
Check whether it has anything to do with UTF encoding.
There is a newline difference between Windows and Linux.

Rgds
Prabhu


On 3/24/06, Chris Cain [EMAIL PROTECTED] wrote:

 No, that doesn't seem to be the problem.

 Anyone have any other ideas?

 On Tue, 21 Mar 2006 [EMAIL PROTECTED]

 I had a problem in the past with security on the folder where your index
 is located...but your error does not seem to show that ... I would check
 anyway though...

 -----Original Message-----
 From: Chris Cain cbc20[at]hermes.cam.ac.uk
 To: java-user[at]lucene.apache.org
 Sent: Tue, 21 Mar 2006 15:33:26 + (GMT)
 Subject: Read past EOF error in Windows


 Hi all,

 I wrote a Lucene program which runs fine under Linux and Mac but fails on
 most Windows machines. (I have managed to get it to work on one version of
 XP, however.)

 Specifically, when I open or search the index I get the following error
 message.

 Any help would be appreciated,
 Cheers,
 Chris

 caught a class java.io.IOException
 with message: read past EOF
 java.io.IOException: read past EOF
 at org.apache.lucene.store.FSIndexInput.readInternal(FSDirectory.java:451)
 at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:45)
 at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:219)
 at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:64)
 at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:33)
 at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:46)
 at org.apache.lucene.index.SegmentTermEnum.<init>(SegmentTermEnum.java:47)
 at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:48)
 at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:147)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:115)
 at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:150)
 at org.apache.lucene.store.Lock$With.run(Lock.java:109)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:127)
 at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:42)






Re: lucene NFS support

2006-03-23 Thread Doug Cutting

Dai, Chunhe wrote:

Does anyone know whether Lucene plans to support NFS in a later
release (2.0)? We are planning to integrate Lucene into our products, and
cluster support is definitely needed. We want to check whether NFS
support is planned before we implement new file locking for it
ourselves.


I think that nio-based locking would probably fix this, and could easily 
be provided in addition to or in place of the existing locking mechanism. 
I think the last time this was considered Lucene was still attempting to 
be compatible with Java 1.3.  But I think Lucene 2.0 is aimed at Java 1.4.
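
For readers of the archive, a minimal sketch of what nio-based locking could look like with the java.nio API available since Java 1.4. The class and method names (NioLockSketch, tryObtain, release) are hypothetical and are not part of Lucene. Whether such a lock is honoured across machines over NFS depends on the NFS client and lock daemon, so this only illustrates the mechanism and is not a tested fix.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class NioLockSketch {

    /**
     * Tries to take an exclusive OS-level lock on lockFile.
     * Returns the lock, or null if it is already held elsewhere.
     */
    static FileLock tryObtain(File lockFile) throws IOException {
        FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
        FileLock lock = null;
        try {
            // Non-blocking: returns null when another process holds the lock.
            lock = channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM; treat as "not obtained".
        } finally {
            if (lock == null) {
                // Caveat: on some platforms closing any channel on the file can
                // drop locks this JVM holds on it through other channels.
                channel.close();
            }
        }
        return lock;
    }

    /** Releases the lock and closes the channel it was obtained on. */
    static void release(FileLock lock) throws IOException {
        try {
            lock.release();
        } finally {
            lock.channel().close();
        }
    }
}
```

A caller would wrap index modification in tryObtain/release on a well-known lock file in the index directory. Coordination between threads of the same JVM would still need bookkeeping in the application, since file locks are held on behalf of the whole JVM.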


Doug




RE: Native code compilation

2006-03-23 Thread Seeta Somagani
Yeah, I'm too lazy to write the code again in C++. Was just trying to
see if compiling to native code works. Thanks
Seeta

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 23, 2006 5:44 PM
To: java-user@lucene.apache.org
Subject: Re: Native code compilation

There is a C++ port called CLucene, if that suits you more than coffee
beans...

Otis

----- Original Message -----
From: Seeta Somagani [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, March 23, 2006 4:47:33 PM
Subject: Native code compilation


Hi all,

Has anyone tried to compile their Lucene applications into native code?
Mine works fine in a VM but the call to search() on IndexSearcher is
crashing the application, after I compile it into native code. There is
apparently no problem in instantiating an IndexSearcher though. I tried
this on both Linux and Windows and am getting the same problem.
Thnx
Seeta

