Re: IndexReader.deleteDocuments

2006-10-15 Thread Otis Gospodnetic
The javadoc is right. :)

Otis

- Original Message 
From: EDMOND KEMOKAI [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Sunday, October 15, 2006 12:49:21 AM
Subject: IndexReader.deleteDocuments

Hi guys,
I am a newbie, so excuse me if this is a repost. From the javadoc it seems
IndexReader.deleteDocuments deletes only documents that have the provided term,
but from the implementation examples that I have seen and from the behaviour of
my own app, deleteDocuments(term) deletes documents that don't have the
given term. Can someone clarify this for me?

Thanks
Edmond Kemokai.


"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)







Re: problem deleting documents

2006-10-15 Thread Doron Cohen
 now pk is primary key which i am storing but not indexing it..
  doc.add(new Field("pk", message.getId().toString(), Field.Store.YES,
 Field.Index.NO));

You would need to index it for this to work.
From javadocs for IndexReader.deleteDocuments(Term):
  Deletes all documents _containing_ term
Containment relates to indexed terms.


 when i am making a search i can get pk and show it in result...but above
 code is not deleting the document

- Doron
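
A minimal sketch of the fix Doron describes, assuming the Lucene 2.0 API
(the field and variable names follow the thread; the surrounding document
and index plumbing is elided):

// Index the pk field; UN_TOKENIZED keeps the id as a single exact term.
doc.add(new Field("pk", message.getId().toString(),
        Field.Store.YES, Field.Index.UN_TOKENIZED));

// Later, delete by that exact term:
IndexReader reader = IndexReader.open(index);
reader.deleteDocuments(new Term("pk", message.getId().toString()));
reader.close();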





Re: IndexReader.deleteDocuments

2006-10-15 Thread EDMOND KEMOKAI

Thanks for the response Otis, below is a link to the javadoc in the API:

http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/DeleteFiles.html
("Deletes documents from an index that do not contain a term")

Here is a link to the actual sample implementation:
http://svn.apache.org/repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/DeleteFiles.java

In the file above you have code that looks like this:

 Term term = new Term("path", args[0]);
 int deleted = reader.deleteDocuments(term);

So in effect it should delete documents that don't contain the "path" value
corresponding to what's in args[0]. Except the API documentation suggests the
opposite. In other words, the above code should delete only documents
containing "path" values equal to args[0] (this is obviously more
intuitive). Here is the API doc for what the above code snippet should do:

(http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocuments%28org.apache.lucene.index.Term%29):

Deletes all documents containing term. This is useful if one uses a document
field to hold a unique ID string for the document. Then to delete such a
document, one merely constructs a term with the appropriate field and the
unique ID string as its text and passes it to this method. See
deleteDocument(int)
(http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocument%28int%29)
for information about when this deletion will become effective.




From observation in my app, it is deleting documents that don't have the
provided term, which means there's no easy way to delete a doc (other than
iterating) even if you have a unique id.

On 10/15/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:


The javadoc is right. :)

Otis

- Original Message 
From: EDMOND KEMOKAI [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Sunday, October 15, 2006 12:49:21 AM
Subject: IndexReader.deleteDocuments

Hi guys,
I am a newbie, so excuse me if this is a repost. From the javadoc it seems
IndexReader.deleteDocuments deletes only documents that have the provided term,
but from the implementation examples that I have seen and from the behaviour of
my own app, deleteDocuments(term) deletes documents that don't have the
given term. Can someone clarify this for me?

Thanks
Edmond Kemokai.


"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)









--
"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)


Re: Lucene 2.0.1 release date

2006-10-15 Thread Raghavendra Prabhu

I would very much like to see the .NET port in line with Lucene Java.
This would give index compatibility and features equivalent to those
Lucene provides.

George - Cheers for the continuous effort to keep lucene.net in line with
Lucene

Regards,
Prabhu




On 10/14/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:


I'd have to check CHANGES.txt, but I don't think that many bugs have been
fixed or that many new features have been added that anyone is itching for a
new release.

Otis

- Original Message 
From: George Aroush [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; java-user@lucene.apache.org
Sent: Saturday, October 14, 2006 10:32:47 AM
Subject: RE: Lucene 2.0.1 release date

Hi folks,

Sorry for reposting this question (see original email below), and this time
to both mailing lists.

If anyone can tell me what is the plan for Lucene 2.0.1 release, I would
appreciate it very much.

As some of you may know, I am the porter of Lucene to Lucene.Net; knowing
when 2.0.1 will be released will help me plan things out.

Regards,

-- George Aroush


-Original Message-
From: George Aroush [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 12, 2006 12:07 AM
To: java-dev@lucene.apache.org
Subject: Lucene 2.0.1 release date

Hi folks,

What's the plan for Lucene 2.0.1 release date?

Thanks!

-- George Aroush











Re: problem deleting documents

2006-10-15 Thread Ismail Siddiqui

thanks, it worked

On 10/15/06, Doron Cohen [EMAIL PROTECTED] wrote:


 now pk is primary key which i am storing but not indexing it..
  doc.add(new Field("pk", message.getId().toString(), Field.Store.YES,
 Field.Index.NO));

You would need to index it for this to work.
From javadocs for IndexReader.deleteDocuments(Term):
Deletes all documents _containing_ term
Containment relates to indexed terms.


 when i am making a search i can get pk and show it in result...but above
 code is not deleting the document

- Doron






java.io.IOException: read past EOF

2006-10-15 Thread John Gilbert
I am trying to write an Ejb3Directory. It seems to work for index writing but 
not for searching.
I get the EOF exception. I assume this means that either my OutputStream or 
InputStream is doing
something wrong. It fails because the CSInputStream has a length of zero when 
it reads the .fnm section 
of the .cfs file.

Does anyone have any suggestions? 
Thanks!

Here is more background info:

- Using version 1.4.3
- Stack trace
java.io.IOException: read past EOF
at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:195)
at org.apache.lucene.index.FieldInfos.&lt;init&gt;(FieldInfos.java:55)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:109)
at org.apache.lucene.index.SegmentReader.&lt;init&gt;(SegmentReader.java:89)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:106)
at org.apache.lucene.search.IndexSearcher.&lt;init&gt;(IndexSearcher.java:43)


- Entity Bean

@Entity
public class IndexBean implements Serializable {
@Id private String name;
@Lob private byte[] data;
@Version private Calendar timestamp;
...
}

- InputStream

public class Ejb3InputStream extends InputStream {
private java.io.InputStream is;

public Ejb3InputStream(IndexBean bean) {
this.is = new ByteArrayInputStream(bean.getData());
length = bean.getData().length;
}

public void close() throws IOException {
is.close();
}

protected void readInternal(byte[] b, int off, int len) throws IOException {
is.read(b, off, len);
}

protected void seekInternal(long n) throws IOException {
is.skip(n);
}
}

- OutputStream

public class Ejb3OutputStream extends OutputStream {
private IndexBean bean;
private ByteArrayOutputStream os = new ByteArrayOutputStream();

public Ejb3OutputStream(IndexBean bean) {
this.bean = bean;
}

protected void flushBuffer(byte[] b, int len) throws IOException {
os.write(b);
}

public long length() throws IOException {
return os.size();
}

public final void close() throws IOException {
super.close();
bean.setData(os.toByteArray());
}
}
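
Two details in these streams look suspect against Lucene 1.4.3's stream
contracts (a hedged reading, not a confirmed diagnosis): flushBuffer(byte[]
b, int len) is handed a buffer of which only the first len bytes are valid,
and seekInternal(long pos) receives an absolute position, not a relative
skip. A corrected sketch of just those two methods:

protected void flushBuffer(byte[] b, int len) throws IOException {
    os.write(b, 0, len); // write only the valid prefix; os.write(b) appends stale buffer bytes
}

protected void seekInternal(long pos) throws IOException {
    // pos is absolute, so recreate the stream instead of skipping relatively
    is = new ByteArrayInputStream(bean.getData());
    is.skip(pos);
}

Writing the whole buffer on each flush would silently pad the stored index
data, which would be consistent with the reported EOF when the .fnm section
of the .cfs file is read back.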










Re: QueryParser Is Badly Broken

2006-10-15 Thread Paul Elschot
Mark,

you wrote:
  On another note...http://famestalker.com
 
...
 
  http://famestalker.com/devwiki/

Could you explain how Paragraph/Sentence Proximity Searching
is implemented in Qsol?

Regards,
Paul Elschot




Re: problem deleting documents

2006-10-15 Thread cfowler
Ismail,

I was having the same type of problem (using v2) until I changed
the ID field in my index from Field.Index.TOKENIZED to
Field.Index.UN_TOKENIZED. Can you try that, or create a secondary field
that is set up that way with your pk id in it?

Chris



Ismail Siddiqui [EMAIL PROTECTED]
10/15/2006 01:58 AM
Please respond to java-user@lucene.apache.org

To: java-user@lucene.apache.org
Subject: problem deleting documents

hi guys
i am having problem deleting documents.. apparently it's not doing it..
here is the code snippet

 public void delete(final BoardMessage message)
 {
    try {
       IndexReader fsReader;

       if (index.exists()) {
          fsReader = IndexReader.open(index);
          fsReader.deleteDocuments(new Term("pk", message.getId() + ""));
          fsReader.close();
       }
    }
    catch (IOException e) {
       e.printStackTrace();
    }
 }

now pk is primary key which i am storing but not indexing it..
 doc.add(new Field("pk", message.getId().toString(), Field.Store.YES,
Field.Index.NO));

when i am making a search i can get pk and show it in result...but above
code is not deleting the document



Re: QueryParser Is Badly Broken

2006-10-15 Thread Mark Miller

In a way that certainly needs more testing (haven't had the time), but here
is the gist:

I modified the SpanNotQuery to allow a certain number of span crossings,
making it something of a WithinSpanQuery. So instead of just being able to
say "find something and something else and don't let it span a paragraph
marker span", you can say "find this and it can span up to 3 paragraph
marker spans". I then made a special standard analyzer that uses a standard
sentence-recognizer regex to inject sentence marker tokens. Paragraphs seem
less detectable, so right now the analyzer just looks for the paragraph
symbol... perhaps a double newline might be better though. I still have not
worked out the best para/sent token markers to put in the index or the best
way to mark paragraphs in the input text. I also would like to make it so
that a paragraph marker also works as a sentence marker so that they do not
need to be doubled up.


- Mark
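
A rough sketch of the marker-injection half of that idea, against the Lucene
2.0 Token API; the marker term and the end-of-sentence test here are
illustrative stand-ins, not Qsol's actual implementation (a real version
would apply the sentence-recognizer regex before tokenization strips
punctuation):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Emits a synthetic sentence-marker token after each sentence-ending token,
// so a crossing-tolerant SpanNotQuery can count sentence boundaries.
public class SentenceMarkerFilter extends TokenFilter {
    private static final String MARKER = "_sent_"; // hypothetical marker term
    private Token pending;

    public SentenceMarkerFilter(TokenStream in) { super(in); }

    public Token next() throws IOException {
        if (pending != null) { Token m = pending; pending = null; return m; }
        Token t = input.next();
        if (t != null && endsSentence(t)) {
            pending = new Token(MARKER, t.endOffset(), t.endOffset());
        }
        return t;
    }

    private boolean endsSentence(Token t) {
        String s = t.termText(); // illustrative test only
        return s.endsWith(".") || s.endsWith("?") || s.endsWith("!");
    }
}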

On 10/15/06, Paul Elschot [EMAIL PROTECTED] wrote:


Mark,

you wrote:
  On another note...http://famestalker.com
 
...

  http://famestalker.com/devwiki/

Could you explain how Paragraph/Sentence Proximity Searching
is implemented in Qsol?

Regards,
Paul Elschot





RE: Looking for a stemmer that can return all inflected forms

2006-10-15 Thread Jong Kim
All: Thanks for the ideas and suggestions. 

Bill: As Otis pointed out, Lucene already comes with a couple
of stemmers (I'm using Lucene 2.0). Besides PorterStemFilter,
you can also take a look at SnowballAnalyzer and SnowballFilter
classes which support more than just English. The integration
is pretty straightforward.

/Jong
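
For reference, a minimal sketch of that integration, assuming the Snowball
classes from Lucene's contrib jar (the filter chain and stemmer name are
illustrative choices, not the only ones):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Tokenize, normalize case, then stem with the English Snowball stemmer.
public class EnglishStemmingAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new StandardTokenizer(reader);
        ts = new StandardFilter(ts);
        ts = new LowerCaseFilter(ts);
        return new SnowballFilter(ts, "English");
    }
}

The bundled SnowballAnalyzer("English") does roughly the same chain if you
would rather not write your own Analyzer.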

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 15, 2006 12:38 AM
To: java-user@lucene.apache.org
Subject: Re: Looking for a stemmer that can return all inflected forms

Bill: Lucene already comes with PorterStemFilter (class name), which you can
use for English.

Ideas 1 and 2 sound interesting, but I think they may end up offering false
positives.  The reason is obvious - multiple and unrelated words can get
stemmed to the same stem.
Is "care" really the stem for "caring"?  Maybe.  But imagine the stem is
"car".  Suddenly the word "cars" shares the same "car" stem and you have a
false positive.

Jong: I _think_ what you need is a reverse lemmatizer.

Otis

- Original Message 
From: Bill Taylor [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Cc: Jong Kim [EMAIL PROTECTED]
Sent: Saturday, October 14, 2006 11:43:10 PM
Subject: Re: Looking for a stemmer that can return all inflected forms

On Oct 14, 2006, at 3:57 PM, Jong Kim wrote:

 Hi,

 I'm looking for a stemmer that is capable of returning all 
 morphological variants  of a query term (to be used for high-recall 
 search). For example, given a query term of 'cares', I would like to 
 be able to generate 'cares', 'care', 'cared', and 'caring'.

 I looked at the Porter stemmer, Snowball stemmer, and the K-stem.
 All of them provide a method that takes a surface string ('cares') as 
 an input and returns its base form/stem, which is 'care' in this 
 example.

First of all, I would GREATLY appreciate it if you would tell me which of
these is easiest to incorporate into Lucene.  I have the same problem you
do.  I have solved the other end of it but do not know how to fit a stemmer
into Lucene.

 But it appears that I can not use the stemmer to generate all of the 
 inflected forms of a given query term.

 Does anyone know of such tool for Lucene?

I am writing one which is VERY SPECIAL PURPOSE and therefore my code is not
likely to be of much use to you.  HOWEVER, the basic idea is quite
simple:

Idea 1:

1) Since you have to use the stemmer against something, you are reading
words out of the index and extracting their stems.

2) Having done that for a word, find all nearby words which have the same
stem.  The simplest definition of "nearby" that I can think of is that the
word starts with the stem, but you might want to drop the last character of
the stem and look for all words that start with that.
Thus, if the stem is "care", you would look at all words that start with
"car", and if they have "care" as the stem, they are in the same family.

The advantage of this approach is that you do not ever offer any words that
are not in your index.  If you found cares and cared but not caring in your
index, you would not want to suggest that someone search for caring, because
they won't find it.  So you use the index as the source of words to stem.
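
A sketch of Idea 1 over the Lucene 2.0 term dictionary; the field name, the
index path, the MagicStemmer call (the author's hypothetical stemmer, as in
Idea 2 below), and the drop-one-character prefix heuristic are illustrative
assumptions:

// Collect indexed words whose stem matches the query word's stem.
IndexReader reader = IndexReader.open("index"); // path to your index directory
String stem = "care";
String prefix = stem.substring(0, stem.length() - 1); // "car": widens the net
List<String> family = new ArrayList<String>();
TermEnum terms = reader.terms(new Term("contents", prefix));
while (terms.term() != null
        && terms.term().field().equals("contents")
        && terms.term().text().startsWith(prefix)) {
    String word = terms.term().text();
    if (stem.equals(MagicStemmer(word))) {
        family.add(word); // only words actually present in the index
    }
    if (!terms.next()) break;
}
terms.close();
reader.close();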

Idea 2:

Another way to do it is to build a hash map of tree sets keyed to the stem.
Each stem has a tree set of all words which have it as a stem.  
The code would look something like

HashMap<String, TreeSet<String>> stemmedWords = new HashMap<String, TreeSet<String>>();
TreeSet<String> wordsForStem;

for (String word : allWordsInIndex) {      // all words in the index
    String stem = MagicStemmer(word);      // I left out code for words that do not have stems
    if ((wordsForStem = stemmedWords.get(stem)) == null) {
        wordsForStem = new TreeSet<String>();  // tree set for the new stem
        stemmedWords.put(stem, wordsForStem);  // now this stem has a set for its words
    }
    wordsForStem.add(word);                // put the word into the tree set for its stem
}

For each stem from all the words in your index, you get a tree set which
contains all the words that have it as a stem.  The tree set keeps its
words in alphabetical order.

If you want the stems to be displayed in alphabetical order, use a TreeMap
instead of a HashMap.

 Any help or pointer would be greatly appreciated.

I would appreciate your telling me which stemmer for English words is
easiest to incorporate into Lucene and where to find it.  Thanks.

Bill Taylor





serious Lazy Field bug

2006-10-15 Thread Yonik Seeley

If anyone is using the new lazy field loading feature from the Lucene
trunk, you should turn it off or upgrade to the next nightly build
(lucene-2006-10-16) or later.

Bug details here:
http://issues.apache.org/jira/browse/LUCENE-683

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server




RE: Lucene 2.0.1 release date

2006-10-15 Thread George Aroush
Thanks for the reply Otis.

I looked at the CHANGES.txt file and saw quite a bit of changes.  For my port
from Java to C#, I can't rely on the trunk code, as it (to my knowledge)
changes on a monthly basis if not weekly.  What I need is an official
release so that I can use it as the port point.

Regards,

-- George Aroush


-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 15, 2006 12:41 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene 2.0.1 release date

I'd have to check CHANGES.txt, but I don't think that many bugs have been
fixed or that many new features have been added that anyone is itching for a
new release.

Otis

- Original Message 
From: George Aroush [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; java-user@lucene.apache.org
Sent: Saturday, October 14, 2006 10:32:47 AM
Subject: RE: Lucene 2.0.1 release date

Hi folks,

Sorry for reposting this question (see original email below), and this time
to both mailing lists.

If anyone can tell me what is the plan for Lucene 2.0.1 release, I would
appreciate it very much.

As some of you may know, I am the porter of Lucene to Lucene.Net; knowing
when 2.0.1 will be released will help me plan things out.

Regards,

-- George Aroush


-Original Message-
From: George Aroush [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 12, 2006 12:07 AM
To: java-dev@lucene.apache.org
Subject: Lucene 2.0.1 release date

Hi folks,

What's the plan for Lucene 2.0.1 release date?

Thanks!

-- George Aroush





Re: Avoiding sort by date

2006-10-15 Thread Yonik Seeley

On 10/12/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Does the Sort function create some kind of internal cache?


Yes, it's called the FieldCache, and there is a cache with a weak
reference to the index reader as a key.  As long as there is a
reference to the index reader (even after close() has been called) the
cache data will exist.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Observing the heap, it seems that a full garbage collection after calling
IndexSearcher.close() still leaves a lot of memory occupied.
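
A sketch of the implication for the memory observation above, assuming the
lingering reader reference is what pins the cache (query construction
elided):

IndexReader reader = IndexReader.open("index");
Searcher searcher = new IndexSearcher(reader);
Hits hits = searcher.search(query, new Sort("date")); // populates FieldCache, keyed weakly by reader
// ... use hits ...
searcher.close();
reader.close(); // close() alone does not evict the cache entry
reader = null;  // drop the last strong reference so the weak key can be collected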





Re: IndexReader.deleteDocuments

2006-10-15 Thread EDMOND KEMOKAI

Can somebody please clarify the intended behaviour of
IndexReader.deleteDocuments()? Between the various documentations and
implementations, it seems this function is broken. The API doc says it should
delete docs containing the provided term, but instead it deletes all
documents not containing the given term.

On 10/15/06, EDMOND KEMOKAI [EMAIL PROTECTED] wrote:


Thanks for the response Otis, below is a link to the javadoc in the API:

http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/DeleteFiles.html

("Deletes documents from an index that do not contain a term")

Here is a link to the actual sample implementation:

http://svn.apache.org/repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/DeleteFiles.java

In the file above you have code that looks like this:

  Term term = new Term("path", args[0]);

  int deleted = reader.deleteDocuments(term);

So in effect it should delete documents that don't contain the "path"
value corresponding to what's in args[0]. Except the API documentation
suggests the opposite. In other words, the above code should delete only
documents containing "path" values equal to args[0] (this is obviously more
intuitive). Here is the API doc for what the above code snippet should do:

(http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocuments%28org.apache.lucene.index.Term%29):

Deletes all documents containing term. This is useful if one uses a
document field to hold a unique ID string for the document. Then to delete
such a document, one merely constructs a term with the appropriate field and
the unique ID string as its text and passes it to this method. See
deleteDocument(int)
(http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocument%28int%29)
for information about when this deletion will become effective.



From observation in my app, it is deleting documents that don't have the
provided term, which means there's no easy way to delete a doc (other than
iterating) even if you have a unique id.

On 10/15/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 The javadoc is right. :)

 Otis

 - Original Message 
 From: EDMOND KEMOKAI [EMAIL PROTECTED]
 To: java-user@lucene.apache.org
 Sent: Sunday, October 15, 2006 12:49:21 AM
 Subject: IndexReader.deleteDocuments

 Hi guys,
 I am a newbie, so excuse me if this is a repost. From the javadoc it
 seems
 Reader.deleteDocuments deletes only documents that have the provided
 term,
 but the implementation examples that I have seen and from the behaviour
 of
 my own app, deleteDocuments(term) deletes documents that don't have the
 given term. Can someone clarify this for me?

 Thanks
 Edmond Kemokai.


 "talk trash and carry a small stick."
 PAUL KRUGMAN (NYT)








--
"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)





--
* "Still searching for the gatekeeper to the Valence-Band, let me out of
here!"

* "When I was coming up, it was a dangerous world, and you knew exactly
who they were. It was us versus them, and it was clear who them was. Today,
we are not so sure who the they are, but we know they're there."

  Poet Laureate G.W. Bush (I am not a Bush basher, by the way)

"talk trash and carry a small stick."
PAUL KRUGMAN (NYT)


Re: IndexReader.deleteDocuments

2006-10-15 Thread Yonik Seeley

On 10/16/06, EDMOND KEMOKAI [EMAIL PROTECTED] wrote:

Can somebody please clarify the intended behaviour of
IndexReader.deleteDocuments()?


It deletes documents containing the term.  The API docs are correct,
the demo docs are incorrect if they say otherwise.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server




Re: Query not finding indexed data

2006-10-15 Thread Doron Cohen
Hi Antony, you cannot instruct the query parser to do that. Note that an
application can add both tokenized and un_tokenized data under the same
field name. It is application logic to know that a certain query is
not to be tokenized. In this case you could create your query with:
  query = new TermQuery(new Term(fieldName, "IqTstAdminGuide2.pdf"));

Hope this helps,
Doron
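
Alternatively (not from Doron's reply, but standard Lucene 2.0 classes), you
can wrap the analyzer so the un-tokenized field bypasses StandardAnalyzer
and keep using QueryParser; a sketch with the thread's field names:

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// KeywordAnalyzer emits the field text as one token, matching UN_TOKENIZED indexing.
PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("attname", new KeywordAnalyzer());
QueryParser parser = new QueryParser("body", analyzer);
Query query = parser.parse("attname:IqTstAdminGuide2.pdf");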

Antony Bowesman [EMAIL PROTECTED] wrote on 15/10/2006 20:08:37:
 Hi,

 I have a field "attname" that is indexed with Field.Store.YES,
 Field.Index.UN_TOKENIZED.  I have a document with the attname of
 "IqTstAdminGuide2.pdf".

 QueryParser parser = new QueryParser("body", new StandardAnalyzer());
 Query query = parser.parse("attname:IqTstAdminGuide2.pdf");

 fails to find the Document, which I guess is because of StandardAnalyzer
 lowercasing the filename.

 How can one instruct the QueryParser only to use the Analyzer to analyse
 fields in an expression that were tokenized during the indexing process,
 and to not analyse those that were UN_TOKENIZED?

 Regards
 Antony






RE: Error while closing IndexWriter

2006-10-15 Thread Shivani Sawhney
Hi,

Sorry, Doron, if the code added in my last mail was confusing, and thanks for
the reply. The code added in my last mail was not exactly the version that
was causing the problem; this one is.

The Lucene version is 1.2.

Waiting for a suggestion.

Code:

public void indexFile(File indexDirFile, File resumeFile) throws IOException
{
    IndexWriter indexwriter = null;
    try
    {
        File afile[] = indexDirFile.listFiles();
        boolean flag = false;
        if (afile.length <= 0)
            flag = true;
        indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
        doIndexing(indexwriter, resumeFile); // following method

        if (indexwriter != null)
        {
            indexwriter.close(); // <-- Indexer.java:150 (error here)
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
        throw new Error(e);
    }
}

//--------------------------------------------------------------------//

public void doIndexing(IndexWriter indexwriter, File resumeFile)
{
    Document document = new Document();
    if (resumeFile.getName().endsWith(".pdf"))
    {
        ...
        // Code for indexing PDF docs. Right now the inputs are not PDF docs,
        // so I have removed this piece as it could not have been causing problems.
    }
    else
    {
        try
        {
            document.add(Field.Text(IndexerColumns.contents, new FileReader(resumeFile)));
        }
        catch (FileNotFoundException e)
        {
            e.printStackTrace();
            throw new MyRuntimeException(e.getMessage(), e);
        }
    }

    for (int i = 0; i < this.columnInfos.length; i++)
    {
        ColumnInfo columnInfo = columnInfos[i];
        String value = String.valueOf(mapLuceneParams.get(columnInfo.columnName));

        if (value != null)
        {
            value = value.trim();
            if (value.length() != 0)
            {
                document.add(Field.Text(columnInfo.columnName, value));
            }
        }
    }

    try
    {
        indexwriter.addDocument(document);
    }
    catch (IOException e)
    {
        e.printStackTrace();
        throw new MyRuntimeException(e.getMessage(), e);
    }
}

 

Regards,

Shivani Sawhney
NetEdge Computing Global Services Private Limited 
A-14, Sector-7, NOIDA U.P. 201-301
Tel #  91-120-2423281, 2423282 
Fax #  91-120-2423279 
http://www.netedgecomputing.com/


-Original Message-
From: Doron Cohen [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 13, 2006 12:17 PM
To: java-user@lucene.apache.org
Subject: Re: Error while closing IndexWriter

 

I am far from perfect in this pdf text extracting, however I noticed
something in your code that you may want to check to clear up the reason
for this failure, see below.

Shivani Sawhney [EMAIL PROTECTED] wrote on 12/10/2006 22:54:07:

 Hi All,

 I am facing a peculiar problem.

 I am trying to index a file and the indexing code executes without any
 error, but when I try to close the indexer, I get the following error. The
 error comes very rarely, but when it does, no code on document indexing
 works and I finally have to delete all indexes and run a re-indexing
 utility.

 Can anyone please suggest what might be the problem?

 Stack Trace:

 java.lang.ArrayIndexOutOfBoundsException: 97 >= 17
 at java.util.Vector.elementAt(Vector.java:432)
 at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
 at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
 at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
 at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:169)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
 at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:425)
 at