Re: IndexReader.deleteDocuments
The javadoc is right. :)

Otis

----- Original Message -----
From: EDMOND KEMOKAI [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Sunday, October 15, 2006 12:49:21 AM
Subject: IndexReader.deleteDocuments

Hi guys, I am a newbie, so excuse me if this is a repost. From the javadoc it seems IndexReader.deleteDocuments deletes only documents that have the provided term, but from the implementation examples I have seen, and from the behaviour of my own app, deleteDocuments(term) deletes documents that don't have the given term. Can someone clarify this for me?

Thanks,
Edmond Kemokai
"talk trash and carry a small stick." PAUL KRUGMAN (NYT)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: problem deleting documents
> now pk is the primary key, which i am storing but not indexing:
> doc.add(new Field("pk", message.getId().toString(), Field.Store.YES, Field.Index.NO));

You would need to index it for this to work. From the javadocs for IndexReader.deleteDocuments(Term): "Deletes all documents _containing_ term". Containment relates to indexed terms.

> when i am making a search i can get pk and show it in the result... but the above code is not deleting the document

- Doron
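A minimal sketch of the fix Doron describes, assuming the Lucene 2.0 API (the RAMDirectory and the "42" id are illustrative): the pk field must be indexed, and UN_TOKENIZED keeps the id as a single term that deleteDocuments(Term) can match.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByPk {
    static int run() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // Indexed (not just stored), and UN_TOKENIZED so the id stays one term.
        doc.add(new Field("pk", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // deleteDocuments(Term) now finds the document containing the term pk:42.
        IndexReader reader = IndexReader.open(dir);
        int deleted = reader.deleteDocuments(new Term("pk", "42"));
        reader.close();
        return deleted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // 1
    }
}
```

With Field.Index.NO, as in the original snippet, the term simply does not exist in the index and the same call deletes nothing.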
Re: IndexReader.deleteDocuments
Thanks for the response Otis. Below is a link to the javadoc in the API: http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/DeleteFiles.html ("Deletes documents from an index that do not contain a term").

Here is a link to the actual sample implementation: http://svn.apache.org/repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/DeleteFiles.java

In the file above you have code that looks like this:

Term term = new Term("path", args[0]);
int deleted = reader.deleteDocuments(term);

So in effect it should delete documents that don't contain the path value corresponding to what's in args[0]. Except the API documentation suggests the opposite; in other words, the above code should delete only documents containing path values equal to args[0] (this is obviously more intuitive). Here is the API doc for what the above code snippet should do (http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocuments(org.apache.lucene.index.Term)):

"Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See deleteDocument(int) for information about when this deletion will become effective."

From observation in my app, it is deleting documents that don't have the provided term, which means there's no easy way to delete a doc (other than iterating) even if you have a unique id.

On 10/15/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> The javadoc is right. :) Otis

--
"talk trash and carry a small stick." PAUL KRUGMAN (NYT)
Re: Lucene 2.0.1 release date
I would very much like to see the .NET port in line with Java Lucene. This would give index compatibility and the same features that Java Lucene provides.

George - cheers for the continuous effort to keep Lucene.Net in line with Lucene.

Regards,
Prabhu

On 10/14/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> I'd have to check CHANGES.txt, but I don't think that many bugs have been fixed, and not that many new features have been added, that anyone is itching for a new release.
>
> Otis
>
> ----- Original Message -----
> From: George Aroush [EMAIL PROTECTED]
> Sent: Saturday, October 14, 2006 10:32:47 AM
> Subject: RE: Lucene 2.0.1 release date
>
> Hi folks, sorry for reposting this question (see original email below), this time to both mailing lists. If anyone can tell me what the plan is for the Lucene 2.0.1 release, I would appreciate it very much. As some of you may know, I am the porter of Lucene to Lucene.Net; knowing when 2.0.1 will be released will help me plan things out.
>
> Regards,
> -- George Aroush
Re: problem deleting documents
Thanks, it worked.

On 10/15/06, Doron Cohen [EMAIL PROTECTED] wrote:
> You would need to index it for this to work. From the javadocs for IndexReader.deleteDocuments(Term): "Deletes all documents _containing_ term". Containment relates to indexed terms.
>
> - Doron
java.io.IOException: read past EOF
I am trying to write an Ejb3Directory. It seems to work for index writing but not for searching: I get the EOF exception, so I assume this means that either my OutputStream or InputStream is doing something wrong. It fails because the CSInputStream has a length of zero when it reads the .fnm section of the .cfs file. Does anyone have any suggestions? Thanks!

Here is more background info:

- Using version 1.4.3
- Stack trace:

java.io.IOException: read past EOF
        at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:195)
        at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:55)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:109)
        at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:89)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:106)
        at org.apache.lucene.search.IndexSearcher.init(IndexSearcher.java:43)

- Entity bean:

@Entity
public class IndexBean implements Serializable {
    @Id
    private String name;
    @Lob
    private byte[] data;
    @Version
    private Calendar timestamp;
    ...
}

- InputStream:

public class Ejb3InputStream extends InputStream {
    private java.io.InputStream is;

    public Ejb3InputStream(IndexBean bean) {
        this.is = new ByteArrayInputStream(bean.getData());
        length = bean.getData().length; // 'length' is a protected field of Lucene's InputStream
    }

    public void close() throws IOException {
        is.close();
    }

    protected void readInternal(byte[] b, int off, int len) throws IOException {
        is.read(b, off, len); // note: the return value is ignored, so short reads go unnoticed
    }

    protected void seekInternal(long n) throws IOException {
        is.skip(n); // note: skip() is relative to the current position, but seekInternal() receives an absolute position
    }
}

- OutputStream:

public class Ejb3OutputStream extends OutputStream {
    private IndexBean bean;
    private ByteArrayOutputStream os = new ByteArrayOutputStream();

    public Ejb3OutputStream(IndexBean bean) {
        this.bean = bean;
    }

    protected void flushBuffer(byte[] b, int len) throws IOException {
        os.write(b); // note: writes the whole buffer instead of only the first len bytes
    }

    public long length() throws IOException {
        return os.size();
    }

    public final void close() throws IOException {
        super.close();
        bean.setData(os.toByteArray());
    }
}
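For what it's worth, a self-contained sketch of the position bookkeeping the comments above point at (plain Java, no Lucene types; the class name is made up for illustration): an absolute seek cannot be implemented with a relative skip() unless the position is tracked explicitly.

```java
import java.util.Arrays;

public class AbsoluteSeekBuffer {
    private final byte[] data;
    private int pos; // absolute position, updated by both seek() and read()

    public AbsoluteSeekBuffer(byte[] data) {
        this.data = data;
    }

    // Equivalent of seekInternal(): sets an absolute position.
    // ByteArrayInputStream.skip() only moves forward from wherever the
    // stream already is, which is why the original code reads past EOF.
    public void seek(long newPos) {
        pos = (int) newPos;
    }

    // Equivalent of readInternal(): copies exactly len bytes from pos.
    public void read(byte[] b, int off, int len) {
        System.arraycopy(data, pos, b, off, len);
        pos += len;
    }

    public static void main(String[] args) {
        AbsoluteSeekBuffer buf = new AbsoluteSeekBuffer(new byte[] {10, 20, 30, 40});
        byte[] out = new byte[2];
        buf.seek(2);
        buf.read(out, 0, 2);  // reads {30, 40}
        buf.seek(0);          // seeking backwards works; skip() could not do this
        buf.read(out, 0, 2);
        System.out.println(Arrays.toString(out)); // [10, 20]
    }
}
```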
Re: QueryParser Is Badly Broken
Mark, you wrote:
> On another note... http://famestalker.com ... http://famestalker.com/devwiki/

Could you explain how Paragraph/Sentence Proximity Searching is implemented in Qsol?

Regards,
Paul Elschot
Re: problem deleting documents
Ismail, I was having the same type of problem (using v2) until I changed my index so that the ID field uses Field.Index.UN_TOKENIZED instead of Field.Index.TOKENIZED. Can you try that, or create a secondary field that is set up that way with your pk id in it?

Chris

Ismail Siddiqui [EMAIL PROTECTED] wrote on 10/15/2006 01:58 AM:

Hi guys, I am having a problem deleting documents; apparently it's not doing it. Here is the code snippet:

public void delete(final BoardMessage message) {
    try {
        IndexReader fsReader;
        if (index.exists()) {
            fsReader = IndexReader.open(index);
            fsReader.deleteDocuments(new Term("pk", message.getId() + ""));
            fsReader.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Now "pk" is the primary key, which I am storing but not indexing:

doc.add(new Field("pk", message.getId().toString(), Field.Store.YES, Field.Index.NO));

When I make a search I can get pk and show it in the result, but the above code is not deleting the document.
Re: QueryParser Is Badly Broken
In a way that certainly needs more testing (haven't had the time), but here is the gist: I modified the SpanNotQuery to allow a certain number of span crossings, making it something of a WithinSpanQuery. So instead of just being able to say "find something and something else, and don't let it span a paragraph marker span", you can say "find this, and it can span up to 3 paragraph marker spans". I then made a special standard analyzer that uses a standard sentence-recognizer regex to inject sentence marker tokens. Paragraphs seem less detectable, so right now the analyzer just looks for the paragraph symbol... perhaps a double newline might be better though. I still have not worked out the best para/sent token markers to put in the index, or the best way to mark paragraphs in the input text. I also would like to make it so that a paragraph marker also works as a sentence marker, so that they do not need to be doubled up.

- Mark

On 10/15/06, Paul Elschot [EMAIL PROTECTED] wrote:
> Could you explain how Paragraph/Sentence Proximity Searching is implemented in Qsol?
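The marker-injection idea Mark describes can be sketched without any Lucene types. This toy tokenizer uses a simple regex as a stand-in for a real sentence recognizer, and the _SENT_ marker text is an arbitrary assumption (as Mark notes, the best marker tokens are still an open question):

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceMarkers {
    // Marker token injected at each sentence boundary; a span query like
    // the modified SpanNotQuery can then forbid (or count) crossings of it.
    static final String SENT_MARK = "_SENT_";

    // Split on sentence-ending punctuation followed by whitespace and a
    // capital letter -- a crude stand-in for a real sentence recognizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        String[] sentences = text.split("(?<=[.!?])\\s+(?=[A-Z])");
        for (String sentence : sentences) {
            for (String word : sentence.split("\\s+")) {
                if (word.length() > 0) tokens.add(word);
            }
            tokens.add(SENT_MARK); // boundary token emitted into the index
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Foo bar. Baz qux."));
        // [Foo, bar., _SENT_, Baz, qux., _SENT_]
    }
}
```

In a real analyzer this logic would live in a TokenFilter so the markers are emitted as ordinary tokens with positions, which is what lets span queries count how many boundaries a match crosses.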
RE: Looking for a stemmer that can return all inflected forms
All: Thanks for the ideas and suggestions.

Bill: As Otis pointed out, Lucene already comes with a couple of stemmers (I'm using Lucene 2.0). Besides PorterStemFilter, you can also take a look at the SnowballAnalyzer and SnowballFilter classes, which support more than just English. The integration is pretty straightforward.

/Jong

-----Original Message-----
From: Otis Gospodnetic
Sent: Sunday, October 15, 2006 12:38 AM
Subject: Re: Looking for a stemmer that can return all inflected forms

Bill: Lucene already comes with PorterStemFilter (class name), which you can use for English. Ideas 1 and 2 sound interesting, but I think they may end up offering false positives. The reason is obvious: multiple and unrelated words can get stemmed to the same stem. Is "care" really the stem for "caring"? Maybe. But imagine the stem is "car". Suddenly the word "cars" shares the same "car" stem and you have a false positive.

Jong: I _think_ what you need is a reverse lemmatizer.

Otis

----- Original Message -----
From: Bill Taylor
Sent: Saturday, October 14, 2006 11:43:10 PM
Subject: Re: Looking for a stemmer that can return all inflected forms

On Oct 14, 2006, at 3:57 PM, Jong Kim wrote:
> Hi, I'm looking for a stemmer that is capable of returning all morphological variants of a query term (to be used for high-recall search). For example, given a query term of 'cares', I would like to be able to generate 'cares', 'care', 'cared', and 'caring'. I looked at the Porter stemmer, Snowball stemmer, and the K-stem. All of them provide a method that takes a surface string ('cares') as an input and returns its base form/stem, which is 'care' in this example.

First of all, I would GREATLY appreciate it if you would tell me which of these is easiest to incorporate into Lucene. I have the same problem you do. I have solved the other end of it but do not know how to fit a stemmer into Lucene.
> But it appears that I can not use the stemmer to generate all of the inflected forms of a given query term. Does anyone know of such a tool for Lucene?

I am writing one which is VERY SPECIAL PURPOSE and therefore my code is not likely to be of much use to you. HOWEVER, the basic idea is quite simple:

Idea 1:

1) Since you have to use the stemmer against something, you are reading words out of the index and extracting their stems.
2) Having done that for a word, find all nearby words which have the same stem. The simplest definition of "nearby" that I can think of is that the word starts with the stem, but you might want to drop the last character of the stem and look for all words that start with that. Thus, if the stem is "care", you would look at all words that start with "car", and if they have "care" as the stem, they are in the same family.

The advantage of this approach is that you do not ever offer any words that are not in your index. If you found "cares" and "cared" but not "caring" in your index, you would not want to suggest that someone search for "caring", because they won't find it. So you use the index as the source of words to stem.

Idea 2:

Another way to do it is to build a hash map of tree sets keyed to the stem. Each stem has a tree set of all words which have it as a stem. The code would look something like:

HashMap<String, TreeSet<String>> stemmedWords = new HashMap<String, TreeSet<String>>();
TreeSet<String> wordsForStem;
for (String word : allWordsInTheIndex) {
    String stem = magicStemmer(word);
    // (code for words that do not have stems is left out)
    if ((wordsForStem = stemmedWords.get(stem)) == null) {
        wordsForStem = new TreeSet<String>(); // tree set for the new stem
        stemmedWords.put(stem, wordsForStem); // now this stem has a set for its words
    }
    wordsForStem.add(word); // put the word into the tree set for its stem
}

For each stem from all the words in your index, you get a tree set which contains all the words which have it as a stem; the tree set keeps its words in alphabetical order.
If you want the stems to be displayed in alphabetical order, use a TreeMap instead of a HashMap.

> Any help or pointer would be greatly appreciated.

I would appreciate your telling me which stemmer for English words is easiest to incorporate into Lucene, and where to find it. Thanks.

Bill Taylor
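Bill's Idea 2 can be made runnable end to end. The suffix-stripping stem() below is a toy stand-in for a real Porter/Snowball stemmer, and it happens to demonstrate Otis's false-positive point: it maps "cares", "cared", and "caring" all down to "car".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class StemGroups {
    // Toy stemmer standing in for Porter/Snowball: strips a few English
    // suffixes, keeping stems at least three characters long. Purely
    // illustrative -- use a real stemmer in practice.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Idea 2 from the thread: group every indexed word under its stem,
    // with each group kept alphabetically sorted by the TreeSet.
    static Map<String, TreeSet<String>> group(String[] words) {
        Map<String, TreeSet<String>> stemmedWords = new HashMap<String, TreeSet<String>>();
        for (String word : words) {
            String stem = stem(word);
            TreeSet<String> wordsForStem = stemmedWords.get(stem);
            if (wordsForStem == null) {
                wordsForStem = new TreeSet<String>();
                stemmedWords.put(stem, wordsForStem);
            }
            wordsForStem.add(word);
        }
        return stemmedWords;
    }

    public static void main(String[] args) {
        System.out.println(group(new String[] {"cares", "cared", "caring", "car"}));
        // {car=[car, cared, cares, caring]}
    }
}
```

Looking up a query term's stem in this map yields every inflected form seen in the index, which is exactly the expansion Jong asked for, false positives included.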
serious Lazy Field bug
If anyone is using the new lazy field loading feature from the Lucene trunk, you should turn it off or upgrade to the next nightly build (lucene-2006-10-16) or later. Bug details here: http://issues.apache.org/jira/browse/LUCENE-683

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
RE: Lucene 2.0.1 release date
Thanks for the reply, Otis. I looked at the CHANGES.txt file and saw quite a few changes. For my port from Java to C#, I can't rely on the trunk code, as it (to my knowledge) changes on a monthly basis if not weekly. What I need is an official release that I can use as the port point.

Regards,
-- George Aroush

-----Original Message-----
From: Otis Gospodnetic
Sent: Sunday, October 15, 2006 12:41 AM
Subject: Re: Lucene 2.0.1 release date

I'd have to check CHANGES.txt, but I don't think that many bugs have been fixed, and not that many new features have been added, that anyone is itching for a new release.

Otis

----- Original Message -----
From: George Aroush
Sent: Saturday, October 14, 2006 10:32:47 AM
Subject: RE: Lucene 2.0.1 release date

Hi folks, sorry for reposting this question (see original email below), this time to both mailing lists. If anyone can tell me what the plan is for the Lucene 2.0.1 release, I would appreciate it very much. As some of you may know, I am the porter of Lucene to Lucene.Net; knowing when 2.0.1 will be released will help me plan things out.

Regards,
-- George Aroush

-----Original Message-----
From: George Aroush
Sent: Thursday, October 12, 2006 12:07 AM
Subject: Lucene 2.0.1 release date

Hi folks, what's the plan for the Lucene 2.0.1 release date? Thanks!

-- George Aroush
Re: Avoiding sort by date
On 10/12/06, [EMAIL PROTECTED] wrote:
> Does the Sort function create some kind of internal cache?
> Observing the heap, it seems that a full garbage collection after calling IndexSearcher.close() still leaves a lot of memory occupied.

Yes, it's called the FieldCache; there is a cache with a weak reference to the index reader as a key. As long as there is a reference to the index reader (even after close() has been called), the cache data will exist.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: IndexReader.deleteDocuments
Can somebody please clarify the intended behaviour of IndexReader.deleteDocuments()? Between the various documentations and implementations, it seems this function is broken. The API doc says it should delete docs containing the provided term, but instead it deletes all documents not containing the given term.

On 10/15/06, EDMOND KEMOKAI [EMAIL PROTECTED] wrote:
> Thanks for the response Otis. Below is a link to the javadoc in the API: http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/DeleteFiles.html ("Deletes documents from an index that do not contain a term"). Here is a link to the actual sample implementation: http://svn.apache.org/repos/asf/lucene/java/trunk/src/demo/org/apache/lucene/demo/DeleteFiles.java
>
> So in effect it should delete documents that don't contain the path value corresponding to what's in args[0]. Except the API documentation (http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#deleteDocuments(org.apache.lucene.index.Term)) suggests the opposite: "Deletes all documents containing term."
>
> From observation in my app, it is deleting documents that don't have the provided term, which means there's no easy way to delete a doc (other than iterating) even if you have a unique id.

--
"talk trash and carry a small stick." PAUL KRUGMAN (NYT)
Re: IndexReader.deleteDocuments
On 10/16/06, EDMOND KEMOKAI [EMAIL PROTECTED] wrote:
> Can somebody please clarify the intended behaviour of IndexReader.deleteDocuments()?

It deletes documents containing the term. The API docs are correct; the demo docs are incorrect if they say otherwise.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: Query not finding indexed data
Hi Antony, you cannot instruct the query parser to do that. Note that an application can add both tokenized and un_tokenized data under the same field name, so it is application logic to know that a certain query is not to be tokenized. In this case you could create your query with:

Query query = new TermQuery(new Term("attname", "IqTstAdminGuide2.pdf"));

Hope this helps,
Doron

Antony Bowesman [EMAIL PROTECTED] wrote on 15/10/2006 20:08:37:

Hi, I have a field "attname" that is indexed with Field.Store.YES, Field.Index.UN_TOKENIZED. I have a document with the attname of "IqTstAdminGuide2.pdf".

QueryParser parser = new QueryParser("body", new StandardAnalyzer());
Query query = parser.parse("attname:IqTstAdminGuide2.pdf");

fails to find the document, which I guess is because the StandardAnalyzer lowercases the filename. How can one instruct the QueryParser to use the Analyzer only on fields in an expression that were tokenized during indexing, and not to analyse those that were UN_TOKENIZED?

Regards,
Antony
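A self-contained version of Doron's suggestion, assuming the Lucene 2.0 API (the RAMDirectory is illustrative): index the value UN_TOKENIZED, then query with a TermQuery built directly from a Term, so no analyzer touches (and lowercases) the filename.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class ExactMatchQuery {
    static int countHits() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        // UN_TOKENIZED: the whole filename is indexed as one exact term.
        doc.add(new Field("attname", "IqTstAdminGuide2.pdf",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        // No analysis happens here, so the case-sensitive value matches.
        Hits hits = searcher.search(
                new TermQuery(new Term("attname", "IqTstAdminGuide2.pdf")));
        int n = hits.length();
        searcher.close();
        return n;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits()); // 1
    }
}
```

The same search run through QueryParser with StandardAnalyzer would look for the lowercased term and find nothing, which is exactly the failure Antony describes.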
RE: Error while closing IndexWriter
Hi, sorry Doron if the code in my last mail was confusing, and thanks for the reply. The code in my last mail was not exactly the version that was causing the problem; this one is. The Lucene version is 1.2. Waiting for a suggestion.

Code:

public void indexFile(File indexDirFile, File resumeFile) throws IOException {
    IndexWriter indexwriter = null;
    try {
        File afile[] = indexDirFile.listFiles();
        boolean flag = false;
        if (afile.length <= 0)
            flag = true;
        indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
        doIndexing(indexwriter, resumeFile); // following method
        if (indexwriter != null) {
            indexwriter.close(); // <-- Indexer.java:150 (error here)
        }
    } catch (IOException e) {
        e.printStackTrace();
        throw new Error(e);
    }
}

public void doIndexing(IndexWriter indexwriter, File resumeFile) {
    Document document = new Document();
    if (resumeFile.getName().endsWith(".pdf")) {
        // ... code for indexing PDF docs. Right now the inputs are not PDF docs,
        // so I have removed this piece as it could not have been causing problems.
    } else {
        try {
            document.add(Field.Text(IndexerColumns.contents, new FileReader(resumeFile)));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            throw new MyRuntimeException(e.getMessage(), e);
        }
    }
    for (int i = 0; i < this.columnInfos.length; i++) {
        ColumnInfo columnInfo = columnInfos[i];
        String value = String.valueOf(mapLuceneParams.get(columnInfo.columnName));
        if (value != null) {
            value = value.trim();
            if (value.length() != 0) {
                document.add(Field.Text(columnInfo.columnName, value));
            }
        }
    }
    try {
        indexwriter.addDocument(document);
    } catch (IOException e) {
        e.printStackTrace();
        throw new MyRuntimeException(e.getMessage(), e);
    }
}

Regards,
Shivani Sawhney
NetEdge Computing Global Services Private Limited
-----Original Message-----
From: Doron Cohen
Sent: Friday, October 13, 2006 12:17 PM
Subject: Re: Error while closing IndexWriter

I am far from perfect in this pdf text extracting, however I noticed something in your code that you may want to check to clear up the reason for this failure; see below.

Shivani Sawhney [EMAIL PROTECTED] wrote on 12/10/2006 22:54:07:

Hi all, I am facing a peculiar problem. I am trying to index a file, and the indexing code executes without any error, but when I try to close the indexer I get the following error. The error comes rarely, but when it does, no document-indexing code works, and I finally have to delete all indexes and run a re-indexing utility. Can anyone please suggest what might be the problem?

Stack trace:

java.lang.ArrayIndexOutOfBoundsException: 97 >= 17
        at java.util.Vector.elementAt(Vector.java:432)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:169)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:425)
        at