Date: 2005-01-03T13:38:31
   Editor: DanielNaber
   Wiki: Jakarta Lucene Wiki
   Page: LuceneFAQ
   URL: http://wiki.apache.org/jakarta-lucene/LuceneFAQ

   avoid useless links

Change Log:

------------------------------------------------------------------------------
@@ -4,7 +4,7 @@
 
 [[TableOfContents]]
 
-== FAQ ==
+== Lucene FAQ ==
 
 === General ===
 
@@ -71,7 +71,7 @@
 
 === Searching ===
 
-==== Why am i getting no hits / incorrect hits? ====
+==== Why am I getting no hits / incorrect hits? ====
 
 Some possible causes:
 
@@ -79,10 +79,10 @@
  * The term is in a field that was not tokenized during indexing and 
therefore, the entire content of the field was considered as a single term. 
Re-index the documents and make sure the field is tokenized. 
  * The field specified in the query simply does not exist. You won't get an 
error message in this case, you'll just get no matches.
  * The field specified in the query has wrong case. Field names are case 
sensitive.
- * The term you are searching is a stop word that was dropped by the analyzer 
you use. For example, if your analyzer uses the StopFilter, a search for the 
word 'the' will always fail (i.e. produce no hits).
+ * The term you are searching is a stop word that was dropped by the analyzer 
you use. For example, if your analyzer uses the !StopFilter, a search for the 
word 'the' will always fail (i.e. produce no hits).
  * You are using different analyzers (or the same analyzer but with different 
stop words) for indexing and searching and as a result, the same term is 
transformed differently during indexing and searching.
- * The analyzer you are using is case sensitive (e.g. it does not use the 
LowerCaseFilter) and the term in the query has different case than the term in 
the document. 
- * The documents you are indexing are very large. Lucene by default only 
indexes the first 10,000 terms of a document to avoid OutOfMemory errors. See 
[http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
 IndexWriter.maxFieldLength].
+ * The analyzer you are using is case sensitive (e.g. it does not use the 
!LowerCaseFilter) and the term in the query has different case than the term in 
the document. 
+ * The documents you are indexing are very large. Lucene by default only 
indexes the first 10,000 terms of a document to avoid !OutOfMemory errors. See 
[http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength IndexWriter.maxFieldLength].
  
 If none of the possible causes above apply to your case, this will help you to 
debug the problem:
 
@@ -117,7 +117,7 @@
 
 Another wild card character that you can use is '?', a question mark.  The ? 
will match a single character.  This allows you to perform queries such as 
''Bra?il''. Such a query will match both ''Brasil'' and ''Brazil''.  Lucene 
refers to this type of a query as a 'wildcard query'.
 
-'''Note''': Leading wildcards (e.g. ''*ook'') are '''not''' supported by the 
QueryParser.
+'''Note''': Leading wildcards (e.g. ''*ook'') are '''not''' supported by the 
!QueryParser.
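The two wildcard forms can be sketched against the 1.4-era !QueryParser API (the field name `contents`, the index-less query construction, and the analyzer choice are illustrative, not from the FAQ itself):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class WildcardExamples {
    public static void main(String[] args) throws Exception {
        // '?' matches exactly one character: matches both Brasil and Brazil
        Query q1 = QueryParser.parse("Bra?il", "contents", new StandardAnalyzer());
        // '*' matches zero or more characters
        Query q2 = QueryParser.parse("micro*", "contents", new StandardAnalyzer());
        // QueryParser.parse("*ook", ...) would fail with a ParseException,
        // since leading wildcards are not supported by the QueryParser.
        System.out.println(q1 + " / " + q2);
    }
}
```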
 
 
 ==== Is the QueryParser thread-safe? ====
@@ -171,7 +171,7 @@
 By default, `slop` is set to 0 so that only exact phrases will match.
 However, you can alter the value using the `setSlop(int)` method.
 
-When using QueryParser you can use this syntax to specify the slop: "doug 
cutting"~2 will find documents that contain "doug cutting" as well as ones that 
contain "cutting doug".
+When using !QueryParser you can use this syntax to specify the slop: "doug 
cutting"~2 will find documents that contain "doug cutting" as well as ones that 
contain "cutting doug".
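The programmatic equivalent can be sketched like this (field name `contents` is an assumption for illustration):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class SlopExample {
    public static void main(String[] args) {
        PhraseQuery query = new PhraseQuery();
        query.add(new Term("contents", "doug"));
        query.add(new Term("contents", "cutting"));
        // Same effect as the QueryParser syntax "doug cutting"~2:
        // matches "doug cutting" and also "cutting doug".
        query.setSlop(2);
        System.out.println(query);
    }
}
```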
 
 
 ==== Are Wildcard, Prefix, and Fuzzy queries case sensitive? ====
@@ -209,7 +209,7 @@
 
 ==== Is the IndexSearcher thread-safe? ====
 
-'''Yes''', IndexSearcher is thread-safe.  Multiple search threads may access 
the index concurrently without any problems.
+Yes, !IndexSearcher is thread-safe.  Multiple search threads may access the 
index concurrently without any problems.
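Since a single !IndexSearcher can be shared, the usual pattern is to open one instance and reuse it from all search threads, rather than opening a searcher per query. A minimal sketch against the 1.4-era API (the index path and field name are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class SharedSearcher {
    public static void main(String[] args) throws Exception {
        // One searcher, shared by all search threads.
        final IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Runnable search = new Runnable() {
            public void run() {
                try {
                    Hits hits = searcher.search(
                            new TermQuery(new Term("contents", "lucene")));
                    System.out.println(hits.length() + " hits");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };
        new Thread(search).start();
        new Thread(search).start();
    }
}
```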
 
 
 ==== Is there a way to retrieve the original term positions during the search? 
====
@@ -272,12 +272,12 @@
 
 ==== How do I perform a simple indexing of a set of documents? ====
 
-The easiest way is to re-index the entire document set periodically or 
whenever it changes. All you need to do is to create an instance of 
IndexWriter(), iterate over your document set, create for each document a 
Lucene Document object and add it to the IndexWriter. When you are done make 
sure to close the IndexWriter. This will release all of its resources and will 
close the files it created. 
+The easiest way is to re-index the entire document set periodically or 
whenever it changes. All you need to do is create an instance of 
!IndexWriter, iterate over your document set, create a Lucene Document 
object for each document, and add it to the !IndexWriter. When you are done, 
make sure to close the !IndexWriter; this releases all of its resources and 
closes the files it created. 
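The steps above can be sketched against the 1.4-era API (the index path, field name, and in-memory "document set" are placeholder assumptions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SimpleIndexer {
    public static void main(String[] args) throws Exception {
        // 'true' creates a new index, overwriting any existing one.
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), true);
        // Stand-in for your real document set.
        String[] docs = { "first document", "second document" };
        for (int i = 0; i < docs.length; i++) {
            Document doc = new Document();
            doc.add(Field.Text("contents", docs[i]));
            writer.addDocument(doc);
        }
        // Releases resources and closes the index files.
        writer.close();
    }
}
```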
 
 
 ==== How can I add document(s) to the index? ====
 
-Simply create an IndexWriter and use its addDocument() method. Make sure to 
create the IndexWriter with the 'create' flag set to false and make sure to 
close the IndexWriter when you are done adding the documents.
+Simply create an !IndexWriter and use its addDocument() method. Make sure to 
create the !IndexWriter with the 'create' flag set to false, and close it 
when you are done adding the documents.
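For example, a minimal incremental-add sketch (1.4-era API; path and field name are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class AddDocument {
    public static void main(String[] args) throws Exception {
        // 'false': open the existing index instead of creating a new one.
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(Field.Text("contents", "a newly added document"));
        writer.addDocument(doc);
        writer.close();
    }
}
```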
 
 
 ==== Where does Lucene store the index it builds? ====
@@ -345,7 +345,7 @@
 
 ==== What is index optimization and when should I use it? ====
 
-The IndexWriter class supports an optimize() method that compacts the index 
database and speedup queries. You may want to use this method after performing 
a complete indexing of your document set or after incremental updates of the 
index. If your incremental update adds documents frequently, you want to 
perform the optimization only once in a while to avoid the extra overhead of 
the optimization.
+The !IndexWriter class supports an optimize() method that compacts the index 
and speeds up queries. You may want to use this method after performing 
a complete indexing of your document set or after incremental updates of the 
index. If your incremental updates add documents frequently, you may want to 
optimize only once in a while, to avoid the extra overhead of the 
optimization.
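As a sketch, assuming an already-open !IndexWriter named `writer`:

```java
// After a bulk index or a batch of incremental updates:
writer.optimize(); // merges all segments into a single segment
writer.close();
```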
 
 ==== What are Segments? ====
 
@@ -384,16 +384,16 @@
 The write.lock is used to keep processes from concurrently attempting
 to modify an index. 
 
-It is obtained by an `IndexWriter` while it is open, and by an `IndexReader` 
once documents have been deleted and until it is closed.
+It is obtained by an !IndexWriter while it is open, and by an !IndexReader 
once documents have been deleted and until it is closed.
 
 
 ==== What is the purpose of the commit.lock file, when is it used, and by 
which classes? ====
 
 The commit.lock file is used to coordinate the contents of the 'segments'
-file with the files in the index.  It is obtained by an `IndexReader` before 
it reads the 'segments' file, which names all of the other files in the
-index, and until the `IndexReader` has opened all of these other files.
+file with the files in the index.  It is obtained by an !IndexReader before it 
reads the 'segments' file, which names all of the other files in the
+index, and until the !IndexReader has opened all of these other files.
 
-The commit.lock is also obtained by the `IndexWriter` when it is about to 
write the segments file and until it has finished trying to delete obsolete 
index files.
+The commit.lock is also obtained by the !IndexWriter when it is about to write 
the segments file and until it has finished trying to delete obsolete index 
files.
 
 The commit.lock should thus never be held for long, since while
 it is obtained files are only opened or deleted, and one small file is
@@ -484,7 +484,7 @@
 and content.xml to get the document's content. Add these to the Lucene index,
 typically using one Lucene field per property.
 
-Note that this applies to OpenOffice.org 1.x, things might change a bit for 
OpenOffice.org
+Note that this applies to !OpenOffice.org 1.x, things might change a bit for 
!OpenOffice.org
 2.x, but the basic approach will still be the same.
 
 
@@ -545,9 +545,9 @@
 
 ==== What is the difference between IndexWriter.addIndexes(IndexReader[]) and 
IndexWriter.addIndexes(Directory[]), besides them taking different arguments? 
====
 
-When merging lots of indexes (more than the mergeFactor), the Directory-based 
method will use fewer file handles and less memory, as it will only ever open 
mergeFactor indexes at once, while the IndexReader-based method requires that 
all indexes be open when passed.
+When merging lots of indexes (more than the mergeFactor), the Directory-based 
method will use fewer file handles and less memory, as it will only ever open 
mergeFactor indexes at once, while the !IndexReader-based method requires that 
all indexes be open when passed.
 
-The primary advantage of the IndexReader-based method is that one can pass it 
IndexReaders that don't reside in a Directory.
+The primary advantage of the !IndexReader-based method is that one can pass it 
!IndexReaders that don't reside in a Directory.
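A sketch of the Directory-based merge (1.4-era API; all paths are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/merged",
                new StandardAnalyzer(), true);
        Directory[] sources = {
            FSDirectory.getDirectory("/path/to/index1", false),
            FSDirectory.getDirectory("/path/to/index2", false)
        };
        // Opens at most mergeFactor source indexes at a time,
        // so it needs fewer file handles than the IndexReader variant.
        writer.addIndexes(sources);
        writer.close();
    }
}
```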
 
 
 ==== Can I use Lucene to index text in Chinese, Japanese, Korean, and other 
multi-byte character sets? ====
