otis        2003/01/23 08:11:00

Modified:    xdocs    queryparsersyntax.xml
             docs     queryparsersyntax.html
Log:
- Added useful info to Overview, added a section about Range Searches and
  a section about Field Grouping. Fixed a few small gramatical errors.

Revision  Changes    Path
1.4       +55 -4     jakarta-lucene/xdocs/queryparsersyntax.xml

Index: queryparsersyntax.xml
===================================================================
RCS file: /home/cvs/jakarta-lucene/xdocs/queryparsersyntax.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- queryparsersyntax.xml 16 May 2002 14:04:14 -0000 1.3
+++ queryparsersyntax.xml 23 Jan 2003 16:10:59 -0000 1.4
@@ -8,9 +8,37 @@
  </properties>
  <body>
  <section name="Overview">
- <p>Although Lucene provides the ability to create your own query's though its API, it also provides a rich query language through the QueryParser.</p>
- <p>This page provides syntax of Lucene's Query Parser, a lexer which interprets a string into a Lucene Query using JavaCC.</p>
+ <p>Although Lucene provides the ability to create your own
+ queries through its API, it also provides a rich query
+ language through the Query Parser.</p> <p>This page
+ provides syntax of Lucene's Query Parser, a lexer which
+ interprets a string into a Lucene Query using JavaCC.</p>
+ <p>
+ Before choosing to use the provided Query Parser, please consider the following:
+ <ol>
+ <li>If you are programmatically generating a query string and then
+ parsing it with the query parser then you should seriously consider building
+ your queries directly with the query API. In other words, the query
+ parser is designed for human-entered text, not for program-generated
+ text.</li>
+
+ <li>Untokenized fields are best added directly to queries, and not
+ through the query parser. If a field's values are generated programmatically
+ by the application, then so should query clauses for this field.
+ Analyzers, like the query parser, are designed to convert human-entered
+ text to terms. Program-generated values, like dates, keywords, etc.,
+ should be consistently program-generated.</li>
+
+ <li>In a query form, fields which are general text should use the query
+ parser. All others, such as date ranges, keywords, etc. are better added
+ directly through the query API. A field with a limit set of values,
+ that can be specified with a pull-down menu should not be added to a
+ query string which is subsequently parsed, but rather added as a
+ TermQuery clause.</li>
+ </ol>
+ </p>
  </section>
+
  <section name="Terms">
  <p>A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.</p>
  <p>A Single Term is a single word such as "test" or "hello".</p>
@@ -37,6 +65,7 @@
  </section>
  <section name="Term Modifiers">
+
  <p>Lucene supports modifying query terms to provide a wide range of searching options.</p>
  <subsection name="Wildcard Searches">
@@ -62,14 +91,27 @@
  <p>This search will find terms like foam and roams</p>
  <p>Note:Terms found by the fuzzy search will automatically get a boost factor of 0.2</p>
  </subsection>
+
  <subsection name="Proximity Searches">
  <p>Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search: </p>
  <source>"jakarta apache"~10</source>
-
  </subsection>
+
+ <subsection name="Range Searches">
+ <p>Range Queries allow one to match documents whose field(s) values
+ are between the lower and upper bound specified by the Range Query.
+ Range Queries are inclusive (i.e. the query includes the specified lower and upper bound).
+ Sorting is done lexicographically.</p>
+ <source>mod_date:[20020101 TO 20030101]</source>
+ <p>This will find documents whose mod_date fields have values between 20020101 and 20030101.
+ Note that Range Queries are not reserved for date fields. You could also use range queries with non-date fields:</p>
+ <source>title:[Aida TO Carmen]</source>
+ <p>This will find all documents whose titles are between Aida and Carmen.</p>
+ </subsection>
+
  <subsection name="Boosting a Term">
  <p>Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.</p>
@@ -82,11 +124,14 @@
  <p>This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example: </p>
  <source>"jakarta apache"^4 "jakarta lucene"</source>
- <p>By default, the boost factor is 1. Although, the boost factor must be positive, it can be less than 1 (i.e. .2)</p>
+ <p>By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2)</p>
  </subsection>
+
  </section>
+
  <section name="Boolean operators">
+
  <p>Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators(Note: Boolean operators must be ALL CAPS).</p>
@@ -145,6 +190,12 @@
  <p>This eliminates any confusion and makes sure you that website must exist and either term jakarta or apache may exist.</p>
  </section>
+ <section name="Field Grouping">
+ <p>Lucene supports using parentheses to group multiple clauses to a single field.</p>
+ <p>To search for a title that contains both the word "return" and the phrase "pink panther" use the query:</p>
+ <source>title:(+return +"pink panther")</source>
+ </section>
+
  <section name="Escaping Special Characters">
  <p>Lucene supports escaping special characters that are part of the query syntax. The current list special characters are</p>
  <p>+ - && || ! ( ) { } [ ] ^ " ~ * ? : \</p>

1.15      +122 -3    jakarta-lucene/docs/queryparsersyntax.html

Index: queryparsersyntax.html
===================================================================
RCS file: /home/cvs/jakarta-lucene/docs/queryparsersyntax.html,v
retrieving revision 1.14
retrieving revision 1.15
diff -u -r1.14 -r1.15
--- queryparsersyntax.html 4 Jan 2003 17:19:16 -0000 1.14
+++ queryparsersyntax.html 23 Jan 2003 16:10:59 -0000 1.15
@@ -119,8 +119,36 @@
  </td></tr>
  <tr><td>
  <blockquote>
- <p>Although Lucene provides the ability to create your own query's though its API, it also provides a rich query language through the QueryParser.</p>
- <p>This page provides syntax of Lucene's Query Parser, a lexer which interprets a string into a Lucene Query using JavaCC.</p>
+ <p>Although Lucene provides the ability to create your own
+ queries through its API, it also provides a rich query
+ language through the Query Parser.</p>
+ <p>This page
+ provides syntax of Lucene's Query Parser, a lexer which
+ interprets a string into a Lucene Query using JavaCC.</p>
+ <p>
+ Before choosing to use the provided Query Parser, please consider the following:
+ <ol>
+ <li>If you are programmatically generating a query string and then
+ parsing it with the query parser then you should seriously consider building
+ your queries directly with the query API. In other words, the query
+ parser is designed for human-entered text, not for program-generated
+ text.</li>
+
+ <li>Untokenized fields are best added directly to queries, and not
+ through the query parser. If a field's values are generated programmatically
+ by the application, then so should query clauses for this field.
+ Analyzers, like the query parser, are designed to convert human-entered
+ text to terms. Program-generated values, like dates, keywords, etc.,
+ should be consistently program-generated.</li>
+
+ <li>In a query form, fields which are general text should use the query
+ parser. All others, such as date ranges, keywords, etc. are better added
+ directly through the query API. A field with a limit set of values,
+ that can be specified with a pull-down menu should not be added to a
+ query string which is subsequently parsed, but rather added as a
+ TermQuery clause.</li>
+ </ol>
+ </p>
  </blockquote>
  </p>
  </td></tr>
@@ -377,6 +405,63 @@
  <table border="0" cellspacing="0" cellpadding="2" width="100%">
  <tr><td bgcolor="#828DA6">
  <font color="#ffffff" face="arial,helvetica,sanserif">
+ <a name="Range Searches"><strong>Range Searches</strong></a>
+ </font>
+ </td></tr>
+ <tr><td>
+ <blockquote>
+ <p>Range Queries allow one to match documents whose field(s) values
+ are between the lower and upper bound specified by the Range Query.
+ Range Queries are inclusive (i.e. the query includes the specified lower and upper bound).
+ Sorting is done lexicographically.</p>
+ <div align="left">
+ <table cellspacing="4" cellpadding="0" border="0">
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#ffffff"><pre>mod_date:[20020101 TO 20030101]</pre></td>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ </table>
+ </div>
+ <p>This will find documents whose mod_date fields have values between 20020101 and 20030101.
+ Note that Range Queries are not reserved for date fields. You could also use range queries with non-date fields:</p>
+ <div align="left">
+ <table cellspacing="4" cellpadding="0" border="0">
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#ffffff"><pre>title:[Aida TO Carmen]</pre></td>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ </table>
+ </div>
+ <p>This will find all documents whose titles are between Aida and Carmen.</p>
+ </blockquote>
+ </td></tr>
+ <tr><td><br/></td></tr>
+ </table>
+ <table border="0" cellspacing="0" cellpadding="2" width="100%">
+ <tr><td bgcolor="#828DA6">
+ <font color="#ffffff" face="arial,helvetica,sanserif">
  <a name="Boosting a Term"><strong>Boosting a Term</strong></a>
  </font>
  </td></tr>
@@ -444,7 +529,7 @@
  </tr>
  </table>
  </div>
- <p>By default, the boost factor is 1. Although, the boost factor must be positive, it can be less than 1 (i.e. .2)</p>
+ <p>By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2)</p>
  </blockquote>
  </td></tr>
  <tr><td><br/></td></tr>
@@ -708,6 +793,40 @@
  </table>
  </div>
  <p>This eliminates any confusion and makes sure you that website must exist and either term jakarta or apache may exist.</p>
+ </blockquote>
+ </p>
+ </td></tr>
+ <tr><td><br/></td></tr>
+ </table>
+ <table border="0" cellspacing="0" cellpadding="2" width="100%">
+ <tr><td bgcolor="#525D76">
+ <font color="#ffffff" face="arial,helvetica,sanserif">
+ <a name="Field Grouping"><strong>Field Grouping</strong></a>
+ </font>
+ </td></tr>
+ <tr><td>
+ <blockquote>
+ <p>Lucene supports using parentheses to group multiple clauses to a single field.</p>
+ <p>To search for a title that contains both the word "return" and the phrase "pink panther" use the query:</p>
+ <div align="left">
+ <table cellspacing="4" cellpadding="0" border="0">
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#ffffff"><pre>title:(+return +"pink panther")</pre></td>
+ <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ <tr>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
+ </tr>
+ </table>
+ </div>
  </blockquote>
  </p>
  </td></tr>
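
Note on the Range Searches text added in this commit: the new section says range bounds are inclusive and that ordering is lexicographic. The sketch below is not Lucene code and does not use the Lucene API; it only illustrates, with plain Java string comparison, what an inclusive lexicographic range means for the two example queries. The class name LexicographicRange and the method inRange are hypothetical names chosen for this illustration, and the sample values are made up. Real matching in Lucene happens against indexed terms, so treat this as an approximation of the ordering rule only.

```java
// Illustration only: plain Java string ordering, not the Lucene API.
// LexicographicRange and inRange are hypothetical names for this sketch.
class LexicographicRange {

    // True when value lies in [lower, upper], both bounds included,
    // using String.compareTo, which orders strings lexicographically.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Dates written as yyyyMMdd sort the same way as strings and as dates.
        System.out.println(inRange("20020615", "20020101", "20030101")); // true

        // Non-date values work the same way: "Bluebeard" sorts between the bounds.
        System.out.println(inRange("Bluebeard", "Aida", "Carmen")); // true

        // Inclusive: a value equal to the upper bound is still in range.
        System.out.println(inRange("Carmen", "Aida", "Carmen")); // true

        // Lexicographic is not numeric: "9" sorts after "20" as a string.
        System.out.println(inRange("9", "10", "20")); // false
    }
}
```

The last case is the practical consequence of lexicographic ordering: values meant to be compared as numbers or dates need a fixed-width form such as the yyyyMMdd strings used in the mod_date example.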