Added: websites/staging/lucy/trunk/content/docs/perl/Clownfish/Vector.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Clownfish/Vector.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Clownfish/Vector.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,295 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Clownfish::Vector – Apache Clownfish Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Clownfish/">Clownfish</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" 
id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Clownfish::Vector - Variable-sized array.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $vector = Clownfish::Vector->new; +$vector->store($tick, $value); +my $value = $vector->fetch($tick);</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $vector = Clownfish::Vector->new( + capacity => $capacity # default: 0 +);</pre> + +<p>Return a new Vector.</p> + +<ul> +<li><b>capacity</b> - Initial number of elements that the object will 
be able to hold before reallocation.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="push" +>push</a></h3> + +<pre>$vector->push($element); +$vector->push(); # default: undef</pre> + +<p>Push an item onto the end of a Vector.</p> + +<h3><a class='u' +name="push_all" +>push_all</a></h3> + +<pre>$vector->push_all($other);</pre> + +<p>Push all the elements of another Vector onto the end of this one.</p> + +<h3><a class='u' +name="pop" +>pop</a></h3> + +<pre>my $obj = $vector->pop();</pre> + +<p>Pop an item off of the end of a Vector.</p> + +<p>Returns: the element or undef if the Vector is empty.</p> + +<h3><a class='u' +name="insert" +>insert</a></h3> + +<pre>$vector->insert( + tick => $tick, # required + element => $element, # default: undef +);</pre> + +<p>Insert an element at <code>tick</code>, moving the following elements.</p> + +<h3><a class='u' +name="insert_all" +>insert_all</a></h3> + +<pre>$vector->insert_all( + tick => $tick, # required + other => $other, # required +);</pre> + +<p>Insert the elements of the <code>other</code> Vector at <code>tick</code>, moving the following elements.</p> + +<h3><a class='u' +name="fetch" +>fetch</a></h3> + +<pre>my $obj = $vector->fetch($tick);</pre> + +<p>Fetch the element at <code>tick</code>.</p> + +<p>Returns: the element or undef if <code>tick</code> is out of bounds.</p> + +<h3><a class='u' +name="store" +>store</a></h3> + +<pre>$vector->store($tick, $elem);</pre> + +<p>Store an element at index <code>tick</code>, +possibly displacing an existing element.</p> + +<h3><a class='u' +name="delete" +>delete</a></h3> + +<pre>my $obj = $vector->delete($tick);</pre> + +<p>Replace an element in the Vector with undef and return it.</p> + +<p>Returns: the element stored at <code>tick</code> or undef if <code>tick</code> is out of bounds.</p> + +<h3><a class='u' +name="excise" +>excise</a></h3> + +<pre>$vector->excise( + offset => $offset, # required + length => $length, # required +);</pre> + 
+<p>Remove <code>length</code> elements from the Vector, +starting at <code>offset</code>. +Move elements over to fill in the gap.</p> + +<h3><a class='u' +name="clone" +>clone</a></h3> + +<pre>my $arrayref = $vector->clone();</pre> + +<p>Clone the Vector, but merely increment the refcounts of its elements rather than cloning them.</p> + +<h3><a class='u' +name="sort" +>sort</a></h3> + +<pre>$vector->sort();</pre> + +<p>Sort the Vector. +Sort order is guaranteed to be <i>stable</i>: the relative order of elements which compare as equal will not change.</p> + +<h3><a class='u' +name="resize" +>resize</a></h3> + +<pre>$vector->resize($size);</pre> + +<p>Set the size for the Vector. +If the new size is larger than the current size, +grow the object to accommodate undef elements; if smaller than the current size, +decrement and discard truncated elements.</p> + +<h3><a class='u' +name="clear" +>clear</a></h3> + +<pre>$vector->clear();</pre> + +<p>Empty the Vector.</p> + +<h3><a class='u' +name="get_size" +>get_size</a></h3> + +<pre>my $int = $vector->get_size();</pre> + +<p>Return the size of the Vector.</p> + +<h3><a class='u' +name="slice" +>slice</a></h3> + +<pre>my $arrayref = $vector->slice( + offset => $offset, # required + length => $length, # required +);</pre> + +<p>Return a slice of the Vector consisting of elements from a contiguous range. 
+If the specified range is out of bounds, +return a slice with fewer elements – potentially none.</p> + +<ul> +<li><b>offset</b> - The index of the element to start at.</li> + +<li><b>length</b> - The maximum number of elements to slice.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Clownfish::Vector isa <a href="../Clownfish/Obj.html" class="podlinkpod" +>Clownfish::Obj</a>.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Inversion.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Inversion.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Inversion.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,175 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Analysis::Inversion – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Analysis/">Analysis</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" 
name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Inversion - A collection of Tokens.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $result = Lucy::Analysis::Inversion->new; + +while (my $token = $inversion->next) { + $result->append($token); +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>An Inversion is a collection of Token objects which you can add to, +then iterate over.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $inversion 
= Lucy::Analysis::Inversion->new( + $seed, # optional +);</pre> + +<p>Create a new Inversion.</p> + +<ul> +<li><b>seed</b> - An initial Token to start things off, +which may be undef.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="append" +>append</a></h3> + +<pre>$inversion->append($token);</pre> + +<p>Tack a token onto the end of the Inversion.</p> + +<ul> +<li><b>token</b> - A Token.</li> +</ul> + +<h3><a class='u' +name="next" +>next</a></h3> + +<pre>my $token = $inversion->next();</pre> + +<p>Return the next token in the Inversion until out of tokens.</p> + +<h3><a class='u' +name="reset" +>reset</a></h3> + +<pre>$inversion->reset();</pre> + +<p>Reset the Inversion’s iterator, +so that the next call to next() returns the first Token in the inversion.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Inversion isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Token.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Token.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Analysis/Token.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,242 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Analysis::Token – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Analysis/">Analysis</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + 
<input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Token - Unit of text.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre> my $token = Lucy::Analysis::Token->new( + text => 'blind', + start_offset => 8, + end_offset => 13, + ); + + $token->set_text('mice');</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses. 
+Each Token has 5 attributes: <code>text</code>, +<code>start_offset</code>, +<code>end_offset</code>, +<code>boost</code>, +and <code>pos_inc</code>.</p> + +<p>The <code>text</code> attribute is a Unicode string encoded as UTF-8.</p> + +<p><code>start_offset</code> is the start point of the token text, +measured in Unicode code points from the top of the stored field; <code>end_offset</code> delimits the corresponding closing boundary. +<code>start_offset</code> and <code>end_offset</code> locate the Token within a larger context, +even if the Token’s text attribute gets modified – by stemming, +for instance. +The Token for “beating” in the text “beating a dead horse” begins life with a start_offset of 0 and an end_offset of 7; after stemming, +the text is “beat”, +but the start_offset is still 0 and the end_offset is still 7. +This allows “beating” to be highlighted correctly after a search matches “beat”.</p> + +<p><code>boost</code> is a per-token weight. +Use this when you want to assign more or less importance to a particular token, +as you might for emboldened text within an HTML document, +for example. +(Note: The field this token belongs to must be spec’d to use a posting of type RichPosting.)</p> + +<p><code>pos_inc</code> is the POSition INCrement, +measured in Tokens. +This attribute, +which defaults to 1, +is an advanced tool for manipulating phrase matching. +Ordinarily, +Tokens are assigned consecutive position numbers: 0, +1, +and 2 for <code>"three blind mice"</code>. 
+However, +if you set the position increment for “blind” to, +say, +1000, +then the three tokens will end up assigned to positions 0, +1, +and 1001 – and will no longer produce a phrase match for the query <code>"three blind mice"</code>.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $token = Lucy::Analysis::Token->new( + text => $text, # required + start_offset => $start_offset, # required + end_offset => $end_offset, # required + boost => 1.0, # optional + pos_inc => 1, # optional +);</pre> + +<ul> +<li><b>text</b> - A string.</li> + +<li><b>start_offset</b> - Start offset into the original document in Unicode code points.</li> + +<li><b>end_offset</b> - End offset into the original document in Unicode code points.</li> + +<li><b>boost</b> - Per-token weight.</li> + +<li><b>pos_inc</b> - Position increment for phrase matching.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="get_text" +>get_text</a></h3> + +<pre>my $text = $token->get_text;</pre> + +<p>Get the token's text.</p> + +<h3><a class='u' +name="set_text" +>set_text</a></h3> + +<pre>$token->set_text($text);</pre> + +<p>Set the token's text.</p> + +<h3><a class='u' +name="get_start_offset" +>get_start_offset</a></h3> + +<pre>my $int = $token->get_start_offset();</pre> + +<h3><a class='u' +name="get_end_offset" +>get_end_offset</a></h3> + +<pre>my $int = $token->get_end_offset();</pre> + +<h3><a class='u' +name="get_boost" +>get_boost</a></h3> + +<pre>my $float = $token->get_boost();</pre> + +<h3><a class='u' +name="get_pos_inc" +>get_pos_inc</a></h3> + +<pre>my $int = $token->get_pos_inc();</pre> + +<h3><a class='u' +name="get_len" +>get_len</a></h3> + +<pre>my $int = $token->get_len();</pre> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Token isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div 
class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/AnalysisTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,195 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::AnalysisTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + 
</div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::AnalysisTutorial - How to choose and use Analyzers.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Try swapping out the EasyAnalyzer in our Schema for a <a href="../../../Lucy/Analysis/StandardTokenizer.html" class="podlinkpod" +>StandardTokenizer</a>:</p> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; +my $type = Lucy::Plan::FullTextType->new( + analyzer => $tokenizer, +);</pre> + +<p>Search for <code>senate</code>, +<code>Senate</code>, +and <code>Senator</code> before and after making the change and re-indexing.</p> + +<p>Under EasyAnalyzer, +the results are identical for all three searches, +but under StandardTokenizer, +searches are case-sensitive, +and the result sets for <code>Senate</code> and <code>Senator</code> are distinct.</p> + +<h3><a class='u' +name="EasyAnalyzer" +>EasyAnalyzer</a></h3> + +<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> is performing more aggressive processing than StandardTokenizer. +In addition to tokenizing, +it’s also converting all text to lower case so that searches are case-insensitive, +and using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>, +in this case).</p> + +<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package. 
+In this case, +it’s three-in-one, +since specifying an EasyAnalyzer with <code>language => 'en'</code> is equivalent to this snippet creating a <a href="../../../Lucy/Analysis/PolyAnalyzer.html" class="podlinkpod" +>PolyAnalyzer</a>:</p> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; +my $normalizer = Lucy::Analysis::Normalizer->new; +my $stemmer = Lucy::Analysis::SnowballStemmer->new( language => 'en' ); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], +);</pre> + +<p>You can add or subtract Analyzers from there if you like. +Try adding a fourth Analyzer, +a SnowballStopFilter for suppressing “stopwords” like “the”, +“if”, +and “maybe”.</p> + +<pre>my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( + language => 'en', +); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ], +);</pre> + +<p>Also, +try removing the SnowballStemmer.</p> + +<pre>my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer ], +);</pre> + +<p>The original choice of a stock English EasyAnalyzer probably still yields the best results for this document collection, +but you get the idea: sometimes you want a different Analyzer.</p> + +<h3><a class='u' +name="When_the_best_Analyzer_is_no_Analyzer" +>When the best Analyzer is no Analyzer</a></h3> + +<p>Sometimes you don’t want an Analyzer at all. +That was true for our “url” field because we didn’t need it to be searchable, +but it’s also true for certain types of searchable fields. 
+For instance, +“category” fields are often set up to match exactly or not at all, +as are fields like “last_name” (because you may not want to conflate results for “Humphrey” and “Humphries”).</p> + +<p>To specify that there should be no analysis performed at all, +use StringType:</p> + +<pre>my $type = Lucy::Plan::StringType->new; +$schema->spec_field( name => 'category', type => $type );</pre> + +<h3><a class='u' +name="Highlighting_up_next" +>Highlighting up next</a></h3> + +<p>In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html" class="podlinkpod" +>HighlighterTutorial</a>, +we’ll add highlighted excerpts from the “content” field to our search results.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,246 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a 
href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::BeyondSimpleTutorial - A more flexible app structure.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<h3><a class='u' +name="Goal" +>Goal</a></h3> + +<p>In this tutorial chapter, +we’ll refactor the apps we built in <a 
href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html" class="podlinkpod" +>SimpleTutorial</a> so that they look exactly the same from the end user’s point of view, +but offer the developer greater possibilities for expansion.</p> + +<p>To achieve this, +we’ll ditch Lucy::Simple and replace it with the classes that it uses internally:</p> + +<ul> +<li><a href="../../../Lucy/Plan/Schema.html" class="podlinkpod" +>Schema</a> - Plan out your index.</li> + +<li><a href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod" +>FullTextType</a> - Field type for full text search.</li> + +<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li> + +<li><a href="../../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a> - Manipulate index content.</li> + +<li><a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> - Search an index.</li> + +<li><a href="../../../Lucy/Search/Hits.html" class="podlinkpod" +>Hits</a> - Iterate over hits returned by a Searcher.</li> +</ul> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p>After we load our modules…</p> + +<pre>use Lucy::Plan::Schema; +use Lucy::Plan::FullTextType; +use Lucy::Analysis::EasyAnalyzer; +use Lucy::Index::Indexer;</pre> + +<p>… the first item we’re going to need is a <a href="../../../Lucy/Plan/Schema.html" class="podlinkpod" +>Schema</a>.</p> + +<p>The primary job of a Schema is to specify what fields are available and how they’re defined. +We’ll start off with three fields: title, +content and url.</p> + +<pre># Create Schema. 
+my $schema = Lucy::Plan::Schema->new; +my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'en', +); +my $type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type ); +$schema->spec_field( name => 'url', type => $type );</pre> + +<p>All of the fields are spec’d out using the <a href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod" +>FullTextType</a> FieldType, +indicating that they will be searchable as “full text” – which means that they can be searched for individual words. +The “analyzer”, +which is unique to FullTextType fields, +is what breaks up the text into searchable tokens.</p> + +<p>Next, +we’ll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html" class="podlinkpod" +>Indexer</a>. +The substitution will be straightforward because Simple has merely been serving as a thin wrapper around an inner Indexer, +and we’ll just be peeling away the wrapper.</p> + +<p>First, +replace the constructor:</p> + +<pre># Create Indexer. +my $indexer = Lucy::Index::Indexer->new( + index => $path_to_index, + schema => $schema, + create => 1, + truncate => 1, +);</pre> + +<p>Next, +call <a href="../../../Lucy/Index/Indexer.html#add_doc" class="podlinkpod" +>add_doc()</a> on the <code>indexer</code> object wherever we previously called it on the <code>lucy</code> object:</p> + +<pre>foreach my $filename (@filenames) { + my $doc = parse_file($filename); + $indexer->add_doc($doc); +}</pre> + +<p>There’s only one extra step required: at the end of the app, +you must call commit() explicitly to close the indexing session and commit your changes. 
+(Lucy::Simple hides this detail, +calling commit() implicitly when it needs to).</p> + +<pre>$indexer->commit;</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>In our search app as in our indexing app, +Lucy::Simple has served as a thin wrapper – this time around <a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> and <a href="../../../Lucy/Search/Hits.html" class="podlinkpod" +>Hits</a>. +Swapping out Simple for these two classes is also straightforward:</p> + +<pre>use Lucy::Search::IndexSearcher; + +my $searcher = Lucy::Search::IndexSearcher->new( + index => $path_to_index, +); +my $hits = $searcher->hits( # returns a Hits object, not a hit count + query => $q, + offset => $offset, + num_wanted => $page_size, +); +my $hit_count = $hits->total_hits; # get the hit count here + +... + +while ( my $hit = $hits->next ) { + ... +}</pre> + +<h3><a class='u' +name="Hooray!" +>Hooray!</a></h3> + +<p>Congratulations! +Your apps do the same thing as before… but now they’ll be easier to customize.</p> + +<p>In our next chapter, +<a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html" class="podlinkpod" +>FieldTypeTutorial</a>, +we’ll explore how to assign different behaviors to different fields.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,169 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::FieldTypeTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a
href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::FieldTypeTutorial - Specify per-field properties and behaviors.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Schema we used in the last chapter specifies three fields:</p> + +<pre>my $type = Lucy::Plan::FullTextType->new( + analyzer 
=> $easyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type ); +$schema->spec_field( name => 'url', type => $type );</pre> + +<p>Since they are all defined as “full text” fields, +they are all searchable – including the <code>url</code> field, +a dubious choice. +Some URLs contain meaningful information, +but these don’t, +really:</p> + +<pre>http://example.com/us_constitution/amend1.txt</pre> + +<p>We may as well not bother indexing the URL content. +To achieve that we need to assign the <code>url</code> field to a different FieldType.</p> + +<h3><a class='u' +name="StringType" +>StringType</a></h3> + +<p>Instead of FullTextType, +we’ll use a <a href="../../../Lucy/Plan/StringType.html" class="podlinkpod" +>StringType</a>, +which doesn’t use an Analyzer to break up text into individual tokens. +Furthermore, +we’ll mark this StringType as unindexed, +so that its content won’t be searchable at all.</p> + +<pre>my $url_type = Lucy::Plan::StringType->new( indexed => 0 ); +$schema->spec_field( name => 'url', type => $url_type );</pre> + +<p>To observe the change in behavior, +try searching for <code>us_constitution</code> both before and after changing the Schema and re-indexing.</p> + +<h3><a class='u' +name="Toggling_(8216)stored(8217)" +>Toggling ‘stored’</a></h3> + +<p>For a taste of other FieldType possibilities, +try turning off <code>stored</code> for one or more fields.</p> + +<pre>my $content_type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, + stored => 0, +);</pre> + +<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page, +but since we’re not displaying <code>content</code>, +turning it off for <code>content</code> has no effect – except on index size.</p> + +<h3><a class='u' +name="Analyzers_up_next" +>Analyzers up next</a></h3> + +<p>Analyzers play a crucial role in the behavior of FullTextType fields. 
+In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html" class="podlinkpod" +>AnalysisTutorial</a>, +we’ll see how changing up the Analyzer changes search results.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/HighlighterTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,164 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::HighlighterTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + 
<ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a 
href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::HighlighterTutorial - Augment search results with highlighted excerpts.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Adding relevant excerpts with highlighted search terms to your search results display makes it much easier for end users to scan the page and assess which hits look promising, +dramatically improving their search experience.</p> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p><a href="../../../Lucy/Highlight/Highlighter.html" class="podlinkpod" +>Highlighter</a> uses information generated at index time. +To save resources, +highlighting is disabled by default and must be turned on for individual fields.</p> + +<pre>my $highlightable = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, + highlightable => 1, +); +$schema->spec_field( name => 'content', type => $highlightable );</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>To add highlighting and excerpting to the search.cgi sample app, +create a <code>$highlighter</code> object outside the hits iterating loop…</p> + +<pre>my $highlighter = Lucy::Highlight::Highlighter->new( + searcher => $searcher, + query => $q, + field => 'content' +);</pre> + +<p>… then modify the loop and the per-hit display to generate and include the excerpt.</p> + +<pre># Create result list. 
+my $report = ''; +while ( my $hit = $hits->next ) { + my $score = sprintf( "%0.3f", $hit->get_score ); + my $excerpt = $highlighter->create_excerpt($hit); + $report .= qq| + <p> + <a href="$hit->{url}"><strong>$hit->{title}</strong></a> + <em>$score</em> + <br /> + $excerpt + <br /> + <span class="excerptURL">$hit->{url}</span> + </p> + |; +}</pre> + +<h3><a class='u' +name="Next_chapter:_Query_objects" +>Next chapter: Query objects</a></h3> + +<p>Our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html" class="podlinkpod" +>QueryObjectsTutorial</a>, +illustrates how to build an “advanced search” interface using <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a> objects instead of query strings.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,290 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::QueryObjectsTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a
href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::QueryObjectsTutorial - Use Query objects instead of query strings.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Until now, +our search app has had only a single search box. 
+In this tutorial chapter, +we’ll move towards an “advanced search” interface, +by adding a “category” drop-down menu. +Three new classes will be required:</p> + +<ul> +<li><a href="../../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a> - Turn a query string into a <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a> object.</li> + +<li><a href="../../../Lucy/Search/TermQuery.html" class="podlinkpod" +>TermQuery</a> - Query for a specific term within a specific field.</li> + +<li><a href="../../../Lucy/Search/ANDQuery.html" class="podlinkpod" +>ANDQuery</a> - “AND” together multiple Query objects to produce an intersected result set.</li> +</ul> + +<h3><a class='u' +name="Adaptations_to_indexer.pl" +>Adaptations to indexer.pl</a></h3> + +<p>Our new “category” field will be a StringType field rather than a FullTextType field, +because we will only be looking for exact matches. +It needs to be indexed, +but since we won’t display its value, +it doesn’t need to be stored.</p> + +<pre>my $cat_type = Lucy::Plan::StringType->new( stored => 0 ); +$schema->spec_field( name => 'category', type => $cat_type );</pre> + +<p>There will be three possible values: “article”, +“amendment”, +and “preamble”, +which we’ll hack out of the source file’s name during our <code>parse_file</code> subroutine:</p> + +<pre>my $category + = $filename =~ /art/ ? 'article' + : $filename =~ /amend/ ? 'amendment' + : $filename =~ /preamble/ ? 
'preamble' + : die "Can't derive category for $filename"; +return { + title => $title, + content => $bodytext, + url => "/us_constitution/$filename", + category => $category, +};</pre> + +<h3><a class='u' +name="Adaptations_to_search.cgi" +>Adaptations to search.cgi</a></h3> + +<p>The “category” constraint will be added to our search interface using an HTML “select” element (this routine will need to be integrated into the HTML generation section of search.cgi):</p> + +<pre># Build up the HTML "select" object for the "category" field. +sub generate_category_select { + my $cat = shift; + my $select = qq| + <select name="category"> + <option value="">All Sections</option> + <option value="article">Articles</option> + <option value="amendment">Amendments</option> + </select>|; + if ($cat) { + $select =~ s/"$cat"/"$cat" selected/; + } + return $select; +}</pre> + +<p>We’ll start off by loading our new modules and extracting our new CGI parameter.</p> + +<pre>use Lucy::Search::QueryParser; +use Lucy::Search::TermQuery; +use Lucy::Search::ANDQuery; + +... + +my $category = decode( "UTF-8", $cgi->param('category') || '' );</pre> + +<p>QueryParser’s constructor requires a “schema” argument. +We can get that from our IndexSearcher:</p> + +<pre># Create an IndexSearcher and a QueryParser. +my $searcher = Lucy::Search::IndexSearcher->new( + index => $path_to_index, +); +my $qparser = Lucy::Search::QueryParser->new( + schema => $searcher->get_schema, +);</pre> + +<p>Previously, +we have been handing raw query strings to IndexSearcher. +Behind the scenes, +IndexSearcher has been using a QueryParser to turn those query strings into Query objects. 
+Now, +we will bring QueryParser into the foreground and parse the strings explicitly.</p> + +<pre>my $query = $qparser->parse($q);</pre> + +<p>If the user has specified a category, +we’ll use an ANDQuery to join our parsed query together with a TermQuery representing the category.</p> + +<pre>if ($category) { + my $category_query = Lucy::Search::TermQuery->new( + field => 'category', + term => $category, + ); + $query = Lucy::Search::ANDQuery->new( + children => [ $query, $category_query ] + ); +}</pre> + +<p>Now when we execute the query…</p> + +<pre># Execute the Query and get a Hits object. +my $hits = $searcher->hits( + query => $query, + offset => $offset, + num_wanted => $page_size, +);</pre> + +<p>… we’ll get a result set which is the intersection of the parsed query and the category query.</p> + +<h3><a class='u' +name="Using_TermQuery_with_full_text_fields" +>Using TermQuery with full text fields</a></h3> + +<p>When querying full text fields, +the easiest way is to create query objects using QueryParser. +But sometimes you want to create a TermQuery for a single term in a FullTextType field directly. +In this case, +we have to run the search term through the field’s analyzer to make sure it gets normalized in the same way as the field’s content.</p> + +<pre>sub make_term_query { + my ($field, $term) = @_; + + my $token; + my $type = $schema->fetch_type($field); + + if ( $type->isa('Lucy::Plan::FullTextType') ) { + # Run the term through the full text analysis chain. + my $analyzer = $type->get_analyzer; + my $tokens = $analyzer->split($term); + + if ( @$tokens != 1 ) { + # If the term expands to more than one token, or no + # tokens at all, it will never match a token in the + # full text field. + return Lucy::Search::NoMatchQuery->new; + } + + $token = $tokens->[0]; + } + else { + # Exact match for other types. 
+ $token = $term; + } + + return Lucy::Search::TermQuery->new( + field => $field, + term => $token, + ); +}</pre> + +<h3><a class='u' +name="Congratulations!" +>Congratulations!</a></h3> + +<p>You’ve made it to the end of the tutorial.</p> + +<h3><a class='u' +name="See_Also" +>See Also</a></h3> + +<p>For additional thematic documentation, +see the Apache Lucy <a href="../../../Lucy/Docs/Cookbook.html" class="podlinkpod" +>Cookbook</a>.</p> + +<p>ANDQuery has a companion class, +<a href="../../../Lucy/Search/ORQuery.html" class="podlinkpod" +>ORQuery</a>, +and a close relative, +<a href="../../../Lucy/Search/RequiredOptionalQuery.html" class="podlinkpod" +>RequiredOptionalQuery</a>.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/SimpleTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/SimpleTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Tutorial/SimpleTutorial.html Mon Apr 4 09:23:00 2016 @@ -0,0 +1,391 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::SimpleTutorial – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Docs/">Docs</a> » <a href="/docs/perl/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form 
name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Tutorial::SimpleTutorial - Bare-bones search app.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<h3><a class='u' +name="Setup" +>Setup</a></h3> + +<p>Copy the text presentation of the US Constitution from the <code>sample</code> directory of the Apache Lucy distribution to the base level of your web server’s 
<code>htdocs</code> directory.</p> + +<pre>$ cp -R sample/us_constitution /usr/local/apache2/htdocs/</pre> + +<h3><a class='u' +name="Indexing:_indexer.pl" +>Indexing: indexer.pl</a></h3> + +<p>Our first task will be to create an application called <code>indexer.pl</code> which builds a searchable “inverted index” from a collection of documents.</p> + +<p>After we specify some configuration variables and load all necessary modules…</p> + +<pre>#!/usr/local/bin/perl +use strict; +use warnings; + +# (Change configuration variables as needed.) +my $path_to_index = '/path/to/index'; +my $uscon_source = '/usr/local/apache2/htdocs/us_constitution'; + +use Lucy::Simple; +use File::Spec::Functions qw( catfile );</pre> + +<p>… we’ll start by creating a <a href="../../../Lucy/Simple.html" class="podlinkpod" +>Lucy::Simple</a> object, +telling it where we’d like the index to be located and the language of the source material.</p> + +<pre>my $lucy = Lucy::Simple->new( + path => $path_to_index, + language => 'en', +);</pre> + +<p>Next, +we’ll add a subroutine which parses our sample documents.</p> + +<pre># Parse a file from our US Constitution collection and return a hashref with +# the fields title, content, and url. +sub parse_file { + my $filename = shift; + my $filepath = catfile( $uscon_source, $filename ); + open( my $fh, '<', $filepath ) or die "Can't open '$filepath': $!"; + my $text = do { local $/; <$fh> }; # slurp file content + $text =~ /\A(.+?)^\s+(.*)/ms + or die "Can't extract title/bodytext from '$filepath'"; + my $title = $1; + my $bodytext = $2; + return { + title => $title, + content => $bodytext, + url => "/us_constitution/$filename", + }; +}</pre> + +<p>Add some elementary directory reading code…</p> + +<pre># Collect names of source files. 
+opendir( my $dh, $uscon_source ) + or die "Couldn't opendir '$uscon_source': $!"; +my @filenames = grep { $_ =~ /\.txt$/ } readdir $dh;</pre> + +<p>… and now we’re ready for the meat of indexer.pl – which occupies exactly one line of code.</p> + +<pre>foreach my $filename (@filenames) { + my $doc = parse_file($filename); + $lucy->add_doc($doc); # ta-da! +}</pre> + +<h3><a class='u' +name="Search:_search.cgi" +>Search: search.cgi</a></h3> + +<p>As with our indexing app, +the bulk of the code in our search script won’t be Lucy-specific.</p> + +<p>The beginning is dedicated to CGI processing and configuration.</p> + +<pre>#!/usr/local/bin/perl -T +use strict; +use warnings; + +# (Change configuration variables as needed.) +my $path_to_index = '/path/to/index'; + +use CGI; +use List::Util qw( max min ); +use POSIX qw( ceil ); +use Encode qw( decode ); +use Lucy::Simple; + +my $cgi = CGI->new; +my $q = decode( "UTF-8", $cgi->param('q') || '' ); +my $offset = decode( "UTF-8", $cgi->param('offset') || 0 ); +my $page_size = 10;</pre> + +<p>Once that’s out of the way, +we create our Lucy::Simple object and feed it a query string.</p> + +<pre>my $lucy = Lucy::Simple->new( + path => $path_to_index, + language => 'en', +); +my $hit_count = $lucy->search( + query => $q, + offset => $offset, + num_wanted => $page_size, +);</pre> + +<p>The value returned by <a href="../../../Lucy/Simple.html#search" class="podlinkpod" +>search()</a> is the total number of documents in the collection which matched the query. +We’ll show this hit count to the user, +and also use it in conjunction with the parameters <code>offset</code> and <code>num_wanted</code> to break up results into “pages” of manageable size.</p> + +<p>Calling <a href="../../../Lucy/Simple.html#search" class="podlinkpod" +>search()</a> on our Simple object turns it into an iterator. 
+Invoking <a href="../../../Lucy/Simple.html#next" class="podlinkpod" +>next()</a> now returns hits one at a time as <a href="../../../Lucy/Document/HitDoc.html" class="podlinkpod" +>HitDoc</a> objects, +starting with the most relevant.</p> + +<pre># Create result list. +my $report = ''; +while ( my $hit = $lucy->next ) { + my $score = sprintf( "%0.3f", $hit->get_score ); + $report .= qq| + <p> + <a href="$hit->{url}"><strong>$hit->{title}</strong></a> + <em>$score</em> + <br> + <span class="excerptURL">$hit->{url}</span> + </p> + |; +}</pre> + +<p>The rest of the script is just text wrangling.</p> + +<pre>#---------------------------------------------------------------# +# No tutorial material below this point - just html generation. # +#---------------------------------------------------------------# + +# Generate paging links and hit count, print and exit. +my $paging_links = generate_paging_info( $q, $hit_count ); +blast_out_content( $q, $report, $paging_links ); + +# Create html fragment with links for paging through results n-at-a-time. +sub generate_paging_info { + my ( $query_string, $total_hits ) = @_; + my $escaped_q = CGI::escapeHTML($query_string); + my $paging_info; + if ( !length $query_string ) { + # No query? No display. + $paging_info = ''; + } + elsif ( $total_hits == 0 ) { + # Alert the user that their search failed. + $paging_info + = qq|<p>No matches for <strong>$escaped_q</strong></p>|; + } + else { + # Calculate the nums for the first and last hit to display. + my $last_result = min( ( $offset + $page_size ), $total_hits ); + my $first_result = min( ( $offset + 1 ), $last_result ); + + # Display the result nums, start paging info. + $paging_info = qq| + <p> + Results <strong>$first_result-$last_result</strong> + of <strong>$total_hits</strong> + for <strong>$escaped_q</strong>. + </p> + <p> + Results Page: + |; + + # Calculate first and last hits pages to display / link to. 
+ my $current_page = int( $first_result / $page_size ) + 1; + my $last_page = ceil( $total_hits / $page_size ); + my $first_page = max( 1, ( $current_page - 9 ) ); + $last_page = min( $last_page, ( $current_page + 10 ) ); + + # Create a url for use in paging links. + my $href = $cgi->url( -relative => 1 ); + $href .= "?q=" . CGI::escape($query_string); + $href .= ";offset=" . CGI::escape($offset); + + # Generate the "Prev" link. + if ( $current_page > 1 ) { + my $new_offset = ( $current_page - 2 ) * $page_size; + $href =~ s/(?<=offset=)\d+/$new_offset/; + $paging_info .= qq|<a href="$href">&lt;= Prev</a>\n|; + } + + # Generate paging links. + for my $page_num ( $first_page .. $last_page ) { + if ( $page_num == $current_page ) { + $paging_info .= qq|$page_num \n|; + } + else { + my $new_offset = ( $page_num - 1 ) * $page_size; + $href =~ s/(?<=offset=)\d+/$new_offset/; + $paging_info .= qq|<a href="$href">$page_num</a>\n|; + } + } + + # Generate the "Next" link. + if ( $current_page != $last_page ) { + my $new_offset = $current_page * $page_size; + $href =~ s/(?<=offset=)\d+/$new_offset/; + $paging_info .= qq|<a href="$href">Next =&gt;</a>\n|; + } + + # Close tag. + $paging_info .= "</p>\n"; + } + + return $paging_info; +} + +# Print content to output. 
+sub blast_out_content { + my ( $query_string, $hit_list, $paging_info ) = @_; + my $escaped_q = CGI::escapeHTML($query_string); + binmode( STDOUT, ":encoding(UTF-8)" ); + print qq|Content-type: text/html; charset=UTF-8\n\n|; + print qq| +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" + "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<head> + <meta http-equiv="Content-type" + content="text/html;charset=UTF-8"> + <link rel="stylesheet" type="text/css" + href="/us_constitution/uscon.css"> + <title>Lucy: $escaped_q</title> +</head> + +<body> + + <div id="navigation"> + <form id="usconSearch" action=""> + <strong> + Search the + <a href="/us_constitution/index.html">US Constitution</a>: + </strong> + <input type="text" name="q" id="q" value="$escaped_q"> + <input type="submit" value="=&gt;"> + </form> + </div><!--navigation--> + + <div id="bodytext"> + + $hit_list + + $paging_info + + <p style="font-size: smaller; color: #666"> + <em> + Powered by <a href="http://lucy.apache.org/" + >Apache Lucy<small><sup>TM</sup></small></a> + </em> + </p> + </div><!--bodytext--> + +</body> + +</html> +|; +}</pre> + +<h3><a class='u' +name="OK(8230)_now_what?" +>OK… now what?</a></h3> + +<p>Lucy::Simple is perfectly adequate for some tasks, +but it’s not very flexible. 
+Many people find that it doesn’t do at least one or two things they can’t live without.</p> + +<p>In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html" class="podlinkpod" +>BeyondSimpleTutorial</a>, +we’ll rewrite our indexing and search scripts using the classes that Lucy::Simple hides from view, +opening up the possibilities for expansion; then, +we’ll spend the rest of the tutorial chapters exploring these possibilities.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
