D...

nwellnhof Wed, 28 Sep 2016 05:07:18 -0700

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,239 @@
+Title: Lucy::Docs::Cookbook::CustomQueryParser - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::CustomQueryParser - Sample subclass of 
QueryParser.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Implement a custom search query language using a subclass of <a 
href="../../../Lucy/Search/QueryParser.html" class="podlinkpod"
+>QueryParser</a>.</p>
+
+<h3><a class='u'
+name="The_language"
+>The language</a></h3>
+
+<p>At first,
+our query language will support only simple term queries and phrases delimited 
by double quotes.
+For simplicity&#8217;s sake,
+it will not support parenthetical groupings,
+boolean operators,
+or prepended plus/minus.
+The results for all subqueries will be unioned together &#8211; i.e.
+joined using an OR &#8211; which is usually the best approach for 
small-to-medium-sized document collections.</p>
+
+<p>Later,
+we&#8217;ll add support for trailing wildcards.</p>
+
+<h3><a class='u'
+name="Single-field_parser"
+>Single-field parser</a></h3>
+
+<p>Our initial parser implementation will generate queries against a single 
fixed field,
+&#8220;content&#8221;,
+and it will analyze text using a fixed choice of English EasyAnalyzer.
+We won&#8217;t subclass Lucy::Search::QueryParser just yet.</p>
+
+<pre>package FlatQueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use Carp;
+
+sub new { 
+    my $analyzer = Lucy::Analysis::EasyAnalyzer-&#62;new(
+        language =&#62; &#39;en&#39;,
+    );
+    return bless { 
+        field    =&#62; &#39;content&#39;,
+        analyzer =&#62; $analyzer,
+    }, __PACKAGE__;
+}</pre>
+
+<p>Some private helper subs for creating TermQuery and PhraseQuery objects 
will help keep the size of our main parse() subroutine down:</p>
+
+<pre>sub _make_term_query {
+    my ( $self, $term ) = @_;
+    return Lucy::Search::TermQuery-&#62;new(
+        field =&#62; $self-&#62;{field},
+        term  =&#62; $term,
+    );
+}
+
+sub _make_phrase_query {
+    my ( $self, $terms ) = @_;
+    return Lucy::Search::PhraseQuery-&#62;new(
+        field =&#62; $self-&#62;{field},
+        terms =&#62; $terms,
+    );
+}</pre>
+
+<p>Our private _tokenize() method treats double-quote delimited material as a 
single token and splits on whitespace everywhere else.</p>
+
+<pre>sub _tokenize {
+    my ( $self, $query_string ) = @_;
+    my @tokens;
+    while ( length $query_string ) {
+        if ( $query_string =~ s/^\s+// ) {
+            next;    # skip whitespace
+        }
+        elsif ( $query_string =~ s/^(&#34;[^&#34;]*(?:&#34;|$))// ) {
+            push @tokens, $1;    # double-quoted phrase
+        }
+        else {
+            $query_string =~ s/(\S+)//;
+            push @tokens, $1;    # single word
+        }
+    }
+    return \@tokens;
+}</pre>
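+
+<p>To see these rules in action,
+here is a minimal standalone sketch.
+It copies the logic of _tokenize() into a plain function,
+hypothetically named tokenize(),
+so that it runs without Lucy installed; the sample query string is made up for illustration.</p>
+
+<pre>use strict;
+use warnings;
+
+# Standalone copy of the _tokenize() logic (hypothetical helper name).
+sub tokenize {
+    my ($query_string) = @_;
+    my @tokens;
+    while ( length $query_string ) {
+        if ( $query_string =~ s/^\s+// ) {
+            next;    # skip whitespace
+        }
+        elsif ( $query_string =~ s/^(&#34;[^&#34;]*(?:&#34;|$))// ) {
+            push @tokens, $1;    # double-quoted phrase, quotes retained
+        }
+        else {
+            $query_string =~ s/(\S+)//;
+            push @tokens, $1;    # single word
+        }
+    }
+    return \@tokens;
+}
+
+my $tokens = tokenize(&#39;foo &#34;bar baz&#34; qux&#39;);
+print join( &#39; | &#39;, @$tokens ), &#34;\n&#34;;    # prints: foo | &#34;bar baz&#34; | qux</pre>
+
+<p>Note that the phrase token keeps its surrounding quotes; parse() relies on the leading quote to tell phrases from single words.</p>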
+
+<p>The main parsing routine creates an array of tokens by calling _tokenize(),
+runs the tokens through the EasyAnalyzer,
+creates TermQuery or PhraseQuery objects according to how many tokens emerge 
from the EasyAnalyzer&#8217;s split() method,
+and adds each of the sub-queries to the primary ORQuery.</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens   = $self-&#62;_tokenize($query_string);
+    my $analyzer = $self-&#62;{analyzer};
+    my $or_query = Lucy::Search::ORQuery-&#62;new;
+
+    for my $token (@$tokens) {
+        if ( $token =~ s/^&#34;// ) {
+            $token =~ s/&#34;$//;
+            my $terms = $analyzer-&#62;split($token);
+            my $query = $self-&#62;_make_phrase_query($terms);
+            $or_query-&#62;add_child($query);
+        }
+        else {
+            my $terms = $analyzer-&#62;split($token);
+            if ( @$terms == 1 ) {
+                my $query = $self-&#62;_make_term_query( $terms-&#62;[0] );
+                $or_query-&#62;add_child($query);
+            }
+            elsif ( @$terms &#62; 1 ) {
+                my $query = $self-&#62;_make_phrase_query($terms);
+                $or_query-&#62;add_child($query);
+            }
+        }
+    }
+
+    return $or_query;
+}</pre>
+
+<h3><a class='u'
+name="Multi-field_parser"
+>Multi-field parser</a></h3>
+
+<p>Most often,
+the end user will want their search query to match not only a single 
&#8216;content&#8217; field,
+but also &#8216;title&#8217; and so on.
+To make that happen,
+we have to turn queries such as this&#8230;</p>
+
+<pre>foo AND NOT bar</pre>
+
+<p>&#8230; into the logical equivalent of this:</p>
+
+<pre>(title:foo OR content:foo) AND NOT (title:bar OR content:bar)</pre>
+
+<p>Rather than continue with our own from-scratch parser class and write the 
routines to accomplish that expansion,
+we&#8217;re now going to subclass Lucy::Search::QueryParser and take advantage 
of some of its existing methods.</p>
+
+<p>Our first parser implementation had the &#8220;content&#8221; field name 
and the choice of English EasyAnalyzer hard-coded for simplicity,
+but we don&#8217;t need to do that once we subclass Lucy::Search::QueryParser.
+QueryParser&#8217;s constructor &#8211; which we will inherit,
+allowing us to eliminate our own constructor &#8211; requires a Schema which 
conveys field and Analyzer information,
+so we can just defer to that.</p>
+
+<pre>package FlatQueryParser;
+use base qw( Lucy::Search::QueryParser );
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use PrefixQuery;
+use Carp;
+
+# Inherit new()</pre>
+
+<p>We&#8217;re also going to jettison our _make_term_query() and 
_make_phrase_query() helper subs and chop our parse() subroutine way down.
+Our revised parse() routine will generate Lucy::Search::LeafQuery objects 
instead of TermQueries and PhraseQueries:</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens = $self-&#62;_tokenize($query_string);
+    my $or_query = Lucy::Search::ORQuery-&#62;new;
+    for my $token (@$tokens) {
+        my $leaf_query = Lucy::Search::LeafQuery-&#62;new( text =&#62; $token 
);
+        $or_query-&#62;add_child($leaf_query);
+    }
+    return $self-&#62;expand($or_query);
+}</pre>
+
+<p>The magic happens in QueryParser&#8217;s expand() method,
+which walks the ORQuery object we supply to it looking for LeafQuery objects,
+and calls expand_leaf() for each one it finds.
+expand_leaf() performs field-specific analysis,
+decides whether each query should be a TermQuery or a PhraseQuery,
+and if multiple fields are required,
+creates an ORQuery which multiplies out e.g.
+<code>foo</code> into <code>(title:foo OR content:foo)</code>.</p>
+
+<h3><a class='u'
+name="Extending_the_query_language"
+>Extending the query language</a></h3>
+
+<p>To add support for trailing wildcards to our query language,
+we need to override expand_leaf() to accommodate PrefixQuery,
+while deferring to the parent class implementation on TermQuery and 
PhraseQuery.</p>
+
+<pre>sub expand_leaf {
+    my ( $self, $leaf_query ) = @_;
+    my $text = $leaf_query-&#62;get_text;
+    if ( $text =~ /\*$/ ) {
+        my $or_query = Lucy::Search::ORQuery-&#62;new;
+        for my $field ( @{ $self-&#62;get_fields } ) {
+            my $prefix_query = PrefixQuery-&#62;new(
+                field        =&#62; $field,
+                query_string =&#62; $text,
+            );
+            $or_query-&#62;add_child($prefix_query);
+        }
+        return $or_query;
+    }
+    else {
+        return $self-&#62;SUPER::expand_leaf($leaf_query);
+    }
+}</pre>
+
+<p>Ordinarily,
+those asterisks would have been stripped when running tokens through the 
EasyAnalyzer &#8211; query strings containing &#8220;foo*&#8221; would produce 
TermQueries for the term &#8220;foo&#8221;.
+Our override intercepts tokens with trailing asterisks and processes them as 
PrefixQueries before <code>SUPER::expand_leaf</code> can discard them,
+so that a search for &#8220;foo*&#8221; can match &#8220;food&#8221;,
+&#8220;foosball&#8221;,
+and so on.</p>
+
+<h3><a class='u'
+name="Usage"
+>Usage</a></h3>
+
+<p>Insert our custom parser into the search.cgi sample app to get a feel for 
how it behaves:</p>
+
+<pre>my $parser = FlatQueryParser-&#62;new( schema =&#62; 
$searcher-&#62;get_schema );
+my $query  = $parser-&#62;parse( decode( &#39;UTF-8&#39;, 
$cgi-&#62;param(&#39;q&#39;) || &#39;&#39; ) );
+my $hits   = $searcher-&#62;hits(
+    query      =&#62; $query,
+    offset     =&#62; $offset,
+    num_wanted =&#62; $page_size,
+);
+...</pre>
+
+</div>


Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext 
(added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext 
Wed Sep 28 12:06:24 2016
@@ -0,0 +1,170 @@
+Title: Lucy::Docs::Cookbook::FastUpdates - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>While index updates are fast on average,
+worst-case update performance may be significantly slower.
+To make index updates consistently quick,
+we must manually intervene to control the process of index segment 
consolidation.</p>
+
+<h3><a class='u'
+name="The_problem"
+>The problem</a></h3>
+
+<p>Ordinarily,
+modifying an index is cheap.
+New data is added to new segments,
+and the time to write a new segment scales more or less linearly with the 
number of documents added during the indexing session.</p>
+
+<p>Deletions are also cheap most of the time,
+because we don&#8217;t remove documents immediately but instead mark them as 
deleted,
+and adding the deletion mark is cheap.</p>
+
+<p>However,
+as new segments are added and the deletion rate for existing segments 
increases,
+search-time performance slowly begins to degrade.
+At some point,
+it becomes necessary to consolidate existing segments,
+rewriting their data into a new segment.</p>
+
+<p>If the recycled segments are small,
+the time it takes to rewrite them may not be significant.
+Every once in a while,
+though,
+a large amount of data must be rewritten.</p>
+
+<h3><a class='u'
+name="Procrastinating_and_playing_catch-up"
+>Procrastinating and playing catch-up</a></h3>
+
+<p>The simplest way to force fast index updates is to avoid rewriting 
anything.</p>
+
+<p>Indexer relies upon <a href="../../../Lucy/Index/IndexManager.html" 
class="podlinkpod"
+>IndexManager</a>&#8217;s <a 
href="../../../Lucy/Index/IndexManager.html#recycle" class="podlinkpod"
+>recycle()</a> method to tell it which segments should be consolidated.
+If we subclass IndexManager and override the method so that it always returns 
an empty array,
+we get consistently quick performance:</p>
+
+<pre>package NoMergeManager;
+use base qw( Lucy::Index::IndexManager );
+sub recycle { [] }
+
+package main;
+my $indexer = Lucy::Index::Indexer-&#62;new(
+    index =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; NoMergeManager-&#62;new,
+);
+...
+$indexer-&#62;commit;</pre>
+
+<p>However,
+we can&#8217;t procrastinate forever.
+Eventually,
+we&#8217;ll have to run an ordinary,
+uncontrolled indexing session,
+potentially triggering a large rewrite of lots of small and/or degraded 
segments:</p>
+
+<pre>my $indexer = Lucy::Index::Indexer-&#62;new( 
+    index =&#62; &#39;/path/to/index&#39;, 
+    # manager =&#62; NoMergeManager-&#62;new,
+);
+...
+$indexer-&#62;commit;</pre>
+
+<h3><a class='u'
+name="Acceptable_worst-case_update_time,_slower_degradation"
+>Acceptable worst-case update time,
+slower degradation</a></h3>
+
+<p>Never merging anything at all in the main indexing process is probably 
overkill.
+Small segments are relatively cheap to merge; we just need to guard against 
the big rewrites.</p>
+
+<p>Setting a ceiling on the number of documents in the segments to be recycled 
allows us to avoid a mass proliferation of tiny,
+single-document segments,
+while still offering decent worst-case update speed:</p>
+
+<pre>package LightMergeManager;
+use base qw( Lucy::Index::IndexManager );
+
+sub recycle {
+    my $self = shift;
+    my $seg_readers = $self-&#62;SUPER::recycle(@_);
+    @$seg_readers = grep { $_-&#62;doc_max &#60; 10 } @$seg_readers;
+    return $seg_readers;
+}</pre>
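+
+<p>The effect of the filter can be checked without an index.
+The sketch below is an illustration only: MockSegReader is a hypothetical stand-in that models nothing but doc_max(),
+and keep_small_segments() is a made-up helper that applies the same grep as recycle() with the ceiling passed in as a parameter.</p>
+
+<pre>use strict;
+use warnings;
+
+# Hypothetical stand-in for a SegReader; only doc_max() is modeled.
+package MockSegReader;
+sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
+sub doc_max { $_[0]-&#62;{doc_max} }
+
+package main;
+
+# Same filter as LightMergeManager::recycle(), applied to mock candidates.
+sub keep_small_segments {
+    my ( $seg_readers, $ceiling ) = @_;
+    return [ grep { $_-&#62;doc_max &#60; $ceiling } @$seg_readers ];
+}
+
+my @candidates
+    = map { MockSegReader-&#62;new( doc_max =&#62; $_ ) } ( 3, 250_000, 9, 10 );
+my $to_merge = keep_small_segments( \@candidates, 10 );
+print join( &#39;, &#39;, map { $_-&#62;doc_max } @$to_merge ), &#34;\n&#34;;    # prints: 3, 9</pre>
+
+<p>The large segment and the one sitting exactly at the ceiling are left alone,
+so only the cheap merges happen in the main indexing process.</p>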
+
+<p>However,
+we still have to consolidate every once in a while,
+and while that happens content updates will be locked out.</p>
+
+<h3><a class='u'
+name="Background_merging"
+>Background merging</a></h3>
+
+<p>If it&#8217;s not acceptable to lock out updates while the index 
consolidation process runs,
+the alternative is to move the consolidation process out of band,
+using <a href="../../../Lucy/Index/BackgroundMerger.html" class="podlinkpod"
+>BackgroundMerger</a>.</p>
+
+<p>It&#8217;s never safe to have more than one Indexer attempting to modify 
the content of an index at the same time,
+but a BackgroundMerger and an Indexer can operate simultaneously:</p>
+
+<pre># Indexing process.
+use Scalar::Util qw( blessed );
+my $retries = 0;
+while (1) {
+    eval {
+        my $indexer = Lucy::Index::Indexer-&#62;new(
+                index =&#62; &#39;/path/to/index&#39;,
+                manager =&#62; LightMergeManager-&#62;new,
+            );
+        $indexer-&#62;add_doc($doc);
+        $indexer-&#62;commit;
+    };
+    last unless $@;
+    if ( blessed($@) and $@-&#62;isa(&#34;Lucy::Store::LockErr&#34;) ) {
+        # Catch LockErr.
+        warn &#34;Couldn&#39;t get lock ($retries retries)&#34;;
+        $retries++;
+    }
+    else {
+        die &#34;Write failed: $@&#34;;
+    }
+}
+
+# Background merge process.
+my $manager = Lucy::Index::IndexManager-&#62;new;
+$manager-&#62;set_write_lock_timeout(60_000);
+my $bg_merger = Lucy::Index::BackgroundMerger-&#62;new(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+$bg_merger-&#62;commit;</pre>
+
+<p>The exception handling code becomes useful once you have more than one 
index modification process happening simultaneously.
+By default,
+Indexer tries several times to acquire a write lock over the span of one 
second,
+then holds it until <a href="../../../Lucy/Index/Indexer.html#commit" 
class="podlinkpod"
+>commit()</a> completes.
+BackgroundMerger handles most of its work without the write lock,
+but it does need it briefly once at the beginning and once again near the end.
+Under normal loads,
+the internal retry logic will resolve conflicts,
+but if it&#8217;s not acceptable to miss an insert,
+you probably want to catch <a href="../../../Lucy/Store/LockErr.html" 
class="podlinkpod"
+>LockErr</a> exceptions thrown by Indexer.
+In contrast,
+a LockErr from BackgroundMerger probably just needs to be logged.</p>
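+
+<p>The indexing loop above retries indefinitely.
+If a bounded number of attempts is preferable,
+one possible shape is sketched below.
+The helper name retry_on_lock(),
+its parameters,
+and the one-second default pause are assumptions made for illustration and are not part of Lucy; the error class name is passed in so that the helper can be exercised without an index.</p>
+
+<pre>use strict;
+use warnings;
+use Scalar::Util qw( blessed );
+
+# Run $code, retrying only when it dies with an object of class $err_class.
+# Returns the number of attempts used.  Any other error is rethrown at once,
+# and the lock error itself is rethrown after $max_tries failed attempts.
+sub retry_on_lock {
+    my (%args)    = @_;
+    my $code      = $args{code};
+    my $err_class = $args{err_class};
+    my $max_tries = $args{max_tries} || 5;
+    my $pause     = defined $args{pause} ? $args{pause} : 1;
+
+    for my $try ( 1 .. $max_tries ) {
+        my $ok = eval { $code-&#62;(); 1 };
+        return $try if $ok;
+        my $err = $@;
+        die $err unless blessed($err) and $err-&#62;isa($err_class);
+        die $err if $try == $max_tries;
+        warn &#34;Couldn&#39;t get lock (attempt $try of $max_tries)\n&#34;;
+        sleep $pause if $pause;
+    }
+}</pre>
+
+<p>With Lucy,
+err_class would be &#8220;Lucy::Store::LockErr&#8221; and the code reference would create the Indexer,
+add documents,
+and commit.</p>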
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DevGuide.mdtext Wed Sep 
28 12:06:24 2016
@@ -0,0 +1,54 @@
+Title: Lucy::Docs::DevGuide - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DevGuide - Quick-start guide to hacking on Apache Lucy.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Apache Lucy code base is organized into roughly four layers:</p>
+
+<ul>
+<li>Charmonizer - compiler and OS configuration probing.</li>
+
+<li>Clownfish - header files.</li>
+
+<li>C - implementation files.</li>
+
+<li>Host - binding language.</li>
+</ul>
+
+<p>Charmonizer is a configuration prober which writes a single header file,
+&#8220;charmony.h&#8221;,
+describing the build environment and facilitating cross-platform development.
+It&#8217;s similar to Autoconf or Metaconfig,
+but written in pure C.</p>
+
+<p>The &#8220;.cfh&#8221; files within the Lucy core are Clownfish header 
files.
+Clownfish is a purpose-built,
+declaration-only language which superimposes a single-inheritance object model 
on top of C which is specifically designed to co-exist happily with a variety of 
&#8220;host&#8221; languages and to allow limited run-time dynamic subclassing.
+For more information see the Clownfish docs,
+but if there&#8217;s one thing you should know about Clownfish OO before you 
start hacking,
+it&#8217;s that method calls are differentiated from functions by 
capitalization:</p>
+
+<pre>Indexer_Add_Doc   &#60;-- Method, typically uses dynamic dispatch.
+Indexer_add_doc   &#60;-- Function, always a direct invocation.</pre>
+
+<p>The C files within the Lucy core are where most of Lucy&#8217;s low-level 
functionality lies.
+They implement the interface defined by the Clownfish header files.</p>
+
+<p>The C core is intentionally left incomplete,
+however; to be usable,
+it must be bound to a &#8220;host&#8221; language.
+(In this context,
+even C is considered a &#8220;host&#8221; which must implement the missing 
pieces and be &#8220;bound&#8221; to the core.) Some of the binding code is 
autogenerated by Clownfish based on a spec customized for each language.
+Other pieces are hand-coded in either C (using the host&#8217;s C API) or the 
host language itself.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/DocIDs.mdtext Wed Sep 28 
12:06:24 2016
@@ -0,0 +1,47 @@
+Title: Lucy::Docs::DocIDs - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DocIDs - Characteristics of Apache Lucy document ids.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<h3><a class='u'
+name="Document_ids_are_signed_32-bit_integers"
+>Document ids are signed 32-bit integers</a></h3>
+
+<p>Document ids in Apache Lucy start at 1.
+Because 0 is never a valid doc id,
+we can use it as a sentinel value:</p>
+
+<pre>while ( my $doc_id = $posting_list-&#62;next ) {
+    ...
+}</pre>
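+
+<p>The sentinel convention can be tried out without an index.
+In this sketch,
+MockPostingList is a hypothetical iterator invented for illustration: its next() method returns each doc id in turn and then 0 once the list is exhausted,
+which is what ends the loop.</p>
+
+<pre>use strict;
+use warnings;
+
+# Hypothetical iterator that mimics the sentinel convention.
+package MockPostingList;
+sub new { my ( $class, @ids ) = @_; return bless { ids =&#62; [@ids] }, $class }
+
+sub next {
+    my $self = shift;
+    my $id   = shift @{ $self-&#62;{ids} };
+    return defined $id ? $id : 0;    # 0 signals &#34;no more documents&#34;
+}
+
+package main;
+my $posting_list = MockPostingList-&#62;new( 1, 5, 42 );
+my @seen;
+while ( my $doc_id = $posting_list-&#62;next ) {
+    push @seen, $doc_id;
+}
+print &#34;@seen\n&#34;;    # prints: 1 5 42</pre>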
+
+<h3><a class='u'
+name="Document_ids_are_ephemeral"
+>Document ids are ephemeral</a></h3>
+
+<p>The document ids used by Lucy are associated with a single index snapshot.
+The moment an index is updated,
+the mapping of document ids to documents is subject to change.</p>
+
+<p>Since IndexReader objects represent a point-in-time view of an index,
+document ids are guaranteed to remain static for the life of the reader.
+However,
+because they are not permanent,
+Lucy document ids cannot be used as foreign keys to locate records in external 
data sources.
+If you truly need a primary key field,
+you must define it and populate it yourself.</p>
+
+<p>Furthermore,
+the order of document ids does not tell you anything about the sequence in 
which documents were added to the index.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileFormat.mdtext Wed Sep 
28 12:06:24 2016
@@ -0,0 +1,270 @@
+Title: Lucy::Docs::FileFormat - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::FileFormat - Overview of index file format</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>It is not necessary to understand the current implementation details of the 
index file format in order to use Apache Lucy effectively,
+but it may be helpful if you are interested in tweaking for high performance,
+exotic usage,
+or debugging and development.</p>
+
+<p>On a file system,
+an index is a directory.
+The files inside have a hierarchical relationship: an index is made up of 
&#8220;segments&#8221;,
+each of which is an independent inverted index with its own subdirectory; each 
segment is made up of several component parts.</p>
+
+<pre>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| </pre>
+
+<h3><a class='u'
+name="Write-once_philosophy"
+>Write-once philosophy</a></h3>
+
+<p>All segment directory names consist of the string &#8220;seg_&#8221; 
followed by a number in base 36: seg_1,
+seg_5m,
+seg_p9s2 and so on,
+with higher numbers indicating more recent segments.
+Once a segment is finished and committed,
+its name is never re-used and its files are never modified.</p>
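+
+<p>Because the numbers are in base 36,
+a plain string sort does not order segment names by age.
+The following sketch converts names to integers before comparing them.
+The helper seg_num() is hypothetical,
+and it assumes lowercase digits (0-9 followed by a-z) as in the examples above.</p>
+
+<pre>use strict;
+use warnings;
+
+# Convert a segment directory name such as &#34;seg_5m&#34; to its number.
+sub seg_num {
+    my ($name) = @_;
+    my ($digits) = $name =~ /^seg_([0-9a-z]+)$/
+        or die &#34;Not a segment name: $name&#34;;
+    my $num = 0;
+    for my $char ( split //, $digits ) {
+        my $value = $char =~ /[0-9]/ ? $char : ord($char) - ord(&#39;a&#39;) + 10;
+        $num = $num * 36 + $value;
+    }
+    return $num;
+}
+
+# Sort oldest to newest by numeric value rather than by string.
+my @segs = sort { seg_num($a) &#60;=&#62; seg_num($b) }
+    qw( seg_p9s2 seg_1 seg_5m seg_z seg_10 );
+print &#34;@segs\n&#34;;    # prints: seg_1 seg_z seg_10 seg_5m seg_p9s2</pre>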
+
+<p>Old segments become obsolete and can be removed when their data has been 
consolidated into new segments during the process of segment merging and 
optimization.
+A fully-optimized index has only one segment.</p>
+
+<h3><a class='u'
+name="Top-level_entries"
+>Top-level entries</a></h3>
+
+<p>There are a handful of &#8220;top-level&#8221; files and directories which 
belong to the entire index rather than to a particular segment.</p>
+
+<h4><a class='u'
+name="snapshot_XXX.json"
+>snapshot_XXX.json</a></h4>
+
+<p>A &#8220;snapshot&#8221; file,
+e.g.
+<code>snapshot_m7p.json</code>,
+is a list of index files and directories.
+Because index files,
+once written,
+are never modified,
+the list of entries in a snapshot defines a point-in-time view of the data in 
an index.</p>
+
+<p>Like segment directories,
+snapshot files also utilize the unique-base-36-number naming convention; the 
higher the number,
+the more recent the file.
+The appearance of a new snapshot file within the index directory constitutes 
an index update.
+While a new segment is being written new files may be added to the index 
directory,
+but until a new snapshot file gets written,
+a Searcher opening the index for reading won&#8217;t know about them.</p>
+
+<h4><a class='u'
+name="schema_XXX.json"
+>schema_XXX.json</a></h4>
+
+<p>The schema file is a Schema object describing the index&#8217;s format,
+serialized as JSON.
+It,
+too,
+is versioned,
+and a given snapshot file will reference one and only one schema file.</p>
+
+<h4><a class='u'
+name="locks"
+>locks</a></h4>
+
+<p>By default,
+only one indexing process may safely modify the index at any given time.
+Processes reserve an index by laying claim to the <code>write.lock</code> file 
within the <code>locks/</code> directory.
+A smattering of other lock files may be used from time to time,
+as well.</p>
+
+<h3><a class='u'
+name="A_segment(8217)s_component_parts"
+>A segment&#8217;s component parts</a></h3>
+
+<p>By default,
+each segment has up to five logical components: lexicon,
+postings,
+document storage,
+highlight data,
+and deletions.
+Binary data from these components gets stored in virtual files within the 
&#8220;cf.dat&#8221; compound file; metadata is stored in a shared 
&#8220;segmeta.json&#8221; file.</p>
+
+<h4><a class='u'
+name="segmeta.json"
+>segmeta.json</a></h4>
+
+<p>The segmeta.json file is a central repository for segment metadata.
+In addition to information such as document counts and field numbers,
+it also warehouses arbitrary metadata on behalf of individual index 
components.</p>
+
+<h4><a class='u'
+name="Lexicon"
+>Lexicon</a></h4>
+
+<p>Each indexed field gets its own lexicon in each segment.
+The exact files involved depend on the field&#8217;s type,
+but generally speaking there will be two parts.
+First,
+there&#8217;s a primary <code>lexicon-XXX.dat</code> file which houses a 
complete term list associating terms with corpus frequency statistics,
+postings file locations,
+etc.
+Second,
+one or more &#8220;lexicon index&#8221; files may be present which contain 
periodic samples from the primary lexicon file to facilitate fast lookups.</p>
+
+<h4><a class='u'
+name="Postings"
+>Postings</a></h4>
+
+<p>&#8220;Posting&#8221; is a technical term from the field of <a 
href="../../Lucy/Docs/IRTheory.html" class="podlinkpod"
+>information retrieval</a>,
+defined as a single instance of one term indexing one document.
+If you are looking at the index in the back of a book,
+and you see that &#8220;freedom&#8221; is referenced on pages 8,
+86,
+and 240,
+that would be three postings,
+which taken together form a &#8220;posting list&#8221;.
+The same terminology applies to an index in electronic form.</p>
+
+<p>Each segment has one postings file per indexed field.
+When a search is performed for a single term,
+first that term is looked up in the lexicon.
+If the term exists in the segment,
+the record in the lexicon will contain information about which postings file 
to look at and where to look.</p>
+
+<p>The first thing any posting record tells you is a document id.
+By iterating over all the postings associated with a term,
+you can find all the documents that match that term,
+a process which is analogous to looking up page numbers in a book&#8217;s 
index.
+However,
+each posting record typically contains other information in addition to 
document id,
+e.g.
+the positions at which the term occurs within the field.</p>
+
+<h4><a class='u'
+name="Documents"
+>Documents</a></h4>
+
+<p>The document storage section is a simple database,
+organized into two files:</p>
+
+<ul>
+<li><b>documents.dat</b> - Serialized documents.</li>
+
+<li><b>documents.ix</b> - Document storage index,
+a solid array of 64-bit integers where each integer location corresponds to a 
document id,
+and the value at that location points at a file position in the documents.dat 
file.</li>
+</ul>
+
+<h4><a class='u'
+name="Highlight_data"
+>Highlight data</a></h4>
+
+<p>The files which store data used for excerpting and highlighting are 
organized similarly to the files used to store documents.</p>
+
+<ul>
+<li><b>highlight.dat</b> - Chunks of serialized highlight data,
+one per doc id.</li>
+
+<li><b>highlight.ix</b> - Highlight data index &#8211; as with the 
<code>documents.ix</code> file,
+a solid array of 64-bit file pointers.</li>
+</ul>
+
+<h4><a class='u'
+name="Deletions"
+>Deletions</a></h4>
+
+<p>When a document is &#8220;deleted&#8221; from a segment,
+it is not actually purged right away; it is merely marked as 
&#8220;deleted&#8221; via a deletions file.
+Deletions files contain bit vectors with one bit for each document in the 
segment; if bit #254 is set then document 254 is deleted,
+and if that document turns up in a search it will be masked out.</p>
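+
+<p>The masking idea can be illustrated with Perl&#8217;s built-in vec() function.
+This is an in-memory sketch only: the variable names and the is_deleted() helper are made up,
+and nothing here should be read as a description of the on-disk layout of a deletions file.</p>
+
+<pre>use strict;
+use warnings;
+
+# In-memory bit vector with one bit per document, all initially clear.
+my $doc_max   = 300;
+my $deletions = &#34;\0&#34; x int( ( $doc_max + 8 ) / 8 );
+
+vec( $deletions, 254, 1 ) = 1;    # mark document 254 as deleted
+
+# True if the bit for $doc_id is set.
+sub is_deleted {
+    my ( $bits, $doc_id ) = @_;
+    return vec( $bits, $doc_id, 1 );
+}
+
+# Mask deleted documents out of a list of candidate hits.
+my @hits = grep { !is_deleted( $deletions, $_ ) } ( 12, 254, 255 );
+print &#34;@hits\n&#34;;    # prints: 12 255</pre>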
+
+<p>It is only when a segment&#8217;s contents are rewritten to a new segment 
during the segment-merging process that deleted documents truly go away.</p>
+
+<h3><a class='u'
+name="Compound_Files"
+>Compound Files</a></h3>
+
+<p>If you peer inside an index directory,
+you won&#8217;t actually find any files named &#8220;documents.dat&#8221;,
+&#8220;highlight.ix&#8221;,
+etc.
+unless there is an indexing process underway.
+What you will find instead is one &#8220;cf.dat&#8221; and one 
&#8220;cfmeta.json&#8221; file per segment.</p>
+
+<p>To minimize the need for file descriptors at search-time,
+all per-segment binary data files are concatenated together in 
&#8220;cf.dat&#8221; at the close of each indexing session.
+Information about where each file begins and ends is stored in 
<code>cfmeta.json</code>.
+When the segment is opened for reading,
+a single file descriptor per &#8220;cf.dat&#8221; file can be shared among 
several readers.</p>
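+
+<p>The principle of slicing virtual files out of one physical file can be sketched in memory.
+Everything in this example is hypothetical: the %cfmeta hash merely stands in for the parsed contents of <code>cfmeta.json</code>,
+the offset and length keys are assumed names,
+and the payload strings are placeholders.</p>
+
+<pre>use strict;
+use warnings;
+
+# Toy compound file: two virtual files concatenated into one string.
+my %virtual = (
+    &#39;documents.dat&#39; =&#62; &#39;DOCS-PAYLOAD&#39;,
+    &#39;highlight.dat&#39; =&#62; &#39;HL-PAYLOAD&#39;,
+);
+my $cf_dat = &#39;&#39;;
+my %cfmeta;    # hypothetical stand-in for the parsed cfmeta.json
+for my $name ( sort keys %virtual ) {
+    $cfmeta{$name} = {
+        offset =&#62; length($cf_dat),
+        length =&#62; length( $virtual{$name} ),
+    };
+    $cf_dat .= $virtual{$name};
+}
+
+# Recover one virtual file using only its recorded offset and length.
+sub read_virtual {
+    my ( $blob, $meta, $name ) = @_;
+    my $entry = $meta-&#62;{$name} or die &#34;No such virtual file: $name&#34;;
+    return substr( $blob, $entry-&#62;{offset}, $entry-&#62;{length} );
+}
+
+print read_virtual( $cf_dat, \%cfmeta, &#39;highlight.dat&#39; ), &#34;\n&#34;;    # prints: HL-PAYLOAD</pre>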
+
+<h3><a class='u'
+name="A_Typical_Search"
+>A Typical Search</a></h3>
+
+<p>Here&#8217;s a simplified narrative,
+dramatizing how a search for &#8220;freedom&#8221; against a given segment 
plays out:</p>
+
+<ul>
+<li>The searcher asks the relevant Lexicon Index,
+&#8220;Do you know anything about &#8216;freedom&#8217;?&#8221; Lexicon Index 
replies,
+&#8220;Can&#8217;t say for sure,
+but if the main Lexicon file does,
+&#8216;freedom&#8217; is probably somewhere around byte 21008&#8221;.</li>
+
+<li>The main Lexicon tells the searcher &#8220;One moment,
+let me scan our records&#8230; Yes,
+we have 2 documents which contain &#8216;freedom&#8217;.
+You&#8217;ll find them in seg_6/postings-4.dat starting at byte 
66991.&#8221;</li>
+
+<li>The Postings file says &#8220;Yep,
+we have &#8216;freedom&#8217;,
+all right!
+Document id 40 has 1 &#8216;freedom&#8217;,
+and document 44 has 8.
+If you need to know more,
+like if any &#8216;freedom&#8217; is part of the phrase &#8216;freedom of 
speech&#8217;,
+ask me about positions!&#8221;</li>
+
+<li>If the searcher is only looking for &#8216;freedom&#8217; in isolation,
+that&#8217;s where it stops.
+It now knows enough to assign the documents scores against 
&#8220;freedom&#8221;,
+with the 8-freedom document likely ranking higher than the single-freedom 
document.</li>
+</ul>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/FileLocking.mdtext Wed 
Sep 28 12:06:24 2016
@@ -0,0 +1,93 @@
+Title: Lucy::Docs::FileLocking - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::FileLocking - Manage indexes on shared volumes.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Normally,
+index locking is an invisible process.
+Exclusive write access is controlled via lockfiles within the index directory 
and problems only arise if multiple processes attempt to acquire the write lock 
simultaneously; search-time processes do not ordinarily require locking at 
all.</p>
+
+<p>On shared volumes,
+however,
+the default locking mechanism fails,
+and manual intervention becomes necessary.</p>
+
+<p>Both read and write applications accessing an index on a shared volume need 
to identify themselves with a unique <code>host</code> id,
+e.g.
+hostname or IP address.
+Knowing the host id makes it possible to tell which lockfiles belong to other 
machines and therefore must not be removed when the lockfile&#8217;s pid number 
appears not to correspond to an active process.</p>
+
+<p>At index-time,
+the danger is that multiple indexing processes from different machines which 
fail to specify a unique <code>host</code> id can delete each other&#8217;s 
lockfiles and then attempt to modify the index at the same time,
+causing index corruption.
+The search-time problem is more complex.</p>
+
+<p>Once an index file is no longer listed in the most recent snapshot,
+Indexer attempts to delete it as part of a post-<a href="lucy:Indexer.Commit" 
class="podlinkurl"
+>lucy:Indexer.Commit</a> cleanup routine.
+It is possible that at the moment an Indexer is deleting files which it 
believes are no longer needed,
+a Searcher referencing an earlier snapshot is in fact using them.
+The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+
+<p>Ordinarily,
+the deletion attempts are not a problem.
+On a typical unix volume,
+the files will be deleted in name only: any process which holds an open 
filehandle against a given file will continue to have access,
+and the file won&#8217;t actually get vaporized until the last filehandle is 
cleared.
+Thanks to &#8220;delete on last close&#8221; semantics,
+an Indexer can&#8217;t truly delete the file out from underneath an active 
Searcher.
+On Windows,
+where file deletion fails whenever any process holds an open handle,
+the situation is different but still workable: Indexer just keeps retrying 
after each commit until deletion finally succeeds.</p>
+
+<p>On NFS,
+however,
+the system breaks,
+because NFS allows files to be deleted out from underneath active processes.
+Should this happen,
+the unlucky read process will crash with a &#8220;Stale NFS filehandle&#8221; 
exception.</p>
+
+<p>Under normal circumstances,
+it is neither necessary nor desirable for IndexReaders to secure read locks 
against an index,
+but for NFS we have to make an exception.
+LockFactory&#8217;s <a href="lucy:LockFactory.Make_Shared_Lock" 
class="podlinkurl"
+>lucy:LockFactory.Make_Shared_Lock</a> method exists for this reason; 
supplying an IndexManager instance to IndexReader&#8217;s constructor activates 
an internal locking mechanism using <a href="lucy:LockFactory.Make_Shared_Lock" 
class="podlinkurl"
+>lucy:LockFactory.Make_Shared_Lock</a> which prevents concurrent indexing 
processes from deleting files that are needed by active readers.</p>
+
+<pre>use Sys::Hostname qw( hostname );
+my $hostname = hostname() or die &#34;Can&#39;t get unique hostname&#34;;
+my $manager = Lucy::Index::IndexManager-&#62;new( host =&#62; $hostname );
+
+# Index time:
+my $indexer = Lucy::Index::Indexer-&#62;new(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+
+# Search time:
+my $reader = Lucy::Index::IndexReader-&#62;open(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+my $searcher = Lucy::Search::IndexSearcher-&#62;new( index =&#62; $reader 
);</pre>
+
+<p>Since shared locks are implemented using lockfiles located in the index 
directory (as are exclusive locks),
+reader applications must have write access for read locking to work.
+Stale lock files from crashed processes are ordinarily cleared away the next 
time the same machine &#8211; as identified by the <code>host</code> parameter 
&#8211; opens another IndexReader.
+(The classic technique of timing out lock files is not feasible because search 
processes may lie dormant indefinitely.) However,
+please be aware that if the last thing a given machine does is crash,
+lock files belonging to it may persist,
+preventing deletion of obsolete index data.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/IRTheory.mdtext Wed Sep 
28 12:06:24 2016
@@ -0,0 +1,69 @@
+Title: Lucy::Docs::IRTheory - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::IRTheory - Crash course in information retrieval</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Just enough Information Retrieval theory to find your way around Apache 
Lucy.</p>
+
+<h3><a class='u'
+name="Terminology"
+>Terminology</a></h3>
+
+<p>Lucy uses some terminology from the field of information retrieval which 
may be unfamiliar to many users.
+&#8220;Document&#8221; and &#8220;term&#8221; mean pretty much what 
you&#8217;d expect them to,
+but others such as &#8220;posting&#8221; and &#8220;inverted index&#8221; need 
a formal introduction:</p>
+
+<ul>
+<li><i>document</i> - An atomic unit of retrieval.</li>
+
+<li><i>term</i> - An attribute which describes a document.</li>
+
+<li><i>posting</i> - One term indexing one document.</li>
+
+<li><i>term list</i> - The complete list of terms which describe a 
document.</li>
+
+<li><i>posting list</i> - The complete list of documents which a term 
indexes.</li>
+
+<li><i>inverted index</i> - A data structure which maps from terms to 
documents.</li>
+</ul>
+
+<p>Since Lucy is a practical implementation of IR theory,
+it loads these abstract,
+distilled definitions down with useful traits.
+For instance,
+a &#8220;posting&#8221; in its most rarefied form is simply a term-document 
pairing; in Lucy,
+the class MatchPosting fills this role.
+However,
+by associating additional information with a posting, such as the number of times 
the term occurs in the document,
+we can turn it into a ScorePosting,
+making it possible to rank documents by relevance rather than just list 
documents which happen to match in no particular order.</p>
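+
+<p>To make these definitions concrete,
+here is a minimal sketch in plain Perl; it is not how Lucy stores its index.
+The two-document corpus,
+the whitespace tokenizer and the <code>%inverted_index</code> hash are all assumptions made for illustration.
+Each term/document entry in the hash is a posting,
+and keeping the occurrence count alongside it is the kind of extra information that makes relevance ranking possible.</p>
+
+<pre>use strict;
+use warnings;
+
+# Hypothetical two-document corpus, keyed by document id.
+my %docs = (
+    40 =&#62; &#39;freedom of speech&#39;,
+    44 =&#62; &#39;freedom freedom of the press&#39;,
+);
+
+# Inverted index: term =&#62; { doc_id =&#62; number of occurrences }.
+my %inverted_index;
+for my $doc_id ( keys %docs ) {
+    for my $term ( split &#39; &#39;, lc $docs{$doc_id} ) {
+        $inverted_index{$term}{$doc_id}++;    # update the posting for this term/doc pair
+    }
+}
+
+# The posting list for &#39;freedom&#39;: every document the term indexes.
+my @posting_list = sort { $a &#60;=&#62; $b } keys %{ $inverted_index{freedom} };
+print &#34;freedom: @posting_list\n&#34;;</pre>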
+
+<h3><a class='u'
+name="TF/IDF_ranking_algorithm"
+>TF/IDF ranking algorithm</a></h3>
+
+<p>Lucy uses a variant of the well-established &#8220;Term Frequency / Inverse 
Document Frequency&#8221; weighting scheme.
+A thorough treatment of TF/IDF is too ambitious for our present purposes,
+but in a nutshell,
+it means that&#8230;</p>
+
+<ul>
+<li>in a search for <code>skate park</code>,
+documents which score well for the comparatively rare term <code>skate</code> 
will rank higher than documents which score well for the more common term 
<code>park</code>.</li>
+
+<li>a 10-word text which has one occurrence each of both <code>skate</code> 
and <code>park</code> will rank higher than a 1000-word text which also 
contains one occurrence of each.</li>
+</ul>
+
+<p>A web search for &#8220;tf idf&#8221; will turn up many excellent 
explanations of the algorithm.</p>
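+
+<p>As a rough illustration of the two points listed above,
+the sketch below computes a textbook TF/IDF weight in plain Perl.
+It is not Lucy&#8217;s actual scoring formula,
+which is a variant; the collection size,
+the document frequencies and the <code>tf_idf</code> helper are hypothetical.
+With these numbers the rarer term <code>skate</code> receives a higher weight than <code>park</code>,
+and a single occurrence counts for more in a 10-word text than in a 1000-word text.</p>
+
+<pre>use strict;
+use warnings;
+
+# Assumed collection statistics (hypothetical numbers).
+my $num_docs = 1000;
+my %doc_freq = ( skate =&#62; 10, park =&#62; 200 );    # documents containing each term
+
+# Textbook weighting: term frequency normalized by document length,
+# multiplied by the log of the inverse document frequency.
+sub tf_idf {
+    my ( $term, $occurrences, $doc_length ) = @_;
+    my $tf  = $occurrences / $doc_length;
+    my $idf = log( $num_docs / $doc_freq{$term} );
+    return $tf * $idf;
+}
+
+printf &#34;skate, 10-word text:   %.4f\n&#34;, tf_idf( &#39;skate&#39;, 1, 10 );
+printf &#34;park, 10-word text:    %.4f\n&#34;, tf_idf( &#39;park&#39;,  1, 10 );
+printf &#34;skate, 1000-word text: %.4f\n&#34;, tf_idf( &#39;skate&#39;, 1, 1000 );</pre>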
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial.mdtext Wed Sep 
28 12:06:24 2016
@@ -0,0 +1,77 @@
+Title: Lucy::Docs::Tutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial - Step-by-step introduction to Apache Lucy.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Explore Apache Lucy&#8217;s basic functionality by starting with a 
minimalist CGI search app based on Lucy::Simple and transforming it,
+step by step,
+into an &#8220;advanced search&#8221; interface utilizing more flexible core 
modules like <a href="../../Lucy/Index/Indexer.html" class="podlinkpod"
+>Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html" 
class="podlinkpod"
+>IndexSearcher</a>.</p>
+
+<h3><a class='u'
+name="Chapters"
+>Chapters</a></h3>
+
+<ul>
+<li><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html" class="podlinkpod"
+>SimpleTutorial</a> - Build a bare-bones search app using Lucy::Simple.</li>
+
+<li><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html" 
class="podlinkpod"
+>BeyondSimpleTutorial</a> - Rebuild the app using core classes like <a 
href="../../Lucy/Index/Indexer.html" class="podlinkpod"
+>Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html" 
class="podlinkpod"
+>IndexSearcher</a> in place of Lucy::Simple.</li>
+
+<li><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html" 
class="podlinkpod"
+>FieldTypeTutorial</a> - Experiment with different field characteristics using 
subclasses of <a href="../../Lucy/Plan/FieldType.html" class="podlinkpod"
+>FieldType</a>.</li>
+
+<li><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html" class="podlinkpod"
+>AnalysisTutorial</a> - Examine how the choice of <a 
href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod"
+>Analyzer</a> subclass affects search results.</li>
+
+<li><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html" 
class="podlinkpod"
+>HighlighterTutorial</a> - Augment search results with highlighted 
excerpts.</li>
+
+<li><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html" 
class="podlinkpod"
+>QueryObjectsTutorial</a> - Unlock advanced search features by using Query 
objects instead of query strings.</li>
+</ul>
+
+<h3><a class='u'
+name="Source_materials"
+>Source materials</a></h3>
+
+<p>The source material used by the tutorial app &#8211; a multi-text-file 
presentation of the United States Constitution &#8211; can be found in the 
<code>sample</code> directory at the root of the Lucy distribution,
+along with finished indexing and search apps.</p>
+
+<pre>sample/indexer.pl        # indexing app
+sample/search.cgi        # search app
+sample/us_constitution   # corpus</pre>
+
+<h3><a class='u'
+name="Conventions"
+>Conventions</a></h3>
+
+<p>The user is expected to be familiar with OO Perl and basic CGI 
programming.</p>
+
+<p>The code in this tutorial assumes a Unix-flavored operating system and the 
Apache webserver,
+but will work with minor modifications on other setups.</p>
+
+<h3><a class='u'
+name="See_also"
+>See also</a></h3>
+
+<p>More advanced and esoteric subjects are covered in <a 
href="../../Lucy/Docs/Cookbook.html" class="podlinkpod"
+>Cookbook</a>.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,107 @@
+Title: Lucy::Docs::Tutorial::AnalysisTutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::AnalysisTutorial - How to choose and use 
Analyzers.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Try swapping out the EasyAnalyzer in our Schema for a <a 
href="../../../Lucy/Analysis/StandardTokenizer.html" class="podlinkpod"
+>StandardTokenizer</a>:</p>
+
+<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer-&#62;new;
+my $type = Lucy::Plan::FullTextType-&#62;new(
+    analyzer =&#62; $tokenizer,
+);</pre>
+
+<p>Search for <code>senate</code>,
+<code>Senate</code>,
+and <code>Senator</code> before and after making the change and 
re-indexing.</p>
+
+<p>Under EasyAnalyzer,
+the results are identical for all three searches,
+but under StandardTokenizer,
+searches are case-sensitive,
+and the result sets for <code>Senate</code> and <code>Senator</code> are 
distinct.</p>
+
+<h3><a class='u'
+name="EasyAnalyzer"
+>EasyAnalyzer</a></h3>
+
+<p>What&#8217;s happening is that <a 
href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod"
+>EasyAnalyzer</a> is performing more aggressive processing than 
StandardTokenizer.
+In addition to tokenizing,
+it&#8217;s also converting all text to lower case so that searches are 
case-insensitive,
+and using a &#8220;stemming&#8221; algorithm to reduce related words to a 
common stem (<code>senat</code>,
+in this case).</p>
+
+<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case,
+it&#8217;s three-in-one,
+since specifying an EasyAnalyzer with <code>language =&#62; &#39;en&#39;</code> 
is equivalent to this snippet creating a <a 
href="../../../Lucy/Analysis/PolyAnalyzer.html" class="podlinkpod"
+>PolyAnalyzer</a>:</p>
+
+<pre>my $tokenizer    = Lucy::Analysis::StandardTokenizer-&#62;new;
+my $normalizer   = Lucy::Analysis::Normalizer-&#62;new;
+my $stemmer      = Lucy::Analysis::SnowballStemmer-&#62;new( language =&#62; 
&#39;en&#39; );
+my $polyanalyzer = Lucy::Analysis::PolyAnalyzer-&#62;new(
+    analyzers =&#62; [ $tokenizer, $normalizer, $stemmer ],
+);</pre>
+
+<p>You can add or subtract Analyzers from there if you like.
+Try adding a fourth Analyzer,
+a SnowballStopFilter for suppressing &#8220;stopwords&#8221; like 
&#8220;the&#8221;,
+&#8220;if&#8221;,
+and &#8220;maybe&#8221;.</p>
+
+<pre>my $stopfilter = Lucy::Analysis::SnowballStopFilter-&#62;new( 
+    language =&#62; &#39;en&#39;,
+);
+my $polyanalyzer = Lucy::Analysis::PolyAnalyzer-&#62;new(
+    analyzers =&#62; [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
+);</pre>
+
+<p>Also,
+try removing the SnowballStemmer.</p>
+
+<pre>my $polyanalyzer = Lucy::Analysis::PolyAnalyzer-&#62;new(
+    analyzers =&#62; [ $tokenizer, $normalizer ],
+);</pre>
+
+<p>The original choice of a stock English EasyAnalyzer probably still yields 
the best results for this document collection,
+but you get the idea: sometimes you want a different Analyzer.</p>
+
+<h3><a class='u'
+name="When_the_best_Analyzer_is_no_Analyzer"
+>When the best Analyzer is no Analyzer</a></h3>
+
+<p>Sometimes you don&#8217;t want an Analyzer at all.
+That was true for our &#8220;url&#8221; field because we didn&#8217;t need it 
to be searchable,
+but it&#8217;s also true for certain types of searchable fields.
+For instance,
+&#8220;category&#8221; fields are often set up to match exactly or not at all,
+as are fields like &#8220;last_name&#8221; (because you may not want to 
conflate results for &#8220;Humphrey&#8221; and &#8220;Humphries&#8221;).</p>
+
+<p>To specify that there should be no analysis performed at all,
+use StringType:</p>
+
+<pre>my $type = Lucy::Plan::StringType-&#62;new;
+$schema-&#62;spec_field( name =&#62; &#39;category&#39;, type =&#62; $type 
);</pre>
+
+<h3><a class='u'
+name="Highlighting_up_next"
+>Highlighting up next</a></h3>
+
+<p>In our next tutorial chapter,
+<a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html" 
class="podlinkpod"
+>HighlighterTutorial</a>,
+we&#8217;ll add highlighted excerpts from the &#8220;content&#8221; field to 
our search results.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/BeyondSimpleTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,158 @@
+Title: Lucy::Docs::Tutorial::BeyondSimpleTutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::BeyondSimpleTutorial - A more flexible app 
structure.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<h3><a class='u'
+name="Goal"
+>Goal</a></h3>
+
+<p>In this tutorial chapter,
+we&#8217;ll refactor the apps we built in <a 
href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html" class="podlinkpod"
+>SimpleTutorial</a> so that they look exactly the same from the end 
user&#8217;s point of view,
+but offer the developer greater possibilities for expansion.</p>
+
+<p>To achieve this,
+we&#8217;ll ditch Lucy::Simple and replace it with the classes that it uses 
internally:</p>
+
+<ul>
+<li><a href="../../../Lucy/Plan/Schema.html" class="podlinkpod"
+>Schema</a> - Plan out your index.</li>
+
+<li><a href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod"
+>FullTextType</a> - Field type for full text search.</li>
+
+<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod"
+>EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li>
+
+<li><a href="../../../Lucy/Index/Indexer.html" class="podlinkpod"
+>Indexer</a> - Manipulate index content.</li>
+
+<li><a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod"
+>IndexSearcher</a> - Search an index.</li>
+
+<li><a href="../../../Lucy/Search/Hits.html" class="podlinkpod"
+>Hits</a> - Iterate over hits returned by a Searcher.</li>
+</ul>
+
+<h3><a class='u'
+name="Adaptations_to_indexer.pl"
+>Adaptations to indexer.pl</a></h3>
+
+<p>After we load our modules&#8230;</p>
+
+<pre>use Lucy::Plan::Schema;
+use Lucy::Plan::FullTextType;
+use Lucy::Analysis::EasyAnalyzer;
+use Lucy::Index::Indexer;</pre>
+
+<p>&#8230; the first item we&#8217;re going to need is a <a 
href="../../../Lucy/Plan/Schema.html" class="podlinkpod"
+>Schema</a>.</p>
+
+<p>The primary job of a Schema is to specify what fields are available and how 
they&#8217;re defined.
+We&#8217;ll start off with three fields: title,
+content and url.</p>
+
+<pre># Create Schema.
+my $schema = Lucy::Plan::Schema-&#62;new;
+my $easyanalyzer = Lucy::Analysis::EasyAnalyzer-&#62;new(
+    language =&#62; &#39;en&#39;,
+);
+my $type = Lucy::Plan::FullTextType-&#62;new(
+    analyzer =&#62; $easyanalyzer,
+);
+$schema-&#62;spec_field( name =&#62; &#39;title&#39;,   type =&#62; $type );
+$schema-&#62;spec_field( name =&#62; &#39;content&#39;, type =&#62; $type );
+$schema-&#62;spec_field( name =&#62; &#39;url&#39;,     type =&#62; $type 
);</pre>
+
+<p>All of the fields are spec&#8217;d out using the <a 
href="../../../Lucy/Plan/FullTextType.html" class="podlinkpod"
+>FullTextType</a> FieldType,
+indicating that they will be searchable as &#8220;full text&#8221; &#8211; 
which means that they can be searched for individual words.
+The &#8220;analyzer&#8221;,
+which is unique to FullTextType fields,
+is what breaks up the text into searchable tokens.</p>
+
+<p>Next,
+we&#8217;ll swap our Lucy::Simple object out for an <a 
href="../../../Lucy/Index/Indexer.html" class="podlinkpod"
+>Indexer</a>.
+The substitution will be straightforward because Simple has merely been 
serving as a thin wrapper around an inner Indexer,
+and we&#8217;ll just be peeling away the wrapper.</p>
+
+<p>First,
+replace the constructor:</p>
+
+<pre># Create Indexer.
+my $indexer = Lucy::Index::Indexer-&#62;new(
+    index    =&#62; $path_to_index,
+    schema   =&#62; $schema,
+    create   =&#62; 1,
+    truncate =&#62; 1,
+);</pre>
+
+<p>Next,
+have the <code>indexer</code> object <a 
href="../../../Lucy/Index/Indexer.html#add_doc" class="podlinkpod"
+>add_doc()</a> where we previously had the <code>lucy</code> object add the 
document:</p>
+
+<pre>foreach my $filename (@filenames) {
+    my $doc = parse_file($filename);
+    $indexer-&#62;add_doc($doc);
+}</pre>
+
+<p>There&#8217;s only one extra step required: at the end of the app,
+you must call commit() explicitly to close the indexing session and commit 
your changes.
+(Lucy::Simple hides this detail,
+calling commit() implicitly when it needs to.)</p>
+
+<pre>$indexer-&#62;commit;</pre>
+
+<h3><a class='u'
+name="Adaptations_to_search.cgi"
+>Adaptations to search.cgi</a></h3>
+
+<p>In our search app as in our indexing app,
+Lucy::Simple has served as a thin wrapper &#8211; this time around <a 
href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod"
+>IndexSearcher</a> and <a href="../../../Lucy/Search/Hits.html" 
class="podlinkpod"
+>Hits</a>.
+Swapping out Simple for these two classes is also straightforward:</p>
+
+<pre>use Lucy::Search::IndexSearcher;
+
+my $searcher = Lucy::Search::IndexSearcher-&#62;new( 
+    index =&#62; $path_to_index,
+);
+my $hits = $searcher-&#62;hits(    # returns a Hits object, not a hit count
+    query      =&#62; $q,
+    offset     =&#62; $offset,
+    num_wanted =&#62; $page_size,
+);
+my $hit_count = $hits-&#62;total_hits;  # get the hit count here
+
+...
+
+while ( my $hit = $hits-&#62;next ) {
+    ...
+}</pre>
+
+<h3><a class='u'
+name="Hooray!"
+>Hooray!</a></h3>
+
+<p>Congratulations!
+Your apps do the same thing as before&#8230; but now they&#8217;ll be easier 
to customize.</p>
+
+<p>In our next chapter,
+<a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html" class="podlinkpod"
+>FieldTypeTutorial</a>,
+we&#8217;ll explore how to assign different behaviors to different fields.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/FieldTypeTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,81 @@
+Title: Lucy::Docs::Tutorial::FieldTypeTutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::FieldTypeTutorial - Specify per-field properties and 
behaviors.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Schema we used in the last chapter specifies three fields:</p>
+
+<pre>my $type = Lucy::Plan::FullTextType-&#62;new(
+    analyzer =&#62; $easyanalyzer,
+);
+$schema-&#62;spec_field( name =&#62; &#39;title&#39;,   type =&#62; $type );
+$schema-&#62;spec_field( name =&#62; &#39;content&#39;, type =&#62; $type );
+$schema-&#62;spec_field( name =&#62; &#39;url&#39;,     type =&#62; $type 
);</pre>
+
+<p>Since they are all defined as &#8220;full text&#8221; fields,
+they are all searchable &#8211; including the <code>url</code> field,
+a dubious choice.
+Some URLs contain meaningful information,
+but these don&#8217;t,
+really:</p>
+
+<pre>http://example.com/us_constitution/amend1.txt</pre>
+
+<p>We may as well not bother indexing the URL content.
+To achieve that we need to assign the <code>url</code> field to a different 
FieldType.</p>
+
+<h3><a class='u'
+name="StringType"
+>StringType</a></h3>
+
+<p>Instead of FullTextType,
+we&#8217;ll use a <a href="../../../Lucy/Plan/StringType.html" 
class="podlinkpod"
+>StringType</a>,
+which doesn&#8217;t use an Analyzer to break up text into individual tokens.
+Furthermore,
+we&#8217;ll mark this StringType as unindexed,
+so that its content won&#8217;t be searchable at all.</p>
+
+<pre>my $url_type = Lucy::Plan::StringType-&#62;new( indexed =&#62; 0 );
+$schema-&#62;spec_field( name =&#62; &#39;url&#39;, type =&#62; $url_type 
);</pre>
+
+<p>To observe the change in behavior,
+try searching for <code>us_constitution</code> both before and after changing 
the Schema and re-indexing.</p>
+
+<h3><a class='u'
+name="Toggling_(8216)stored(8217)"
+>Toggling &#8216;stored&#8217;</a></h3>
+
+<p>For a taste of other FieldType possibilities,
+try turning off <code>stored</code> for one or more fields.</p>
+
+<pre>my $content_type = Lucy::Plan::FullTextType-&#62;new(
+    analyzer =&#62; $easyanalyzer,
+    stored   =&#62; 0,
+);</pre>
+
+<p>Turning off <code>stored</code> for either <code>title</code> or 
<code>url</code> mangles our results page,
+but since we&#8217;re not displaying <code>content</code>,
+turning it off for <code>content</code> has no effect &#8211; except on index 
size.</p>
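+
+<p>Putting the <code>url</code> change from this chapter together with the Schema from the last one,
+the field definitions might end up looking like the sketch below.
+It uses only the constructors and arguments already shown in this tutorial; the variable names <code>$text_type</code> and <code>$url_type</code> are illustrative.</p>
+
+<pre>use Lucy::Plan::Schema;
+use Lucy::Plan::FullTextType;
+use Lucy::Plan::StringType;
+use Lucy::Analysis::EasyAnalyzer;
+
+my $schema       = Lucy::Plan::Schema-&#62;new;
+my $easyanalyzer = Lucy::Analysis::EasyAnalyzer-&#62;new(
+    language =&#62; &#39;en&#39;,
+);
+
+# Searchable full-text fields.
+my $text_type = Lucy::Plan::FullTextType-&#62;new(
+    analyzer =&#62; $easyanalyzer,
+);
+
+# The url field is kept for display but is not searchable.
+my $url_type = Lucy::Plan::StringType-&#62;new( indexed =&#62; 0 );
+
+$schema-&#62;spec_field( name =&#62; &#39;title&#39;,   type =&#62; $text_type );
+$schema-&#62;spec_field( name =&#62; &#39;content&#39;, type =&#62; $text_type );
+$schema-&#62;spec_field( name =&#62; &#39;url&#39;,     type =&#62; $url_type );</pre>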
+
+<h3><a class='u'
+name="Analyzers_up_next"
+>Analyzers up next</a></h3>
+
+<p>Analyzers play a crucial role in the behavior of FullTextType fields.
+In our next tutorial chapter,
+<a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html" class="podlinkpod"
+>AnalysisTutorial</a>,
+we&#8217;ll see how changing up the Analyzer changes search results.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/HighlighterTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,76 @@
+Title: Lucy::Docs::Tutorial::HighlighterTutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::HighlighterTutorial - Augment search results with 
highlighted excerpts.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Adding relevant excerpts with highlighted search terms to your search 
results display makes it much easier for end users to scan the page and assess 
which hits look promising,
+dramatically improving their search experience.</p>
+
+<h3><a class='u'
+name="Adaptations_to_indexer.pl"
+>Adaptations to indexer.pl</a></h3>
+
+<p><a href="../../../Lucy/Highlight/Highlighter.html" class="podlinkpod"
+>Highlighter</a> uses information generated at index time.
+To save resources,
+highlighting is disabled by default and must be turned on for individual 
fields.</p>
+
+<pre>my $highlightable = Lucy::Plan::FullTextType-&#62;new(
+    analyzer      =&#62; $easyanalyzer,
+    highlightable =&#62; 1,
+);
+$schema-&#62;spec_field( name =&#62; &#39;content&#39;, type =&#62; 
$highlightable );</pre>
+
+<h3><a class='u'
+name="Adaptations_to_search.cgi"
+>Adaptations to search.cgi</a></h3>
+
+<p>To add highlighting and excerpting to the search.cgi sample app,
+create a <code>$highlighter</code> object outside the hits iterating 
loop&#8230;</p>
+
+<pre>my $highlighter = Lucy::Highlight::Highlighter-&#62;new(
+    searcher =&#62; $searcher,
+    query    =&#62; $q,
+    field    =&#62; &#39;content&#39;
+);</pre>
+
+<p>&#8230; then modify the loop and the per-hit display to generate and 
include the excerpt.</p>
+
+<pre># Create result list.
+my $report = &#39;&#39;;
+while ( my $hit = $hits-&#62;next ) {
+    my $score   = sprintf( &#34;%0.3f&#34;, $hit-&#62;get_score );
+    my $excerpt = $highlighter-&#62;create_excerpt($hit);
+    $report .= qq|
+        &#60;p&#62;
+          &#60;a 
href=&#34;$hit-&#62;{url}&#34;&#62;&#60;strong&#62;$hit-&#62;{title}&#60;/strong&#62;&#60;/a&#62;
+          &#60;em&#62;$score&#60;/em&#62;
+          &#60;br /&#62;
+          $excerpt
+          &#60;br /&#62;
+          &#60;span 
class=&#34;excerptURL&#34;&#62;$hit-&#62;{url}&#60;/span&#62;
+        &#60;/p&#62;
+    |;
+}</pre>
+
+<h3><a class='u'
+name="Next_chapter:_Query_objects"
+>Next chapter: Query objects</a></h3>
+
+<p>Our next tutorial chapter,
+<a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html" 
class="podlinkpod"
+>QueryObjectsTutorial</a>,
+illustrates how to build an &#8220;advanced search&#8221; interface using <a 
href="../../../Lucy/Search/Query.html" class="podlinkpod"
+>Query</a> objects instead of query strings.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/QueryObjectsTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,202 @@
+Title: Lucy::Docs::Tutorial::QueryObjectsTutorial - Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::QueryObjectsTutorial - Use Query objects instead of 
query strings.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Until now,
+our search app has had only a single search box.
+In this tutorial chapter,
+we&#8217;ll move towards an &#8220;advanced search&#8221; interface,
+by adding a &#8220;category&#8221; drop-down menu.
+Three new classes will be required:</p>
+
+<ul>
+<li><a href="../../../Lucy/Search/QueryParser.html" class="podlinkpod"
+>QueryParser</a> - Turn a query string into a <a 
href="../../../Lucy/Search/Query.html" class="podlinkpod"
+>Query</a> object.</li>
+
+<li><a href="../../../Lucy/Search/TermQuery.html" class="podlinkpod"
+>TermQuery</a> - Query for a specific term within a specific field.</li>
+
+<li><a href="../../../Lucy/Search/ANDQuery.html" class="podlinkpod"
+>ANDQuery</a> - &#8220;AND&#8221; together multiple Query objects to produce 
an intersected result set.</li>
+</ul>
+
+<h3><a class='u'
+name="Adaptations_to_indexer.pl"
+>Adaptations to indexer.pl</a></h3>
+
+<p>Our new &#8220;category&#8221; field will be a StringType field rather than 
a FullTextType field,
+because we will only be looking for exact matches.
+It needs to be indexed,
+but since we won&#8217;t display its value,
+it doesn&#8217;t need to be stored.</p>
+
+<pre>my $cat_type = Lucy::Plan::StringType-&#62;new( stored =&#62; 0 );
+$schema-&#62;spec_field( name =&#62; &#39;category&#39;, type =&#62; $cat_type 
);</pre>
+
+<p>There will be three possible values: &#8220;article&#8221;,
+&#8220;amendment&#8221;,
+and &#8220;preamble&#8221;,
+which we&#8217;ll hack out of the source file&#8217;s name during our 
<code>parse_file</code> subroutine:</p>
+
+<pre>my $category
+    = $filename =~ /art/      ? &#39;article&#39;
+    : $filename =~ /amend/    ? &#39;amendment&#39;
+    : $filename =~ /preamble/ ? &#39;preamble&#39;
+    :                           die &#34;Can&#39;t derive category for 
$filename&#34;;
+return {
+    title    =&#62; $title,
+    content  =&#62; $bodytext,
+    url      =&#62; &#34;/us_constitution/$filename&#34;,
+    category =&#62; $category,
+};</pre>
+
+<h3><a class='u'
+name="Adaptations_to_search.cgi"
+>Adaptations to search.cgi</a></h3>
+
+<p>The &#8220;category&#8221; constraint will be added to our search interface 
using an HTML &#8220;select&#8221; element (this routine will need to be 
integrated into the HTML generation section of search.cgi):</p>
+
+<pre># Build up the HTML &#34;select&#34; object for the &#34;category&#34; field.
+sub generate_category_select {
+    my $cat = shift;
+    my $select = qq|
+      &#60;select name=&#34;category&#34;&#62;
+        &#60;option value=&#34;&#34;&#62;All Sections&#60;/option&#62;
+        &#60;option value=&#34;article&#34;&#62;Articles&#60;/option&#62;
+        &#60;option value=&#34;amendment&#34;&#62;Amendments&#60;/option&#62;
+      &#60;/select&#62;|;
+    if ($cat) {
+        $select =~ s/&#34;\Q$cat\E&#34;/&#34;$cat&#34; selected/;
+    }
+    return $select;
+}</pre>
+
+<p>We&#8217;ll start off by loading our new modules and extracting our new CGI 
parameter.</p>
+
+<pre>use Lucy::Search::QueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::ANDQuery;
+
+... 
+
+my $category = decode( &#34;UTF-8&#34;, $cgi-&#62;param(&#39;category&#39;) || &#39;&#39; );</pre>
+
+<p>QueryParser&#8217;s constructor requires a &#8220;schema&#8221; argument.
+We can get that from our IndexSearcher:</p>
+
+<pre># Create an IndexSearcher and a QueryParser.
+my $searcher = Lucy::Search::IndexSearcher-&#62;new( 
+    index =&#62; $path_to_index, 
+);
+my $qparser  = Lucy::Search::QueryParser-&#62;new( 
+    schema =&#62; $searcher-&#62;get_schema,
+);</pre>
+
+<p>Previously,
+we have been handing raw query strings to IndexSearcher.
+Behind the scenes,
+IndexSearcher has been using a QueryParser to turn those query strings into 
Query objects.
+Now,
+we will bring QueryParser into the foreground and parse the strings 
explicitly.</p>
+
+<pre>my $query = $qparser-&#62;parse($q);</pre>
+
+<p>If the user has specified a category,
+we&#8217;ll use an ANDQuery to join our parsed query together with a TermQuery 
representing the category.</p>
+
+<pre>if ($category) {
+    my $category_query = Lucy::Search::TermQuery-&#62;new(
+        field =&#62; &#39;category&#39;, 
+        term  =&#62; $category,
+    );
+    $query = Lucy::Search::ANDQuery-&#62;new(
+        children =&#62; [ $query, $category_query ]
+    );
+}</pre>
+
+<p>Now when we execute the query&#8230;</p>
+
+<pre># Execute the Query and get a Hits object.
+my $hits = $searcher-&#62;hits(
+    query      =&#62; $query,
+    offset     =&#62; $offset,
+    num_wanted =&#62; $page_size,
+);</pre>
+
+<p>&#8230; we&#8217;ll get a result set which is the intersection of the 
parsed query and the category query.</p>
+
+<h3><a class='u'
+name="Using_TermQuery_with_full_text_fields"
+>Using TermQuery with full text fields</a></h3>
+
+<p>When querying full text fields,
+the easiest way is to create query objects using QueryParser.
+But sometimes you want to create a TermQuery for a single term in a FullTextType field directly.
+In this case,
+we have to run the search term through the field&#8217;s analyzer to make sure 
it gets normalized in the same way as the field&#8217;s content.</p>
+
+<pre>sub make_term_query {
+    my ($field, $term) = @_;
+
+    my $token;
+    my $type = $schema-&#62;fetch_type($field);
+
+    if ( $type-&#62;isa(&#39;Lucy::Plan::FullTextType&#39;) ) {
+        # Run the term through the full text analysis chain.
+        my $analyzer = $type-&#62;get_analyzer;
+        my $tokens   = $analyzer-&#62;split($term);
+
+        if ( @$tokens != 1 ) {
+            # If the term expands to more than one token, or no
+            # tokens at all, it will never match a token in the
+            # full text field.
+            return Lucy::Search::NoMatchQuery-&#62;new;
+        }
+
+        $token = $tokens-&#62;[0];
+    }
+    else {
+        # Exact match for other types.
+        $token = $term;
+    }
+
+    return Lucy::Search::TermQuery-&#62;new(
+        field =&#62; $field,
+        term  =&#62; $token,
+    );
+}</pre>
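+
+<p>As a usage sketch (not part of the original script),
+the helper could be called from search.cgi as shown here.
+It assumes that <code>$schema</code> is a file-scoped variable obtained from the IndexSearcher before <code>make_term_query</code> is defined,
+and the term &#8220;Congress&#8221; is only an illustrative value.</p>
+
+<pre># Assumption: $searcher is the IndexSearcher created earlier.
+my $schema = $searcher-&#62;get_schema;
+
+# Build a TermQuery whose term is normalized like the indexed text.
+my $term_query = make_term_query( &#39;content&#39;, &#39;Congress&#39; );
+my $hits = $searcher-&#62;hits(
+    query      =&#62; $term_query,
+    num_wanted =&#62; 10,
+);</pre>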
+
+<h3><a class='u'
+name="Congratulations!"
+>Congratulations!</a></h3>
+
+<p>You&#8217;ve made it to the end of the tutorial.</p>
+
+<h3><a class='u'
+name="See_Also"
+>See Also</a></h3>
+
+<p>For additional thematic documentation,
+see the Apache Lucy <a href="../../../Lucy/Docs/Cookbook.html" 
class="podlinkpod"
+>Cookbook</a>.</p>
+
+<p>ANDQuery has a companion class,
+<a href="../../../Lucy/Search/ORQuery.html" class="podlinkpod"
+>ORQuery</a>,
+and a close relative,
+<a href="../../../Lucy/Search/RequiredOptionalQuery.html" class="podlinkpod"
+>RequiredOptionalQuery</a>.</p>
+
+</div>

Added: 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/SimpleTutorial.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/SimpleTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/SimpleTutorial.mdtext
 (added)
+++ 
lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Docs/Tutorial/SimpleTutorial.mdtext
 Wed Sep 28 12:06:24 2016
@@ -0,0 +1,303 @@
+Title: Lucy::Docs::Tutorial::SimpleTutorial – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Tutorial::SimpleTutorial - Bare-bones search app.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<h3><a class='u'
+name="Setup"
+>Setup</a></h3>
+
+<p>Copy the text presentation of the US Constitution from the 
<code>sample</code> directory of the Apache Lucy distribution to the base level 
of your web server&#8217;s <code>htdocs</code> directory.</p>
+
+<pre>$ cp -R sample/us_constitution /usr/local/apache2/htdocs/</pre>
+
+<h3><a class='u'
+name="Indexing:_indexer.pl"
+>Indexing: indexer.pl</a></h3>
+
+<p>Our first task will be to create an application called 
<code>indexer.pl</code> which builds a searchable &#8220;inverted index&#8221; 
from a collection of documents.</p>
+
+<p>After we specify some configuration variables and load all necessary 
modules&#8230;</p>
+
+<pre>#!/usr/local/bin/perl
+use strict;
+use warnings;
+
+# (Change configuration variables as needed.)
+my $path_to_index = &#39;/path/to/index&#39;;
+my $uscon_source  = &#39;/usr/local/apache2/htdocs/us_constitution&#39;;
+
+use Lucy::Simple;
+use File::Spec::Functions qw( catfile );</pre>
+
+<p>&#8230; we&#8217;ll start by creating a <a href="../../../Lucy/Simple.html" 
class="podlinkpod"
+>Lucy::Simple</a> object,
+telling it where we&#8217;d like the index to be located and the language of 
the source material.</p>
+
+<pre>my $lucy = Lucy::Simple-&#62;new(
+    path     =&#62; $path_to_index,
+    language =&#62; &#39;en&#39;,
+);</pre>
+
+<p>Next,
+we&#8217;ll add a subroutine which parses our sample documents.</p>
+
+<pre># Parse a file from our US Constitution collection and return a hashref
+# with the fields title, content, and url.
+sub parse_file {
+    my $filename = shift;
+    my $filepath = catfile( $uscon_source, $filename );
+    open( my $fh, &#39;&#60;&#39;, $filepath ) or die &#34;Can&#39;t open &#39;$filepath&#39;: $!&#34;;
+    my $text = do { local $/; &#60;$fh&#62; };    # slurp file content
+    $text =~ /\A(.+?)^\s+(.*)/ms
+        or die &#34;Can&#39;t extract title/bodytext from &#39;$filepath&#39;&#34;;
+    my $title    = $1;
+    my $bodytext = $2;
+    return {
+        title    =&#62; $title,
+        content  =&#62; $bodytext,
+        url      =&#62; &#34;/us_constitution/$filename&#34;,
+    };
+}</pre>
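+
+<p>To see what the title/body regex captures,
+here is a minimal standalone sketch.
+The sample text is hypothetical and only assumes a title line followed by a line that begins with whitespace; the real source files may be laid out differently.</p>
+
+<pre>use strict;
+use warnings;
+
+# Hypothetical sample: a title line, then an indented body line.
+my $text = &#34;Article I\n  Section 1. All legislative Powers ...\n&#34;;
+
+# (.+?) lazily takes everything up to the first line that starts with
+# whitespace, ^\s+ consumes that whitespace, and (.*) takes the rest.
+$text =~ /\A(.+?)^\s+(.*)/ms
+    or die &#34;Can&#39;t extract title/bodytext&#34;;
+my ( $title, $bodytext ) = ( $1, $2 );
+
+chomp $title;    # the title capture keeps its trailing newline
+print &#34;title: $title\n&#34;;</pre>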
+
+<p>Add some elementary directory reading code&#8230;</p>
+
+<pre># Collect names of source files.
+opendir( my $dh, $uscon_source )
+    or die &#34;Couldn&#39;t opendir &#39;$uscon_source&#39;: $!&#34;;
+my @filenames = grep { $_ =~ /\.txt\z/ } readdir $dh;</pre>
+
+<p>&#8230; and now we&#8217;re ready for the meat of indexer.pl &#8211; which 
occupies exactly one line of code.</p>
+
+<pre>foreach my $filename (@filenames) {
+    my $doc = parse_file($filename);
+    $lucy-&#62;add_doc($doc);  # ta-da!
+}</pre>
+
+<h3><a class='u'
+name="Search:_search.cgi"
+>Search: search.cgi</a></h3>
+
+<p>As with our indexing app,
+the bulk of the code in our search script won&#8217;t be Lucy-specific.</p>
+
+<p>The beginning is dedicated to CGI processing and configuration.</p>
+
+<pre>#!/usr/local/bin/perl -T
+use strict;
+use warnings;
+
+# (Change configuration variables as needed.)
+my $path_to_index = &#39;/path/to/index&#39;;
+
+use CGI;
+use List::Util qw( max min );
+use POSIX qw( ceil );
+use Encode qw( decode );
+use Lucy::Simple;
+
+my $cgi       = CGI-&#62;new;
+my $q         = decode( &#34;UTF-8&#34;, $cgi-&#62;param(&#39;q&#39;) || &#39;&#39; );
+my $offset    = decode( &#34;UTF-8&#34;, $cgi-&#62;param(&#39;offset&#39;) || 0 );
+my $page_size = 10;</pre>
+
+<p>Once that&#8217;s out of the way,
+we create our Lucy::Simple object and feed it a query string.</p>
+
+<pre>my $lucy = Lucy::Simple-&#62;new(
+    path     =&#62; $path_to_index,
+    language =&#62; &#39;en&#39;,
+);
+my $hit_count = $lucy-&#62;search(
+    query      =&#62; $q,
+    offset     =&#62; $offset,
+    num_wanted =&#62; $page_size,
+);</pre>
+
+<p>The value returned by <a href="../../../Lucy/Simple.html#search" 
class="podlinkpod"
+>search()</a> is the total number of documents in the collection which matched 
the query.
+We&#8217;ll show this hit count to the user,
+and also use it in conjunction with the parameters <code>offset</code> and 
<code>num_wanted</code> to break up results into &#8220;pages&#8221; of 
manageable size.</p>
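+
+<p>The paging arithmetic can be tried out on its own.
+The following sketch uses a hypothetical helper,
+<code>page_window</code>,
+which is not part of search.cgi; it applies the same formulas that the script uses further down when it generates paging links.</p>
+
+<pre>use strict;
+use warnings;
+use List::Util qw( min );
+use POSIX qw( ceil );
+
+# Hypothetical helper: derive paging numbers from an offset, a page
+# size, and the hit count returned by search().
+sub page_window {
+    my ( $offset, $page_size, $total_hits ) = @_;
+    my $last_result  = min( $offset + $page_size, $total_hits );
+    my $first_result = min( $offset + 1, $last_result );
+    return {
+        first_result =&#62; $first_result,
+        last_result  =&#62; $last_result,
+        current_page =&#62; int( $first_result / $page_size ) + 1,
+        last_page    =&#62; ceil( $total_hits / $page_size ),
+    };
+}
+
+# 47 hits, 10 per page, starting at offset 20.
+my $w = page_window( 20, 10, 47 );
+print &#34;Results $w-&#62;{first_result}-$w-&#62;{last_result}, &#34;,
+    &#34;page $w-&#62;{current_page} of $w-&#62;{last_page}\n&#34;;
+# prints &#34;Results 21-30, page 3 of 5&#34;</pre>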
+
+<p>Calling <a href="../../../Lucy/Simple.html#search" class="podlinkpod"
+>search()</a> on our Simple object turns it into an iterator.
+Invoking <a href="../../../Lucy/Simple.html#next" class="podlinkpod"
+>next()</a> now returns hits one at a time as <a 
href="../../../Lucy/Document/HitDoc.html" class="podlinkpod"
+>HitDoc</a> objects,
+starting with the most relevant.</p>
+
+<pre># Create result list.
+my $report = &#39;&#39;;
+while ( my $hit = $lucy-&#62;next ) {
+    my $score = sprintf( &#34;%0.3f&#34;, $hit-&#62;get_score );
+    $report .= qq|
+        &#60;p&#62;
+          &#60;a 
href=&#34;$hit-&#62;{url}&#34;&#62;&#60;strong&#62;$hit-&#62;{title}&#60;/strong&#62;&#60;/a&#62;
+          &#60;em&#62;$score&#60;/em&#62;
+          &#60;br&#62;
+          &#60;span 
class=&#34;excerptURL&#34;&#62;$hit-&#62;{url}&#60;/span&#62;
+        &#60;/p&#62;
+        |;
+}</pre>
+
+<p>The rest of the script is just text wrangling.</p>
+
+<pre>#---------------------------------------------------------------#
+# No tutorial material below this point - just html generation. #
+#---------------------------------------------------------------#
+
+# Generate paging links and hit count, print and exit.
+my $paging_links = generate_paging_info( $q, $hit_count );
+blast_out_content( $q, $report, $paging_links );
+
+# Create html fragment with links for paging through results n-at-a-time.
+sub generate_paging_info {
+    my ( $query_string, $total_hits ) = @_;
+    my $escaped_q = CGI::escapeHTML($query_string);
+    my $paging_info;
+    if ( !length $query_string ) {
+        # No query?  No display.
+        $paging_info = &#39;&#39;;
+    }
+    elsif ( $total_hits == 0 ) {
+        # Alert the user that their search failed.
+        $paging_info
+            = qq|&#60;p&#62;No matches for 
&#60;strong&#62;$escaped_q&#60;/strong&#62;&#60;/p&#62;|;
+    }
+    else {
+        # Calculate the nums for the first and last hit to display.
+        my $last_result = min( ( $offset + $page_size ), $total_hits );
+        my $first_result = min( ( $offset + 1 ), $last_result );
+
+        # Display the result nums, start paging info.
+        $paging_info = qq|
+            &#60;p&#62;
+                Results 
&#60;strong&#62;$first_result-$last_result&#60;/strong&#62; 
+                of &#60;strong&#62;$total_hits&#60;/strong&#62; 
+                for &#60;strong&#62;$escaped_q&#60;/strong&#62;.
+            &#60;/p&#62;
+            &#60;p&#62;
+                Results Page:
+            |;
+
+        # Calculate first and last hits pages to display / link to.
+        my $current_page = int( $first_result / $page_size ) + 1;
+        my $last_page    = ceil( $total_hits / $page_size );
+        my $first_page   = max( 1, ( $current_page - 9 ) );
+        $last_page = min( $last_page, ( $current_page + 10 ) );
+
+        # Create a url for use in paging links.
+        my $href = $cgi-&#62;url( -relative =&#62; 1 );
+        $href .= &#34;?q=&#34; . CGI::escape($query_string);
+        $href .= &#34;;offset=&#34; . CGI::escape($offset);
+
+        # Generate the &#34;Prev&#34; link.
+        if ( $current_page &#62; 1 ) {
+            my $new_offset = ( $current_page - 2 ) * $page_size;
+            $href =~ s/(?&#60;=offset=)\d+/$new_offset/;
+            $paging_info .= qq|&#60;a href=&#34;$href&#34;&#62;&#38;lt;= 
Prev&#60;/a&#62;\n|;
+        }
+
+        # Generate paging links.
+        for my $page_num ( $first_page .. $last_page ) {
+            if ( $page_num == $current_page ) {
+                $paging_info .= qq|$page_num \n|;
+            }
+            else {
+                my $new_offset = ( $page_num - 1 ) * $page_size;
+                $href =~ s/(?&#60;=offset=)\d+/$new_offset/;
+                $paging_info .= qq|&#60;a 
href=&#34;$href&#34;&#62;$page_num&#60;/a&#62;\n|;
+            }
+        }
+
+        # Generate the &#34;Next&#34; link.
+        if ( $current_page != $last_page ) {
+            my $new_offset = $current_page * $page_size;
+            $href =~ s/(?&#60;=offset=)\d+/$new_offset/;
+            $paging_info .= qq|&#60;a href=&#34;$href&#34;&#62;Next 
=&#38;gt;&#60;/a&#62;\n|;
+        }
+
+        # Close tag.
+        $paging_info .= &#34;&#60;/p&#62;\n&#34;;
+    }
+
+    return $paging_info;
+}
+
+# Print content to output.
+sub blast_out_content {
+    my ( $query_string, $hit_list, $paging_info ) = @_;
+    my $escaped_q = CGI::escapeHTML($query_string);
+    binmode( STDOUT, &#34;:encoding(UTF-8)&#34; );
+    print qq|Content-type: text/html; charset=UTF-8\n\n|;
+    print qq|
+&#60;!DOCTYPE html PUBLIC &#34;-//W3C//DTD HTML 4.01 Transitional//EN&#34;
+    &#34;http://www.w3.org/TR/html4/loose.dtd&#34;&#62;
+&#60;html&#62;
+&#60;head&#62;
+  &#60;meta http-equiv=&#34;Content-type&#34; 
+    content=&#34;text/html;charset=UTF-8&#34;&#62;
+  &#60;link rel=&#34;stylesheet&#34; type=&#34;text/css&#34; 
+    href=&#34;/us_constitution/uscon.css&#34;&#62;
+  &#60;title&#62;Lucy: $escaped_q&#60;/title&#62;
+&#60;/head&#62;
+
+&#60;body&#62;
+
+  &#60;div id=&#34;navigation&#34;&#62;
+    &#60;form id=&#34;usconSearch&#34; action=&#34;&#34;&#62;
+      &#60;strong&#62;
+        Search the 
+        &#60;a href=&#34;/us_constitution/index.html&#34;&#62;US 
Constitution&#60;/a&#62;:
+      &#60;/strong&#62;
+      &#60;input type=&#34;text&#34; name=&#34;q&#34; id=&#34;q&#34; 
value=&#34;$escaped_q&#34;&#62;
+      &#60;input type=&#34;submit&#34; value=&#34;=&#38;gt;&#34;&#62;
+    &#60;/form&#62;
+  &#60;/div&#62;&#60;!--navigation--&#62;
+
+  &#60;div id=&#34;bodytext&#34;&#62;
+
+  $hit_list
+
+  $paging_info
+
+    &#60;p style=&#34;font-size: smaller; color: #666&#34;&#62;
+      &#60;em&#62;
+        Powered by &#60;a href=&#34;http://lucy.apache.org/&#34;
+        &#62;Apache 
Lucy&#60;small&#62;&#60;sup&#62;TM&#60;/sup&#62;&#60;/small&#62;&#60;/a&#62;
+      &#60;/em&#62;
+    &#60;/p&#62;
+  &#60;/div&#62;&#60;!--bodytext--&#62;
+
+&#60;/body&#62;
+
+&#60;/html&#62;
+|;
+}</pre>
+
+<h3><a class='u'
+name="OK(8230)_now_what?"
+>OK&#8230; now what?</a></h3>
+
+<p>Lucy::Simple is perfectly adequate for some tasks,
+but it&#8217;s not very flexible.
+Many people find that it doesn&#8217;t do at least one or two things they 
can&#8217;t live without.</p>
+
+<p>In our next tutorial chapter,
+<a href="../../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html" 
class="podlinkpod"
+>BeyondSimpleTutorial</a>,
+we&#8217;ll rewrite our indexing and search scripts using the classes that 
Lucy::Simple hides from view,
+opening up the possibilities for expansion; then,
+we&#8217;ll spend the rest of the tutorial chapters exploring these 
possibilities.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/Doc.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/Doc.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/Doc.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/Doc.mdtext Wed Sep 28 
12:06:24 2016
@@ -0,0 +1,128 @@
+Title: Lucy::Document::Doc – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Document::Doc - A document.</p>
+
+<h2><a class='u'
+name="SYNOPSIS"
+>SYNOPSIS</a></h2>
+
+<pre>my $doc = Lucy::Document::Doc-&#62;new(
+    fields =&#62; { foo =&#62; &#39;foo foo&#39;, bar =&#62; &#39;bar bar&#39; 
},
+);
+$indexer-&#62;add_doc($doc);</pre>
+
+<p>Doc objects allow access to field values via hashref overloading:</p>
+
+<pre>$doc-&#62;{foo} = &#39;new value for field &#34;foo&#34;&#39;;
+print &#34;foo: $doc-&#62;{foo}\n&#34;;</pre>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>A Doc object is akin to a row in a database,
+in that it is made up of one or more fields,
+each of which has a value.</p>
+
+<h2><a class='u'
+name="CONSTRUCTORS"
+>CONSTRUCTORS</a></h2>
+
+<h3><a class='u'
+name="new"
+>new</a></h3>
+
+<pre>my $doc = Lucy::Document::Doc-&#62;new(
+    fields =&#62; { foo =&#62; &#39;foo foo&#39;, bar =&#62; &#39;bar bar&#39; 
},
+);</pre>
+
+<p>Create a new Document.</p>
+
+<ul>
+<li><b>fields</b> - Field-value pairs.</li>
+
+<li><b>doc_id</b> - Internal Lucy document id.
+Default of 0 (an invalid doc id).</li>
+</ul>
+
+<h2><a class='u'
+name="METHODS"
+>METHODS</a></h2>
+
+<h3><a class='u'
+name="set_doc_id"
+>set_doc_id</a></h3>
+
+<pre>$doc-&#62;set_doc_id($doc_id);</pre>
+
+<p>Set internal Lucy document id.</p>
+
+<h3><a class='u'
+name="get_doc_id"
+>get_doc_id</a></h3>
+
+<pre>my $int = $doc-&#62;get_doc_id();</pre>
+
+<p>Retrieve internal Lucy document id.</p>
+
+<h3><a class='u'
+name="store"
+>store</a></h3>
+
+<pre>$doc-&#62;store($field, $value);</pre>
+
+<p>Store a field value in the Doc.</p>
+
+<ul>
+<li><b>field</b> - The field name.</li>
+
+<li><b>value</b> - The value.</li>
+</ul>
+
+<h3><a class='u'
+name="get_fields"
+>get_fields</a></h3>
+
+<pre>my $hashref = $doc-&#62;get_fields();</pre>
+
+<p>Return the Doc&#39;s backing fields hash.</p>
+
+<h3><a class='u'
+name="get_size"
+>get_size</a></h3>
+
+<pre>my $int = $doc-&#62;get_size();</pre>
+
+<p>Return the number of fields in the Doc.</p>
+
+<h3><a class='u'
+name="extract"
+>extract</a></h3>
+
+<pre>my $obj = $doc-&#62;extract($field);</pre>
+
+<p>Retrieve the field&#8217;s value,
+or undef if the field is not present.</p>
+
+<h3><a class='u'
+name="field_names"
+>field_names</a></h3>
+
+<pre>my $arrayref = $doc-&#62;field_names();</pre>
+
+<p>Return a list of names of all fields present.</p>
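+
+<p>The methods above can be combined as in the following sketch.
+The field names and values are illustrative only,
+and the snippet assumes that Lucy is installed.</p>
+
+<pre>use Lucy::Document::Doc;
+
+my $doc = Lucy::Document::Doc-&#62;new(
+    fields =&#62; { title =&#62; &#39;Preamble&#39; },
+);
+$doc-&#62;store( &#39;category&#39;, &#39;preamble&#39; );    # add a second field
+
+# field_names() returns an array reference.
+for my $name ( sort @{ $doc-&#62;field_names } ) {
+    my $value = $doc-&#62;extract($name);
+    print &#34;$name: $value\n&#34;;
+}
+print &#34;fields: &#34;, $doc-&#62;get_size, &#34;\n&#34;;</pre>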
+
+<h2><a class='u'
+name="INHERITANCE"
+>INHERITANCE</a></h2>
+
+<p>Lucy::Document::Doc isa Clownfish::Obj.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/HitDoc.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/HitDoc.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/HitDoc.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Document/HitDoc.mdtext Wed Sep 
28 12:06:24 2016
@@ -0,0 +1,55 @@
+Title: Lucy::Document::HitDoc – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Document::HitDoc - A document read from an index.</p>
+
+<h2><a class='u'
+name="SYNOPSIS"
+>SYNOPSIS</a></h2>
+
+<pre>while ( my $hit_doc = $hits-&#62;next ) {
+    print &#34;$hit_doc-&#62;{title}\n&#34;;
+    print $hit_doc-&#62;get_score . &#34;\n&#34;;
+    ...
+}</pre>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>HitDoc is the search-time relative of the index-time class Doc; it is 
augmented by a numeric score attribute that Doc doesn&#8217;t have.</p>
+
+<h2><a class='u'
+name="METHODS"
+>METHODS</a></h2>
+
+<h3><a class='u'
+name="set_score"
+>set_score</a></h3>
+
+<pre>$hit_doc-&#62;set_score($score);</pre>
+
+<p>Set score attribute.</p>
+
+<h3><a class='u'
+name="get_score"
+>get_score</a></h3>
+
+<pre>my $float = $hit_doc-&#62;get_score();</pre>
+
+<p>Get score attribute.</p>
+
+<h2><a class='u'
+name="INHERITANCE"
+>INHERITANCE</a></h2>
+
+<p>Lucy::Document::HitDoc isa <a href="../../Lucy/Document/Doc.html" 
class="podlinkpod"
+>Lucy::Document::Doc</a> isa Clownfish::Obj.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Highlight/Highlighter.mdtext
URL: 
http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Highlight/Highlighter.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Highlight/Highlighter.mdtext 
(added)
+++ lucy/site/trunk/content/docs/0.5.0/perl/Lucy/Highlight/Highlighter.mdtext 
Wed Sep 28 12:06:24 2016
@@ -0,0 +1,184 @@
+Title: Lucy::Highlight::Highlighter – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Highlight::Highlighter - Create and highlight excerpts.</p>
+
+<h2><a class='u'
+name="SYNOPSIS"
+>SYNOPSIS</a></h2>
+
+<pre>my $highlighter = Lucy::Highlight::Highlighter-&#62;new(
+    searcher =&#62; $searcher,
+    query    =&#62; $query,
+    field    =&#62; &#39;body&#39;
+);
+my $hits = $searcher-&#62;hits( query =&#62; $query );
+while ( my $hit = $hits-&#62;next ) {
+    my $excerpt = $highlighter-&#62;create_excerpt($hit);
+    ...
+}</pre>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Highlighter can be used to select relevant snippets from a document,
+and to surround search terms with highlighting tags.
+It handles both stems and phrases correctly and efficiently,
+using special-purpose data generated at index-time.</p>
+
+<h2><a class='u'
+name="CONSTRUCTORS"
+>CONSTRUCTORS</a></h2>
+
+<h3><a class='u'
+name="new"
+>new</a></h3>
+
+<pre>my $highlighter = Lucy::Highlight::Highlighter-&#62;new(
+    searcher       =&#62; $searcher,    # required
+    query          =&#62; $query,       # required
+    field          =&#62; &#39;content&#39;,    # required
+    excerpt_length =&#62; 150,          # default: 200
+);</pre>
+
+<p>Create a new Highlighter.</p>
+
+<ul>
+<li><b>searcher</b> - An object which inherits from <a 
href="../../Lucy/Search/Searcher.html" class="podlinkpod"
+>Searcher</a>,
+such as an <a href="../../Lucy/Search/IndexSearcher.html" class="podlinkpod"
+>IndexSearcher</a>.</li>
+
+<li><b>query</b> - Query object or a query string.</li>
+
+<li><b>field</b> - The name of the field from which to draw the excerpt.
+The field must be marked as <code>highlightable</code> (see <a 
href="../../Lucy/Plan/FieldType.html" class="podlinkpod"
+>FieldType</a>).</li>
+
+<li><b>excerpt_length</b> - Maximum length of the excerpt,
+in characters.</li>
+</ul>
+
+<h2><a class='u'
+name="METHODS"
+>METHODS</a></h2>
+
+<h3><a class='u'
+name="create_excerpt"
+>create_excerpt</a></h3>
+
+<pre>my $string = $highlighter-&#62;create_excerpt($hit_doc);</pre>
+
+<p>Take a HitDoc object and return a highlighted excerpt as a string if the 
HitDoc has a value for the specified <code>field</code>.</p>
+
+<h3><a class='u'
+name="encode"
+>encode</a></h3>
+
+<pre>my $string = $highlighter-&#62;encode($text);</pre>
+
+<p>Encode text with HTML entities.
+This method is called internally by <a href="#create_excerpt" 
class="podlinkpod"
+>create_excerpt()</a> for each text fragment when assembling an excerpt.
+A subclass can override this if the text should be encoded differently or not 
at all.</p>
+
+<h3><a class='u'
+name="highlight"
+>highlight</a></h3>
+
+<pre>my $string = $highlighter-&#62;highlight($text);</pre>
+
+<p>Highlight a small section of text.
+By default,
+prepends pre-tag and appends post-tag.
+This method is called internally by <a href="#create_excerpt" 
class="podlinkpod"
+>create_excerpt()</a> when assembling an excerpt.</p>
+
+<h3><a class='u'
+name="set_pre_tag"
+>set_pre_tag</a></h3>
+
+<pre>$highlighter-&#62;set_pre_tag($pre_tag);</pre>
+
+<p>Setter.
+The default value is &#8220;&#60;strong&#62;&#8221;.</p>
+
+<h3><a class='u'
+name="set_post_tag"
+>set_post_tag</a></h3>
+
+<pre>$highlighter-&#62;set_post_tag($post_tag);</pre>
+
+<p>Setter.
+The default value is &#8220;&#60;/strong&#62;&#8221;.</p>
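+
+<p>For instance,
+the two setters can be used together to swap the default tags for a styled span.
+This is a sketch that assumes <code>$highlighter</code> and <code>$hit</code> exist as in the SYNOPSIS; the class name &#8220;hit&#8221; is hypothetical.</p>
+
+<pre># Wrap matches in a span instead of the default strong tags.
+$highlighter-&#62;set_pre_tag(&#39;&#60;span class=&#34;hit&#34;&#62;&#39;);
+$highlighter-&#62;set_post_tag(&#39;&#60;/span&#62;&#39;);
+my $excerpt = $highlighter-&#62;create_excerpt($hit);</pre>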
+
+<h3><a class='u'
+name="get_pre_tag"
+>get_pre_tag</a></h3>
+
+<pre>my $string = $highlighter-&#62;get_pre_tag();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_post_tag"
+>get_post_tag</a></h3>
+
+<pre>my $string = $highlighter-&#62;get_post_tag();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_field"
+>get_field</a></h3>
+
+<pre>my $string = $highlighter-&#62;get_field();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_excerpt_length"
+>get_excerpt_length</a></h3>
+
+<pre>my $int = $highlighter-&#62;get_excerpt_length();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_searcher"
+>get_searcher</a></h3>
+
+<pre>my $searcher = $highlighter-&#62;get_searcher();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_query"
+>get_query</a></h3>
+
+<pre>my $query = $highlighter-&#62;get_query();</pre>
+
+<p>Accessor.</p>
+
+<h3><a class='u'
+name="get_compiler"
+>get_compiler</a></h3>
+
+<pre>my $compiler = $highlighter-&#62;get_compiler();</pre>
+
+<p>Accessor for the Lucy::Search::Compiler object derived from 
<code>query</code> and <code>searcher</code>.</p>
+
+<h2><a class='u'
+name="INHERITANCE"
+>INHERITANCE</a></h2>
+
+<p>Lucy::Highlight::Highlighter isa Clownfish::Obj.</p>
+
+</div>
