svn commit: r998475 [6/26] - in /websites/staging/lucy/trunk/content: ./ docs/ docs/0.5.0/ docs/0.5.0/c/ docs/0.5.0/c/Clownfish/ docs/0.5.0/c/Clownfish/Docs/ docs/0.5.0/c/Lucy/ docs/0.5.0/c/Lucy/Analysis/ docs/0.5.0/c/Lucy/Docs/ docs/0.5.0/c/Lucy/Docs/...

buildbot Wed, 28 Sep 2016 05:09:22 -0700

Added: 
websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html 
Wed Sep 28 12:07:48 2016
@@ -0,0 +1,260 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileFormat</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Overview of index file format</h2>
+<p>It is not necessary to understand the current implementation details of the
+index file format in order to use Apache Lucy effectively, but it may be
+helpful if you are interested in tweaking for high performance, exotic usage,
+or debugging and development.</p>
+<p>On a file system, an index is a directory.  The files inside have a
+hierarchical relationship: an index is made up of “segments”, each of which is
+an independent inverted index with its own subdirectory; each segment is made
+up of several component parts.</p>
+<pre><code>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--locks--|
+         |         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| 
+</code></pre>
+<h3>Write-once philosophy</h3>
+<p>All segment directory names consist of the string “seg_” followed by a number
+in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
+more recent segments.  Once a segment is finished and committed, its name is
+never re-used and its files are never modified.</p>
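+<p>Because the number is written in base 36, recency has to be compared numerically rather than as text: as strings, seg_z sorts after seg_10, even though z is 35 and 10 is 36.  The C sketch below illustrates the naming convention only; <code>seg_number</code> and <code>seg_name</code> are hypothetical helpers, not part of Lucy’s API, and they assume lowercase digits as in the examples above.</p>

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers (not Lucy API): convert between a segment directory
 * name such as "seg_5m" and its base-36 number.  Digits are 0-9 then a-z. */

/* Returns the number encoded after "seg_", or -1 if the name does not fit
 * the pattern.  No overflow check: fine for realistically short names. */
static int64_t seg_number(const char *name)
{
    static const char prefix[] = "seg_";
    const size_t prefix_len = sizeof(prefix) - 1;
    if (strncmp(name, prefix, prefix_len) != 0 || name[prefix_len] == '\0') {
        return -1;
    }
    int64_t number = 0;
    for (const char *p = name + prefix_len; *p != '\0'; p++) {
        int digit;
        if (*p >= '0' && *p <= '9') {
            digit = *p - '0';
        }
        else if (*p >= 'a' && *p <= 'z') {
            digit = *p - 'a' + 10;
        }
        else {
            return -1; /* not a lowercase base-36 digit */
        }
        number = number * 36 + digit;
    }
    return number;
}

/* Writes "seg_" plus the base-36 form of a non-negative number into buf. */
static void seg_name(int64_t number, char *buf, size_t buf_size)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char rev[16]; /* 13 base-36 digits cover any int64_t */
    size_t len = 0;
    do {
        rev[len++] = digits[number % 36];
        number /= 36;
    } while (number > 0);
    char out[17];
    for (size_t i = 0; i < len; i++) {
        out[i] = rev[len - 1 - i]; /* digits were produced least significant first */
    }
    out[len] = '\0';
    snprintf(buf, buf_size, "seg_%s", out);
}
```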
+<p>Old segments become obsolete and can be removed when their data has been
+consolidated into new segments during the process of segment merging and
+optimization.  A fully-optimized index has only one segment.</p>
+<h3>Top-level entries</h3>
+<p>There are a handful of “top-level” files and directories which belong to the
+entire index rather than to a particular segment.</p>
+<h4>snapshot_XXX.json</h4>
+<p>A “snapshot” file, e.g. <code>snapshot_m7p.json</code>, is a list of index files and
+directories.  Because index files, once written, are never modified, the list
+of entries in a snapshot defines a point-in-time view of the data in an index.</p>
+<p>Like segment directories, snapshot files also utilize the
+unique-base-36-number naming convention; the higher the number, the more
+recent the file.  The appearance of a new snapshot file within the index
+directory constitutes an index update.  While a new segment is being written,
+new files may be added to the index directory, but until a new snapshot file
+gets written, a Searcher opening the index for reading won’t know about them.</p>
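+<p>Finding the current point-in-time view therefore comes down to picking the snapshot file with the highest number.  A minimal C sketch, assuming the directory listing is already available as an array of names; <code>snapshot_number</code> and <code>latest_snapshot</code> are hypothetical helpers, and Lucy’s own snapshot-reading code deals with details this sketch omits.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the base-36 number inside "snapshot_XXX.json",
 * or -1 if the name does not match that pattern. */
static int64_t snapshot_number(const char *name)
{
    static const char prefix[] = "snapshot_";
    static const char suffix[] = ".json";
    const size_t pre = sizeof(prefix) - 1;
    const size_t suf = sizeof(suffix) - 1;
    const size_t len = strlen(name);
    if (len <= pre + suf
        || strncmp(name, prefix, pre) != 0
        || strcmp(name + len - suf, suffix) != 0) {
        return -1;
    }
    int64_t number = 0;
    for (size_t i = pre; i < len - suf; i++) {
        char c = name[i];
        int digit;
        if (c >= '0' && c <= '9') {
            digit = c - '0';
        }
        else if (c >= 'a' && c <= 'z') {
            digit = c - 'a' + 10;
        }
        else {
            return -1;
        }
        number = number * 36 + digit;
    }
    return number;
}

/* Index of the most recent snapshot among directory entries, or -1 if
 * none of the entries is a snapshot file. */
static ptrdiff_t latest_snapshot(const char *const *entries, size_t count)
{
    ptrdiff_t best = -1;
    int64_t best_number = -1;
    for (size_t i = 0; i < count; i++) {
        int64_t number = snapshot_number(entries[i]);
        if (number > best_number) {
            best_number = number;
            best = (ptrdiff_t)i;
        }
    }
    return best;
}
```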
+<h4>schema_XXX.json</h4>
+<p>The schema file is a Schema object describing the index’s format, serialized
+as JSON.  It, too, is versioned, and a given snapshot file will reference one
+and only one schema file.</p>
+<h4>locks</h4>
+<p>By default, only one indexing process may safely modify the index at any given
+time.  Processes reserve an index by laying claim to the <code>write.lock</code> file
+within the <code>locks/</code> directory.  A smattering of other lock files may be used
+from time to time, as well.</p>
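+<p>A common way to implement an exclusive lockfile is to create it with <code>O_CREAT | O_EXCL</code>, which fails if the file already exists, so only one process can win.  The POSIX sketch below shows that general technique; it is not Lucy’s lock implementation, the function names are hypothetical, and the host/pid record written into the file is an invented format.  The atomic-create guarantee holds on local file systems but may not on every NFS setup.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if this process now holds the lock, 0 if another process
 * already does, -1 on any other error. */
static int lockfile_acquire(const char *path, const char *host)
{
    /* Atomic create-if-absent: exactly one concurrent caller succeeds. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        return errno == EEXIST ? 0 : -1;
    }
    /* Record the owner so others can later judge whether the lock is stale. */
    char buf[256];
    int len = snprintf(buf, sizeof buf, "{\"host\": \"%s\", \"pid\": %ld}\n",
                       host, (long)getpid());
    if (len < 0) {
        len = 0;
    }
    else if ((size_t)len >= sizeof buf) {
        len = (int)(sizeof buf - 1); /* host name was truncated */
    }
    ssize_t written = write(fd, buf, (size_t)len);
    close(fd);
    if (written != (ssize_t)len) {
        unlink(path); /* don't leave a half-written lock behind */
        return -1;
    }
    return 1;
}

/* Releases the lock by removing the file; returns 0 on success. */
static int lockfile_release(const char *path)
{
    return unlink(path) == 0 ? 0 : -1;
}
```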
+<h3>A segment’s component parts</h3>
+<p>By default, each segment has up to five logical components: lexicon, postings,
+document storage, highlight data, and deletions.  Binary data from these
+components gets stored in virtual files within the “cf.dat” compound file;
+metadata is stored in a shared “segmeta.json” file.</p>
+<h4>segmeta.json</h4>
+<p>The segmeta.json file is a central repository for segment metadata.  In
+addition to information such as document counts and field numbers, it also
+warehouses arbitrary metadata on behalf of individual index components.</p>
+<h4>Lexicon</h4>
+<p>Each indexed field gets its own lexicon in each segment.  The exact files
+involved depend on the field’s type, but generally speaking there will be two
+parts.  First, there’s a primary <code>lexicon-XXX.dat</code> file which houses a
+complete term list associating terms with corpus frequency statistics,
+postings file locations, etc.  Second, one or more “lexicon index” files may
+be present which contain periodic samples from the primary lexicon file to
+facilitate fast lookups.</p>
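+<p>The division of labor between the two parts can be sketched as a binary search over the sampled terms followed by a forward scan of the primary file.  In the C sketch below, <code>LexIndexSample</code> and <code>lex_index_seek</code> are hypothetical, and the samples are shown as an in-memory array of strings and byte offsets rather than Lucy’s actual binary file layout.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical picture of a lexicon index: every Nth term of the sorted
 * primary lexicon, with the byte offset where that term's record begins. */
typedef struct {
    const char *term;
    uint64_t    offset;
} LexIndexSample;

/* Binary search for the last sample whose term sorts <= target, and return
 * its offset: the place to start scanning the primary lexicon.  If the
 * target sorts before every sample, scanning starts at offset 0.  Terms
 * are compared bytewise with strcmp (an assumption for this sketch). */
static uint64_t lex_index_seek(const LexIndexSample *samples, size_t count,
                               const char *target)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(samples[mid].term, target) <= 0) {
            lo = mid + 1;
        }
        else {
            hi = mid;
        }
    }
    /* lo is now the number of samples that sort <= target */
    return lo == 0 ? 0 : samples[lo - 1].offset;
}
```

+<p>The reader then scans the primary lexicon forward from the returned offset until it reaches the target term or passes the point where it would sort, which is why a lexicon index can only narrow the search down to a region of the file.</p>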
+<h4>Postings</h4>
+<p>“Posting” is a technical term from the field of
+<a href="../../Lucy/Docs/IRTheory.html">information retrieval</a>, defined as a single
+instance of one term indexing one document.  If you are looking at the index
+in the back of a book, and you see that “freedom” is referenced on pages 8,
+86, and 240, that would be three postings, which taken together form a
+“posting list”.  The same terminology applies to an index in electronic form.</p>
+<p>Each segment has one postings file per indexed field.  When a search is
+performed for a single term, first that term is looked up in the lexicon.  If
+the term exists in the segment, the record in the lexicon will contain
+information about which postings file to look at and where to look.</p>
+<p>The first thing any posting record tells you is a document id.  By iterating
+over all the postings associated with a term, you can find all the documents
+that match that term, a process which is analogous to looking up page numbers
+in a book’s index.  However, each posting record typically contains other
+information in addition to the document id, e.g. the positions at which the term
+occurs within the field.</p>
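+<p>In code, a decoded posting can be pictured as a doc id plus the positions of the term within the field.  The C sketch below uses a hypothetical <code>Posting</code> struct (Lucy’s real postings are compressed binary records, and its classes differ); it walks a posting list to collect doc ids and uses positions to test whether one term immediately precedes another, the kind of check a phrase search needs.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded posting: doc id first, then the term's positions
 * within the field. */
typedef struct {
    int32_t         doc_id;
    uint32_t        freq;      /* number of entries in positions */
    const uint32_t *positions; /* sorted ascending */
} Posting;

/* Walk a term's posting list and copy out the matching doc ids, the way
 * you would read page numbers from a book's index.  Returns how many were
 * copied (at most out_cap). */
static size_t collect_doc_ids(const Posting *list, size_t count,
                              int32_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < count && n < out_cap; i++) {
        out[n++] = list[i].doc_id;
    }
    return n;
}

/* For postings of two terms in the same document, report whether some
 * occurrence of the first term is immediately followed by the second.
 * Both position lists are sorted, so one merge-style pass suffices. */
static int immediately_precedes(const Posting *first, const Posting *second)
{
    if (first->doc_id != second->doc_id) {
        return 0;
    }
    size_t i = 0, j = 0;
    while (i < first->freq && j < second->freq) {
        uint32_t wanted = first->positions[i] + 1;
        if (second->positions[j] == wanted) {
            return 1;
        }
        if (second->positions[j] < wanted) {
            j++;
        }
        else {
            i++;
        }
    }
    return 0;
}
```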
+<h4>Documents</h4>
+<p>The document storage section is a simple database, organized into two files:</p>
+<ul>
+<li>
+<p><strong>documents.dat</strong> - Serialized documents.</p>
+</li>
+<li>
+<p><strong>documents.ix</strong> - Document storage index, a solid array of 64-bit integers
+where each integer location corresponds to a document id, and the value at
+that location points at a file position in the documents.dat file.</p>
+</li>
+</ul>
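+<p>Fetching a stored document is then two steps: read the 64-bit offset for the doc id from documents.ix, then read from that position in documents.dat.  The C sketch below is hypothetical; it assumes the offsets are already decoded into host byte order, that the array is indexed directly by doc id, and that a record ends where the next one begins (or at end of file).  None of these details are taken from Lucy’s source.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the decoded documents.ix array (one offset per
 * doc id) and the total length of documents.dat, find where one document's
 * serialized record lives.  Returns 1 on success, 0 for a bad doc id or an
 * inconsistent index. */
static int doc_record_span(const uint64_t *ix, size_t num_entries,
                           uint64_t dat_len, size_t doc_id,
                           uint64_t *start, uint64_t *len)
{
    if (doc_id >= num_entries) {
        return 0;
    }
    uint64_t begin = ix[doc_id];
    /* Assumption: a record runs until the next record's offset. */
    uint64_t end = (doc_id + 1 < num_entries) ? ix[doc_id + 1] : dat_len;
    if (begin > end || end > dat_len) {
        return 0;
    }
    *start = begin;
    *len = end - begin;
    return 1;
}
```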
+<h4>Highlight data</h4>
+<p>The files which store data used for excerpting and highlighting are organized
+similarly to the files used to store documents.</p>
+<ul>
+<li>
+<p><strong>highlight.dat</strong> - Chunks of serialized highlight data, one per doc id.</p>
+</li>
+<li>
+<p><strong>highlight.ix</strong> - Highlight data index: as with the <code>documents.ix</code> file, a
+solid array of 64-bit file pointers.</p>
+</li>
+</ul>
+<h4>Deletions</h4>
+<p>When a document is “deleted” from a segment, it is not actually purged right
+away; it is merely marked as “deleted” via a deletions file.  Deletions files
+contain bit vectors with one bit for each document in the segment; if bit
+#254 is set then document 254 is deleted, and if that document turns up in a
+search it will be masked out.</p>
+<p>It is only when a segment’s contents are rewritten to a new segment during the
+segment-merging process that deleted documents truly go away.</p>
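+<p>A deletions bit vector needs only a few bit operations.  The C sketch below is illustrative: the <code>DelBits</code> type and its functions are hypothetical, and the bit order within each byte (lowest bit first) is an assumption rather than Lucy’s documented on-disk layout.  <code>mask_deleted</code> shows how hits for deleted documents can be filtered out of a result list.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical deletions bit vector: one bit per document in the segment;
 * a set bit means the document is deleted. */
typedef struct {
    uint8_t *bits;
    size_t   num_docs;
} DelBits;

static int delbits_init(DelBits *d, size_t num_docs)
{
    /* num_docs / 8 + 1 bytes covers every bit and is never zero-sized. */
    d->bits = calloc(num_docs / 8 + 1, 1);
    d->num_docs = num_docs;
    return d->bits != NULL;
}

static void delbits_free(DelBits *d)
{
    free(d->bits);
    d->bits = NULL;
}

static void delbits_set(DelBits *d, size_t doc_id)
{
    assert(doc_id < d->num_docs);
    d->bits[doc_id / 8] |= (uint8_t)(1u << (doc_id % 8));
}

static int delbits_is_deleted(const DelBits *d, size_t doc_id)
{
    assert(doc_id < d->num_docs);
    return (d->bits[doc_id / 8] >> (doc_id % 8)) & 1;
}

/* Removes deleted docs from a hit list in place; returns the new length. */
static size_t mask_deleted(const DelBits *d, int32_t *hits, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (!delbits_is_deleted(d, (size_t)hits[i])) {
            hits[kept++] = hits[i];
        }
    }
    return kept;
}
```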
+<h3>Compound Files</h3>
+<p>If you peer inside an index directory, you won’t actually find any files named
+“documents.dat”, “highlight.ix”, etc. unless there is an indexing process
+underway.  What you will find instead is one “cf.dat” and one “cfmeta.json”
+file per segment.</p>
+<p>To minimize the need for file descriptors at search-time, all per-segment
+binary data files are concatenated together in “cf.dat” at the close of each
+indexing session.  Information about where each file begins and ends is stored
+in <code>cfmeta.json</code>.  When the segment is opened for reading, a single file
+descriptor per “cf.dat” file can be shared among several readers.</p>
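+<p>Locating one virtual file inside cf.dat then amounts to a name lookup plus a bounds check.  In this C sketch <code>CfEntry</code> and <code>cf_find</code> are hypothetical, cfmeta.json is shown as an already-parsed array rather than in its real JSON form, and an in-memory buffer stands in for the shared cf.dat descriptor (a real reader might instead call <code>pread()</code> on that descriptor at the stored offset).</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed form of cfmeta.json: where each virtual file begins
 * inside cf.dat and how many bytes it spans. */
typedef struct {
    const char *name;
    uint64_t    offset;
    uint64_t    length;
} CfEntry;

/* Finds a virtual file by name.  cf_dat stands in for the single shared
 * cf.dat handle (here an in-memory buffer of cf_len bytes).  Returns a
 * pointer to the file's first byte and stores its length, or returns NULL
 * if the name is unknown or the entry would run past the end of cf.dat. */
static const char *cf_find(const char *cf_dat, uint64_t cf_len,
                           const CfEntry *entries, size_t count,
                           const char *name, uint64_t *length)
{
    for (size_t i = 0; i < count; i++) {
        const CfEntry *e = &entries[i];
        if (strcmp(e->name, name) != 0) {
            continue;
        }
        if (e->offset > cf_len || e->length > cf_len - e->offset) {
            return NULL; /* corrupt or mismatched metadata */
        }
        *length = e->length;
        return cf_dat + e->offset;
    }
    return NULL;
}
```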
+<h3>A Typical Search</h3>
+<p>Here’s a simplified narrative, dramatizing how a search for “freedom” against
+a given segment plays out:</p>
+<ol>
+<li>
+<p>The searcher asks the relevant Lexicon Index, “Do you know anything about
+‘freedom’?”  Lexicon Index replies, “Can’t say for sure, but if the main
+Lexicon file does, ‘freedom’ is probably somewhere around byte 21008”.</p>
+</li>
+<li>
+<p>The main Lexicon tells the searcher, “One moment, let me scan our records…
+Yes, we have 2 documents which contain ‘freedom’.  You’ll find them in
+seg_6/postings-4.dat starting at byte 66991.”</p>
+</li>
+<li>
+<p>The Postings file says, “Yep, we have ‘freedom’, all right!  Document id 40
+has 1 ‘freedom’, and document 44 has 8.  If you need to know more, like if any
+‘freedom’ is part of the phrase ‘freedom of speech’, ask me about positions!”</p>
+</li>
+<li>
+<p>If the searcher is only looking for “freedom” in isolation, that’s where it
+stops.  It now knows enough to assign the documents scores against “freedom”,
+with the 8-freedom document likely ranking higher than the single-freedom
+document.</p>
+</li>
+</ol>
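+<p>The final scoring step can be sketched with a saturating term-frequency weight, BM25’s tf factor with length normalization turned off, swapped in here purely for illustration: more occurrences raise the score, but 8 occurrences do not count 8 times as much as 1.  Lucy’s own TF/IDF variant differs in its details, the <code>Hit</code> type and functions are hypothetical, and <code>k1 = 1.2</code> is a common textbook default rather than a Lucy setting.  IDF is left out because both documents match the same single term, so it would scale their scores equally.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int32_t  doc_id;
    uint32_t freq;   /* occurrences of the term in this document */
    double   score;
} Hit;

/* Saturating tf weight: equals 1.0 at freq == 1 and approaches k1 + 1
 * as freq grows. */
static double tf_weight(uint32_t freq, double k1)
{
    return (freq * (k1 + 1.0)) / (freq + k1);
}

/* qsort comparator: highest score first. */
static int by_score_desc(const void *a, const void *b)
{
    double sa = ((const Hit *)a)->score;
    double sb = ((const Hit *)b)->score;
    return (sa < sb) - (sa > sb);
}

static void score_and_rank(Hit *hits, size_t count, double k1)
{
    for (size_t i = 0; i < count; i++) {
        hits[i].score = tf_weight(hits[i].freq, k1);
    }
    qsort(hits, count, sizeof hits[0], by_score_desc);
}
```

+<p>With the narrative’s numbers, document 44 (8 occurrences) scores about 1.91 and document 40 (1 occurrence) scores 1.0, so document 44 ranks first.</p>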
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>


Added: 
websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html 
Wed Sep 28 12:07:48 2016
@@ -0,0 +1,144 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileLocking</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Manage indexes on shared volumes.</h2>
+<p>Normally, index locking is an invisible process.  Exclusive write access is
+controlled via lockfiles within the index directory and problems only arise
+if multiple processes attempt to acquire the write lock simultaneously;
+search-time processes do not ordinarily require locking at all.</p>
+<p>On shared volumes, however, the default locking mechanism fails, and manual
+intervention becomes necessary.</p>
+<p>Both read and write applications accessing an index on a shared volume need
+to identify themselves with a unique <code>host</code> id, e.g. hostname or
+IP address.  Knowing the host id makes it possible to tell which lockfiles
+belong to other machines and therefore must not be removed when the
+lockfile’s pid number appears not to correspond to an active process.</p>
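+<p>The rule in this paragraph can be expressed as a small predicate.  The POSIX sketch below is hypothetical and is not Lucy’s lock code: a lockfile is treated as stale only if it names this host and its pid is not running, checked with <code>kill(pid, 0)</code>, which sends no signal but reports whether the process exists.  In practice <code>my_host</code> might come from <code>gethostname()</code> or from whatever host id the application configures.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* kill() with signal 0 performs only the existence/permission check. */
static int pid_is_running(pid_t pid)
{
    if (pid <= 0) {
        return 0; /* 0 and negatives address process groups; treat as invalid */
    }
    if (kill(pid, 0) == 0) {
        return 1;
    }
    return errno == EPERM; /* process exists but belongs to another user */
}

/* A lock may be cleared only when it was written on this host AND its
 * owning process is gone.  A lock from another host is never touched:
 * a pid check on this machine says nothing about processes over there. */
static int lock_is_stale(const char *lock_host, pid_t lock_pid,
                         const char *my_host)
{
    if (strcmp(lock_host, my_host) != 0) {
        return 0;
    }
    return !pid_is_running(lock_pid);
}
```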
+<p>At index-time, the danger is that multiple indexing processes from
+different machines which fail to specify a unique <code>host</code> id can
+delete each other’s lockfiles and then attempt to modify the index at the
+same time, causing index corruption.  The search-time problem is more
+complex.</p>
+<p>Once an index file is no longer listed in the most recent snapshot, Indexer
+attempts to delete it as part of a post-<a href="lucy:Indexer.Commit">Commit</a> cleanup routine.  It is
+possible that at the moment an Indexer is deleting files which it believes
+are no longer needed, a Searcher referencing an earlier snapshot is in fact
+using them.  The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+<p>Ordinarily, the deletion attempts are not a problem.  On a typical Unix
+volume, the files will be deleted in name only: any process which holds an
+open filehandle against a given file will continue to have access, and the
+file won’t actually get vaporized until the last filehandle is cleared.
+Thanks to “delete on last close semantics”, an Indexer can’t truly delete
+the file out from underneath an active Searcher.  On Windows, where file
+deletion fails whenever any process holds an open handle, the situation is
+different but still workable: Indexer just keeps retrying after each commit
+until deletion finally succeeds.</p>
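+<p>Delete-on-last-close behavior can be observed directly.  The POSIX sketch below (function name and file path are hypothetical) writes a file, unlinks it while a descriptor is still open, and then reads the data back through that descriptor, just as a Searcher holding an open file survives an Indexer’s cleanup on a local Unix volume.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes a small file, unlinks it while a descriptor is still open, then
 * reads the data back through that descriptor.  Returns the number of bytes
 * read after the unlink, or -1 on error. */
static ssize_t read_after_unlink(const char *path, char *out, size_t cap)
{
    static const char data[] = "segment data";
    const size_t data_len = sizeof data - 1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    if (write(fd, data, data_len) != (ssize_t)data_len) {
        close(fd);
        unlink(path);
        return -1;
    }
    /* The "Indexer" removes the name from the directory... */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }
    /* ...but the "Searcher" still holds fd, so the contents remain readable. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, out, cap);
    close(fd); /* last close: only now is the storage actually released */
    return n;
}
```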
+<p>On NFS, however, the system breaks, because NFS allows files to be deleted
+out from underneath active processes.  Should this happen, the unlucky read
+process will crash with a “Stale NFS filehandle” exception.</p>
+<p>Under normal circumstances, it is neither necessary nor desirable for
+IndexReaders to secure read locks against an index, but for NFS we have to
+make an exception.  LockFactory’s <a href="lucy:LockFactory.Make_Shared_Lock">Make_Shared_Lock</a> method exists for this
+reason; supplying an IndexManager instance to IndexReader’s constructor
+activates an internal locking mechanism using <a href="lucy:LockFactory.Make_Shared_Lock">Make_Shared_Lock</a> which
+prevents concurrent indexing processes from deleting files that are needed
+by active readers.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Since shared locks are implemented using lockfiles located in the index
+directory (as are exclusive locks), reader applications must have write
+access for read locking to work.  Stale lock files from crashed processes
+are ordinarily cleared away the next time the same machine, as identified
+by the <code>host</code> parameter, opens another IndexReader.  (The
+classic technique of timing out lock files is not feasible because search
+processes may lie dormant indefinitely.)  However, please be aware that if
+the last thing a given machine does is crash, lock files belonging to it
+may persist, preventing deletion of obsolete index data.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html 
Wed Sep 28 12:07:48 2016
@@ -0,0 +1,133 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::IRTheory</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Crash course in information retrieval</h2>
+<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p>
+<h3>Terminology</h3>
+<p>Lucy uses some terminology from the field of information retrieval which
+may be unfamiliar to many users.  “Document” and “term” mean pretty much what
+you’d expect them to, but others such as “posting” and “inverted index” need a
+formal introduction:</p>
+<ul>
+<li><em>document</em> - An atomic unit of retrieval.</li>
+<li><em>term</em> - An attribute which describes a document.</li>
+<li><em>posting</em> - One term indexing one document.</li>
+<li><em>term list</em> - The complete list of terms which describe a document.</li>
+<li><em>posting list</em> - The complete list of documents which a term indexes.</li>
+<li><em>inverted index</em> - A data structure which maps from terms to documents.</li>
+</ul>
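+<p>A toy inverted index makes the definitions concrete: each document contributes its term list, and each distinct (term, document) pair becomes one posting in that term’s posting list.  The C sketch below uses hypothetical names and fixed-size arrays for brevity, and stores term pointers without copying them; it is not how Lucy structures its index files.</p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_TERMS 16
#define MAX_DOCS  16

/* Toy inverted index: each distinct term maps to its posting list, the ids
 * of the documents it describes.  Term strings are stored by pointer, so
 * the caller must keep them alive. */
typedef struct {
    const char *terms[MAX_TERMS];
    int         postings[MAX_TERMS][MAX_DOCS];
    size_t      num_postings[MAX_TERMS];
    size_t      num_terms;
} InvertedIndex;

static int find_term(const InvertedIndex *ix, const char *term)
{
    for (size_t i = 0; i < ix->num_terms; i++) {
        if (strcmp(ix->terms[i], term) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Adds one document's term list; every distinct (term, doc) pair becomes
 * one posting.  Docs must be added in increasing id order.  Returns 0 if a
 * fixed capacity is exceeded. */
static int index_document(InvertedIndex *ix, int doc_id,
                          const char *const *term_list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int slot = find_term(ix, term_list[i]);
        if (slot < 0) {
            if (ix->num_terms == MAX_TERMS) {
                return 0;
            }
            slot = (int)ix->num_terms++;
            ix->terms[slot] = term_list[i];
            ix->num_postings[slot] = 0;
        }
        size_t *count = &ix->num_postings[slot];
        if (*count > 0 && ix->postings[slot][*count - 1] == doc_id) {
            continue; /* term repeated within the same document */
        }
        if (*count == MAX_DOCS) {
            return 0;
        }
        ix->postings[slot][(*count)++] = doc_id;
    }
    return 1;
}

/* Looks up a term's posting list; returns its length, 0 if absent. */
static size_t ix_lookup(const InvertedIndex *ix, const char *term,
                        const int **list)
{
    int slot = find_term(ix, term);
    if (slot < 0) {
        *list = NULL;
        return 0;
    }
    *list = ix->postings[slot];
    return ix->num_postings[slot];
}
```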
+<p>Since Lucy is a practical implementation of IR theory, it loads these
+abstract, distilled definitions down with useful traits.  For instance, a
+“posting” in its most rarefied form is simply a term-document pairing; in
+Lucy, the class MatchPosting fills this
+role.  However, by associating additional information with a posting like the
+number of times the term occurs in the document, we can turn it into a
+ScorePosting, making it possible
+to rank documents by relevance rather than just list documents which happen to
+match in no particular order.</p>
+<h3>TF/IDF ranking algorithm</h3>
+<p>Lucy uses a variant of the well-established “Term Frequency / Inverse
+Document Frequency” weighting scheme.  A thorough treatment of TF/IDF is too
+ambitious for our present purposes, but in a nutshell, it means that…</p>
+<ul>
+<li>
+<p>in a search for <code>skate park</code>, documents which score well for the
+comparatively rare term <code>skate</code> will rank higher than documents which score
+well for the more common term <code>park</code>.</p>
+</li>
+<li>
+<p>a 10-word text which has one occurrence each of both <code>skate</code> and <code>park</code> will
+rank higher than a 1000-word text which also contains one occurrence of each.</p>
+</li>
+</ul>
+<p>A web search for “tf idf” will turn up many excellent explanations of the
+algorithm.</p>
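+<p>Both effects show up in a textbook form of TF/IDF, where a term’s weight in a document is its length-normalized frequency times the log of the inverse document frequency, and a document’s score for a query is the sum of those weights over the query terms.  This common formulation is chosen for illustration and is not Lucy’s exact scoring formula; the corpus figures are hypothetical (N = 1000 documents, 10 containing skate, 200 containing park).</p>

```latex
\begin{aligned}
w(t, d) &= \frac{\mathrm{tf}(t, d)}{|d|} \, \ln\frac{N}{\mathrm{df}(t)} \\
\mathrm{idf}(\text{skate}) &= \ln\frac{1000}{10} = \ln 100 \approx 4.61 \\
\mathrm{idf}(\text{park}) &= \ln\frac{1000}{200} = \ln 5 \approx 1.61 \\
\mathrm{score}(\text{10-word text}) &= \tfrac{1}{10}\,(\ln 100 + \ln 5) = \tfrac{1}{10}\ln 500 \approx 0.621 \\
\mathrm{score}(\text{1000-word text}) &= \tfrac{1}{1000}\ln 500 \approx 0.0062
\end{aligned}
```

+<p>One occurrence of skate is worth almost three times as much as one occurrence of park (4.61 versus 1.61), and the same two occurrences count 100 times more in the 10-word text than in the 1000-word text.</p>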
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html 
Wed Sep 28 12:07:48 2016
@@ -0,0 +1,142 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Step-by-step introduction to Apache Lucy.</h2>
+<p>Explore Apache Lucy's basic functionality by starting with a minimalist CGI
+search app based on Lucy::Simple and transforming it, step by step,
+into an "advanced search" interface utilizing more flexible core modules like
+<a href="../../Lucy/Index/Indexer.html">Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a>.</p>
+<h3>Chapters</h3>
+<ul>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> - Build a bare-bones search app using
+Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a> - Rebuild the app using core
+classes like <a href="../../Lucy/Index/Indexer.html">Indexer</a> and
+<a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> in place of Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a> - Experiment with different field
+characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html">FieldType</a>.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a> - Examine how the choice of
+<a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> subclass affects search results.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a> - Augment search results with
+highlighted excerpts.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a> - Unlock advanced search features
+by using Query objects instead of query strings.</p>
+</li>
+</ul>
+<h3>Source materials</h3>
+<p>The source material used by the tutorial app (a multi-text-file presentation
+of the United States Constitution) can be found in the <code>sample</code> directory
+at the root of the Lucy distribution, along with finished indexing and search
+apps.</p>
+<pre><code class="language-c">sample/indexer_simple.c  # simple indexing executable
+sample/search_simple.c   # simple search executable
+sample/indexer.c         # indexing executable
+sample/search.c          # search executable
+sample/us_constitution   # corpus
+</code></pre>
+<h3>Conventions</h3>
+<p>The user is expected to be familiar with C and basic CGI programming.</p>
+<p>The code in this tutorial assumes a Unix-flavored operating system and the
+Apache webserver, but will work with minor modifications on other setups.</p>
+<h3>See also</h3>
+<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,152 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::AnalysisTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>How to choose and use Analyzers.</h2>
+<p>Try swapping out the EasyAnalyzer in our Schema for a
+<a href="../../../Lucy/Analysis/StandardTokenizer.html">StandardTokenizer</a>:</p>
+<pre><code class="language-c">    StandardTokenizer *tokenizer = StandardTokenizer_new();
+    FullTextType *type = FullTextType_new((Analyzer*)tokenizer);
+</code></pre>
+<p>Search for <code>senate</code>, <code>Senate</code>, and <code>Senator</code> before and after making the
+change and re-indexing.</p>
+<p>Under EasyAnalyzer, the results are identical for all three searches, but
+under StandardTokenizer, searches are case-sensitive, and the result sets for
+<code>Senate</code> and <code>Senator</code> are distinct.</p>
+<h3>EasyAnalyzer</h3>
+<p>What's happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> is performing more aggressive
+processing than StandardTokenizer.  In addition to tokenizing, it's also
+converting all text to lower case so that searches are case-insensitive, and
+using a "stemming" algorithm to reduce related words to a common stem (<code>senat</code>,
+in this case).</p>
+<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case, it's three-in-one, since specifying an EasyAnalyzer with the
+language <code>en</code> is equivalent to this snippet creating a
+<a href="../../../Lucy/Analysis/PolyAnalyzer.html">PolyAnalyzer</a>:</p>
+<pre><code class="language-c">    // Assumes `language` is a String holding &quot;en&quot;.
+    Vector *analyzers = Vec_new(3);
+    // Vec_Push() takes over the reference returned by each constructor.
+    Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+    Vec_Push(analyzers, (Obj*)SnowStemmer_new(language));
+
+    PolyAnalyzer *analyzer = PolyAnalyzer_new(NULL, analyzers);
+    DECREF(analyzers);
+</code></pre>
+<p>You can add or subtract Analyzers from there if you like.  Try adding a fourth
+Analyzer, a SnowballStopFilter for suppressing "stopwords" like "the", "if",
+and "maybe".</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+    // Filter stopwords after case folding but before stemming.
+    Vec_Push(analyzers, (Obj*)SnowStop_new(language, NULL));
+    Vec_Push(analyzers, (Obj*)SnowStemmer_new(language));
+</code></pre>
+<p>Also, try removing the SnowballStemmer.</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+</code></pre>
+<p>The original choice of a stock English EasyAnalyzer probably still yields the
+best results for this document collection, but you get the idea: sometimes you
+want a different Analyzer.</p>
+<h3>When the best Analyzer is no Analyzer</h3>
+<p>Sometimes you don't want an Analyzer at all.  That was true for our "url"
+field because we didn't need it to be searchable, but it's also true for
+certain types of searchable fields.  For instance, "category" fields are often
+set up to match exactly or not at all, as are fields like "last_name" (because
+you may not want to conflate results for "Humphrey" and "Humphries").</p>
+<p>To specify that there should be no analysis performed at all, use StringType:</p>
+<pre><code class="language-c">    String     *name = Str_newf(&quot;category&quot;);
+    StringType *type = StringType_new();
+    Schema_Spec_Field(schema, name, (FieldType*)type);
+    DECREF(type);
+    DECREF(name);
+</code></pre>
+<h3>Highlighting up next</h3>
+<p>In our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a>,
+we'll add highlighted excerpts from the "content" field to our search results.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,296 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>A more flexible app structure.</h2>
+<h3>Goal</h3>
+<p>In this tutorial chapter, we'll refactor the apps we built in
+<a href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> so that they look exactly the same from
+the end user's point of view, but offer the developer greater possibilities for
+expansion.</p>
+<p>To achieve this, we'll ditch Lucy::Simple and replace it with the
+classes that it uses internally:</p>
+<ul>
+<li><a href="../../../Lucy/Plan/Schema.html">Schema</a> - Plan out your index.</li>
+<li><a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> - Field type for full text search.</li>
+<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li>
+<li><a href="../../../Lucy/Index/Indexer.html">Indexer</a> - Manipulate index content.</li>
+<li><a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> - Search an index.</li>
+<li><a href="../../../Lucy/Search/Hits.html">Hits</a> - Iterate over hits returned by a Searcher.</li>
+</ul>
+<h3>Adaptations to indexer.c</h3>
+<p>After we include our headers...</p>
+<pre><code class="language-c">#include &lt;dirent.h&gt;
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include &quot;Clownfish/String.h&quot;
+#include &quot;Lucy/Analysis/EasyAnalyzer.h&quot;
+#include &quot;Lucy/Document/Doc.h&quot;
+#include &quot;Lucy/Index/Indexer.h&quot;
+#include &quot;Lucy/Plan/FullTextType.h&quot;
+#include &quot;Lucy/Plan/StringType.h&quot;
+#include &quot;Lucy/Plan/Schema.h&quot;
+
+const char path_to_index[] = &quot;/path/to/index&quot;;
+const char uscon_source[]  = &quot;/usr/local/apache2/htdocs/us_constitution&quot;;
+</code></pre>
+<p>... the first item we're going to need is a <a href="../../../Lucy/Plan/Schema.html">Schema</a>.</p>
+<p>The primary job of a Schema is to specify what fields are available and how
+they're defined.  We'll start off with three fields: title, content and url.</p>
+<pre><code class="language-c">static Schema*
+S_create_schema() {
+    // Create a new schema.
+    Schema *schema = Schema_new();
+
+    // Create an analyzer.
+    String       *language = Str_newf(&quot;en&quot;);
+    EasyAnalyzer *analyzer = EasyAnalyzer_new(language);
+
+    // Specify fields.
+
+    FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+    {
+        String *field_str = Str_newf(&quot;title&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    DECREF(type);
+    DECREF(analyzer);
+    DECREF(language);
+    return schema;
+}
+</code></pre>
+<p>All of the fields are spec'd out using the <a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> FieldType,
+indicating that they will be searchable as "full text", which means that
+they can be searched for individual words.  The "analyzer", which is unique to
+FullTextType fields, is what breaks up the text into searchable tokens.</p>
+<p>Next, we'll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html">Indexer</a>.
+The substitution will be straightforward because Simple has merely been
+serving as a thin wrapper around an inner Indexer, and we'll just be peeling
+away the wrapper.</p>
+<p>First, replace the constructor:</p>
+<pre><code class="language-c">int
+main() {
+    // Initialize the library.
+    lucy_bootstrap_parcel();
+
+    Schema *schema = S_create_schema();
+    String *folder = Str_newf(&quot;%s&quot;, path_to_index);
+
+    Indexer *indexer = Indexer_new(schema, (Obj*)folder, NULL,
+                                   Indexer_CREATE | Indexer_TRUNCATE);
+
+</code></pre>
+<p>Next, have the <code>indexer</code> object call <a href="../../../Lucy/Index/Indexer.html#func_Add_Doc">Add_Doc()</a> where we
+previously had the <code>lucy</code> object add each document:</p>
+<pre><code class="language-c">    DIR *dir = opendir(uscon_source);
+    if (dir == NULL) {
+        perror(uscon_source);
+        return 1;
+    }
+
+    for (struct dirent *entry = readdir(dir);
+         entry;
+         entry = readdir(dir)) {
+
+        if (S_ends_with(entry-&gt;d_name, &quot;.txt&quot;)) {
+            Doc *doc = S_parse_file(entry-&gt;d_name);
+            Indexer_Add_Doc(indexer, doc, 1.0);
+            DECREF(doc);
+        }
+    }
+
+    closedir(dir);
+</code></pre>
+<p>There's only one extra step required: at the end of the app, you must call
+<code>Indexer_Commit()</code> explicitly to close the indexing session and commit your changes.
+(Lucy::Simple hides this detail, committing implicitly when it needs to.)</p>
+<pre><code class="language-c">    Indexer_Commit(indexer);
+
+    DECREF(indexer);
+    DECREF(folder);
+    DECREF(schema);
+    return 0;
+}
+</code></pre>
+<h3>Adaptations to search.c</h3>
+<p>In our search app as in our indexing app, Lucy::Simple has served as a
+thin wrapper, this time around <a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> and
+<a href="../../../Lucy/Search/Hits.html">Hits</a>.  Swapping out Simple for these two classes is
+also straightforward:</p>
+<pre><code class="language-c">#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include &quot;Clownfish/String.h&quot;
+#include &quot;Lucy/Document/HitDoc.h&quot;
+#include &quot;Lucy/Search/Hits.h&quot;
+#include &quot;Lucy/Search/IndexSearcher.h&quot;
+
+const char path_to_index[] = &quot;/path/to/index&quot;;
+
+int
+main(int argc, char *argv[]) {
+    // Initialize the library.
+    lucy_bootstrap_parcel();
+
+    if (argc &lt; 2) {
+        printf(&quot;Usage: %s &lt;querystring&gt;\n&quot;, argv[0]);
+        return 0;
+    }
+
+    const char *query_c = argv[1];
+
+    printf(&quot;Searching for: %s\n\n&quot;, query_c);
+
+    String        *folder   = Str_newf(&quot;%s&quot;, path_to_index);
+    IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+
+    String *query_str = Str_newf(&quot;%s&quot;, query_c);
+    Hits *hits = IxSearcher_Hits(searcher, (Obj*)query_str, 0, 10, NULL);
+
+    String *title_str = Str_newf(&quot;title&quot;);
+    String *url_str   = Str_newf(&quot;url&quot;);
+    HitDoc *hit;
+    int i = 1;
+
+    // Loop over search results.
+    while (NULL != (hit = Hits_Next(hits))) {
+        String *title = (String*)HitDoc_Extract(hit, title_str);
+        char *title_c = Str_To_Utf8(title);
+
+        String *url = (String*)HitDoc_Extract(hit, url_str);
+        char *url_c = Str_To_Utf8(url);
+
+        printf(&quot;Result %d: %s (%s)\n&quot;, i, title_c, url_c);
+
+        free(url_c);
+        free(title_c);
+        DECREF(url);
+        DECREF(title);
+        DECREF(hit);
+        i++;
+    }
+
+    DECREF(url_str);
+    DECREF(title_str);
+    DECREF(hits);
+    DECREF(query_str);
+    DECREF(searcher);
+    DECREF(folder);
+    return 0;
+}
+</code></pre>
+<h3>Hooray!</h3>
+<p>Congratulations!  Your apps do the same thing as before... but now they'll be
+easier to customize.</p>
+<p>In our next chapter, <a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a>, we'll explore
+how to assign different behaviors to different fields.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,151 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::FieldTypeTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Specify per-field properties and behaviors.</h2>
+<p>The Schema we used in the last chapter specifies three fields:</p>
+<pre><code class="language-c">    FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+    {
+        String *field_str = Str_newf(&quot;title&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+</code></pre>
+<p>Since they are all defined as "full text" fields, they are all searchable,
+including the <code>url</code> field, a dubious choice.  Some URLs contain meaningful
+information, but these don't, really:</p>
+<pre><code>http://example.com/us_constitution/amend1.txt
+</code></pre>
+<p>We may as well not bother indexing the URL content.  To achieve that we need
+to assign the <code>url</code> field to a different FieldType.</p>
+<h3>StringType</h3>
+<p>Instead of FullTextType, we'll use a
+<a href="../../../Lucy/Plan/StringType.html">StringType</a>, which doesn't use an
+Analyzer to break up text into individual tokens.  Furthermore, we'll mark
+this StringType as unindexed, so that its content won't be searchable at all.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        StringType *type = StringType_new();
+        StringType_Set_Indexed(type, false);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<p>To observe the change in behavior, try searching for <code>us_constitution</code> both
+before and after changing the Schema and re-indexing.</p>
+<h3>Toggling "stored"</h3>
+<p>For a taste of other FieldType possibilities, try turning off <code>stored</code> for
+one or more fields.</p>
+<pre><code class="language-c">    FullTextType *content_type = FullTextType_new((Analyzer*)analyzer);
+    FullTextType_Set_Stored(content_type, false);
+</code></pre>
+<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page,
+but since we're not displaying <code>content</code>, turning it off for <code>content</code> has
+no effect except on index size.</p>
+<h3>Analyzers up next</h3>
+<p>Analyzers play a crucial role in the behavior of FullTextType fields.  In our
+next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a>, we'll see how
+changing up the Analyzer changes search results.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,160 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::HighlighterTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Augment search results with highlighted excerpts.</h2>
+<p>Adding relevant excerpts with highlighted search terms to your search results
+display makes it much easier for end users to scan the page and assess which
+hits look promising, dramatically improving their search experience.</p>
+<h3>Adaptations to the indexer</h3>
+<p><a href="../../../Lucy/Highlight/Highlighter.html">Highlighter</a> uses information generated at index
+time.  To save resources, highlighting is disabled by default and must be
+turned on for individual fields.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+        FullTextType_Set_Highlightable(type, true);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<h3>Adaptations to the search app</h3>
+<p>To add highlighting and excerpting to the search app, create a
+Highlighter object outside the loop that iterates over the hits…</p>
+<pre><code class="language-c">    String *content_str = Str_newf(&quot;content&quot;);
+    Highlighter *highlighter
+        = Highlighter_new((Searcher*)searcher, (Obj*)query,
+                          content_str, 200);
+</code></pre>
+<p>… then modify the loop and the per-hit display to generate and include the
+excerpt.</p>
+<pre><code class="language-c">    String *title_str = Str_newf(&quot;title&quot;);
+    String *url_str   = Str_newf(&quot;url&quot;);
+    HitDoc *hit;
+    i = 1;
+
+    // Loop over search results.
+    while (NULL != (hit = Hits_Next(hits))) {
+        String *title = (String*)HitDoc_Extract(hit, title_str);
+        char *title_c = Str_To_Utf8(title);
+
+        String *url = (String*)HitDoc_Extract(hit, url_str);
+        char *url_c = Str_To_Utf8(url);
+
+        String *excerpt = Highlighter_Create_Excerpt(highlighter, hit);
+        char *excerpt_c = Str_To_Utf8(excerpt);
+
+        printf(&quot;Result %d: %s (%s)\n%s\n\n&quot;, i, title_c, url_c, excerpt_c);
+
+        free(excerpt_c);
+        free(url_c);
+        free(title_c);
+        DECREF(excerpt);
+        DECREF(url);
+        DECREF(title);
+        DECREF(hit);
+        i++;
+    }
+
+    DECREF(url_str);
+    DECREF(title_str);
+    DECREF(hits);
+    DECREF(query_str);
+    DECREF(highlighter);
+    DECREF(content_str);
+    DECREF(searcher);
+    DECREF(folder);
+</code></pre>
+<h3>Next chapter: Query objects</h3>
+<p>Our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a>,
+illustrates how to build an "advanced search" interface using
+<a href="../../../Lucy/Search/Query.html">Query</a> objects instead of query strings.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,269 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::QueryObjectsTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Use Query objects instead of query strings.</h2>
+<p>Until now, our search app has accepted only a single query string.  In this tutorial
+chapter, we'll move towards an "advanced search" interface by adding a
+"category" constraint.  Three new classes will be required:</p>
+<ul>
+<li>
+<p><a href="../../../Lucy/Search/QueryParser.html">QueryParser</a> - Turn a query string into a
+<a href="../../../Lucy/Search/Query.html">Query</a> object.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/TermQuery.html">TermQuery</a> - Query for a specific term within
+a specific field.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/ANDQuery.html">ANDQuery</a> - "AND" together multiple Query
+objects to produce an intersected result set.</p>
+</li>
+</ul>
+<h3>Adaptations to the indexer</h3>
+<p>Our new "category" field will be a StringType field rather than a FullTextType
+field, because we will only be looking for exact matches.  It needs to be
+indexed, but since we won't display its value, it doesn't need to be stored.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;category&quot;);
+        StringType *type = StringType_new();
+        StringType_Set_Stored(type, false);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<p>There will be three possible values: "article", "amendment", and "preamble",
+which we'll hack out of the source file's name in our <code>parse_file</code>
+function:</p>
+<pre><code class="language-c">    const char *category = NULL;
+    if (S_starts_with(filename, &quot;art&quot;)) {
+        category = &quot;article&quot;;
+    }
+    else if (S_starts_with(filename, &quot;amend&quot;)) {
+        category = &quot;amendment&quot;;
+    }
+    else if (S_starts_with(filename, &quot;preamble&quot;)) {
+        category = &quot;preamble&quot;;
+    }
+    else {
+        fprintf(stderr, &quot;Can't derive category for %s\n&quot;, filename);
+        exit(1);
+    }
+
+    ...
+
+    {
+        // Store 'category' field
+        String *field = Str_newf(&quot;category&quot;);
+        String *value = Str_new_from_utf8(category, strlen(category));
+        Doc_Store(doc, field, (Obj*)value);
+        DECREF(field);
+        DECREF(value);
+    }
+</code></pre>
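+<p>The snippet above calls an <code>S_starts_with</code> helper that this chapter does not show. Below is a minimal sketch of one possible definition, assuming plain NUL-terminated C strings, together with a hypothetical <code>S_derive_category</code> variant that returns <code>NULL</code> for unrecognized file names instead of exiting, so the mapping can be tested on its own. Both functions are illustrative assumptions, not part of the Lucy API or the tutorial's sample code.</p>
+<pre><code class="language-c">#include &lt;stdbool.h&gt;
+#include &lt;stddef.h&gt;
+#include &lt;string.h&gt;
+
+// Hypothetical helper: returns true if `str` begins with `prefix`.
+static bool
+S_starts_with(const char *str, const char *prefix) {
+    return strncmp(str, prefix, strlen(prefix)) == 0;
+}
+
+// Hypothetical variant of the category logic above: returns NULL
+// instead of calling exit() when no known prefix matches.
+static const char*
+S_derive_category(const char *filename) {
+    if (S_starts_with(filename, &quot;art&quot;)) {
+        return &quot;article&quot;;
+    }
+    if (S_starts_with(filename, &quot;amend&quot;)) {
+        return &quot;amendment&quot;;
+    }
+    if (S_starts_with(filename, &quot;preamble&quot;)) {
+        return &quot;preamble&quot;;
+    }
+    return NULL;
+}
+</code></pre>
+<p>With these definitions, <code>S_derive_category(&quot;amend1.txt&quot;)</code> returns <code>&quot;amendment&quot;</code>, while an unexpected name such as <code>notes.txt</code> yields <code>NULL</code>, leaving the caller to decide how to report the error.</p>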
+<h3>Adaptations to the search app</h3>
+<p>In this command-line version of the search app, the "category" constraint is
+supplied through an optional <code>-c</code> option, so the usage message needs
+to mention it:</p>
+<pre><code class="language-c">static void
+S_usage_and_exit(const char *arg0) {
+    printf(&quot;Usage: %s [-c &lt;category&gt;] &lt;querystring&gt;\n&quot;, arg0);
+    exit(1);
+}
+</code></pre>
+<p>We'll start off by parsing the command-line arguments to extract the optional
+category and the query string.</p>
+<pre><code class="language-c">    const char *category = NULL;
+    int i = 1;
+
+    while (i &lt; argc - 1) {
+        if (strcmp(argv[i], &quot;-c&quot;) == 0) {
+            if (i + 1 &gt;= argc) {
+                S_usage_and_exit(argv[0]);
+            }
+            i += 1;
+            category = argv[i];
+        }
+        else {
+            S_usage_and_exit(argv[0]);
+        }
+
+        i += 1;
+    }
+
+    if (i + 1 != argc) {
+        S_usage_and_exit(argv[0]);
+    }
+
+    const char *query_c = argv[i];
+</code></pre>
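+<p>Because the loop above exits the program on bad input, it is awkward to test. One option, sketched here as an assumption rather than as part of the tutorial's sample, is to move the same logic into a hypothetical <code>S_parse_args</code> function that reports usage errors through its return value and hands back the category and query through out-parameters; <code>main</code> can then call <code>S_usage_and_exit</code> when it returns <code>-1</code>.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;string.h&gt;
+
+// Hypothetical refactoring of the option loop above. Returns 0 on
+// success and -1 on a usage error. On success, *category is NULL when
+// no &quot;-c&quot; option was given, and *query points at the query string.
+static int
+S_parse_args(int argc, char *argv[], const char **category,
+             const char **query) {
+    int i = 1;
+    *category = NULL;
+    *query    = NULL;
+
+    // Consume &quot;-c &lt;category&gt;&quot; pairs; the final argument is the query.
+    while (i &lt; argc - 1) {
+        if (strcmp(argv[i], &quot;-c&quot;) != 0) {
+            return -1;  // Unrecognized option.
+        }
+        *category = argv[i + 1];
+        i += 2;
+    }
+
+    // Exactly one argument, the query string, must remain.
+    if (i + 1 != argc) {
+        return -1;
+    }
+    *query = argv[i];
+    return 0;
+}
+</code></pre>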
+<p>QueryParser's constructor requires a "schema" argument.  We can get that from
+our IndexSearcher:</p>
+<pre><code class="language-c">    IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+    Schema        *schema   = IxSearcher_Get_Schema(searcher);
+    QueryParser   *qparser  = QParser_new(schema, NULL, NULL, NULL);
+</code></pre>
+<p>Previously, we have been handing raw query strings to IndexSearcher.  Behind
+the scenes, IndexSearcher has been using a QueryParser to turn those query
+strings into Query objects.  Now, we will bring QueryParser into the
+foreground and parse the strings explicitly.</p>
+<pre><code class="language-c">    String *query_str = Str_newf(&quot;%s&quot;, query_c);
+    Query  *query     = QParser_Parse(qparser, query_str);
+</code></pre>
+<p>If the user has specified a category, we'll use an ANDQuery to join our parsed
+query together with a TermQuery representing the category.</p>
+<pre><code class="language-c">    if (category) {
+        String *category_name = Str_newf(&quot;category&quot;);
+        String *category_str  = Str_newf(&quot;%s&quot;, category);
+        TermQuery *category_query
+            = TermQuery_new(category_name, (Obj*)category_str);
+
+        // Vec_Push takes ownership of each pushed reference.
+        Vector *children = Vec_new(2);
+        Vec_Push(children, (Obj*)query);
+        Vec_Push(children, (Obj*)category_query);
+        query = (Query*)ANDQuery_new(children);
+
+        DECREF(children);
+        DECREF(category_str);
+        DECREF(category_name);
+    }
+</code></pre>
+<p>Now when we execute the query…</p>
+<pre><code class="language-c">    Hits *hits = IxSearcher_Hits(searcher, (Obj*)query, 0, 10, NULL);
+</code></pre>
+<p>… we'll get a result set which is the intersection of the parsed query and
+the category query.</p>
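+<p>Conceptually, intersecting two result sets means keeping only the documents that both queries match. The sketch below illustrates that idea on two ascending arrays of document numbers using a simple merge-style walk. It is a standalone illustration of set intersection under that assumption, not Lucy's ANDQuery implementation, and the function name is hypothetical.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+// Hypothetical illustration: intersect two ascending arrays of document
+// numbers. `out` must have room for at least min(a_len, b_len) entries.
+// Returns the number of document numbers written to `out`.
+static size_t
+S_intersect_sorted(const int32_t *a, size_t a_len,
+                   const int32_t *b, size_t b_len, int32_t *out) {
+    size_t i = 0, j = 0, n = 0;
+    while (i &lt; a_len &amp;&amp; j &lt; b_len) {
+        if (a[i] &lt; b[j]) {
+            i++;                 // a is behind: advance it.
+        }
+        else if (a[i] &gt; b[j]) {
+            j++;                 // b is behind: advance it.
+        }
+        else {
+            out[n++] = a[i];     // Matched by both queries.
+            i++;
+            j++;
+        }
+    }
+    return n;
+}
+</code></pre>
+<p>For example, intersecting hypothetical parsed-query matches <code>{1, 4, 7, 9, 12}</code> with category matches <code>{2, 4, 9, 10}</code> leaves <code>{4, 9}</code>.</p>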
+<h3>Using TermQuery with full text fields</h3>
+<p>When querying full text fields, the easiest approach is to create query objects
+using QueryParser. But sometimes you want to create a TermQuery for a single
+term in a FullTextType field directly. In this case, we have to run the
+search term through the field's analyzer to make sure it gets normalized in
+the same way as the field's content.</p>
+<pre><code class="language-c">Query*
+make_term_query(Schema *schema, String *field, String *term) {
+    FieldType *type  = Schema_Fetch_Type(schema, field);
+    String    *token = NULL;
+
+    if (Obj_is_a((Obj*)type, FULLTEXTTYPE)) {
+        // Run the term through the full text analysis chain.
+        Analyzer *analyzer = FullTextType_Get_Analyzer((FullTextType*)type);
+        Vector   *tokens   = Analyzer_Split(analyzer, term);
+
+        if (Vec_Get_Size(tokens) != 1) {
+            // If the term expands to more than one token, or no
+            // tokens at all, it will never match a single token in
+            // the full text field.
+            DECREF(tokens);
+            return (Query*)NoMatchQuery_new();
+        }
+
+        token = (String*)Vec_Delete(tokens, 0);
+        DECREF(tokens);
+    }
+    else {
+        // Exact match for other types.
+        token = (String*)INCREF(term);
+    }
+
+    TermQuery *term_query = TermQuery_new(field, (Obj*)token);
+
+    DECREF(token);
+    return (Query*)term_query;
+}
+</code></pre>
+<h3>Congratulations!</h3>
+<p>You've made it to the end of the tutorial.</p>
+<h3>See Also</h3>
+<p>For additional thematic documentation, see the Apache Lucy
+<a href="../../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+<p>ANDQuery has a companion class, <a href="../../../Lucy/Search/ORQuery.html">ORQuery</a>, and a
+close relative, <a href="../../../Lucy/Search/RequiredOptionalQuery.html">RequiredOptionalQuery</a>.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>
