Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/IRTheory.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/IRTheory.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/IRTheory.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,133 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::IRTheory</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" 
value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Crash course in information retrieval</h2> +<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p> +<h3>Terminology</h3> +<p>Lucy uses some terminology from the field of information retrieval which +may be unfamiliar to many users. 
“Document” and “term” mean pretty much what +you’d expect them to, but others such as “posting” and “inverted index” need a +formal introduction:</p> +<ul> +<li><em>document</em> - An atomic unit of retrieval.</li> +<li><em>term</em> - An attribute which describes a document.</li> +<li><em>posting</em> - One term indexing one document.</li> +<li><em>term list</em> - The complete list of terms which describe a document.</li> +<li><em>posting list</em> - The complete list of documents which a term indexes.</li> +<li><em>inverted index</em> - A data structure which maps from terms to documents.</li> +</ul> +<p>Since Lucy is a practical implementation of IR theory, it loads these +abstract, distilled definitions down with useful traits. For instance, a +“posting” in its most rarefied form is simply a term-document pairing; in +Lucy, the class MatchPosting fills this +role. However, by associating additional information with a posting, such as the +number of times the term occurs in the document, we can turn it into a +ScorePosting, making it possible +to rank documents by relevance rather than just list the documents which happen to +match, in no particular order.</p> +<h3>TF/IDF ranking algorithm</h3> +<p>Lucy uses a variant of the well-established “Term Frequency / Inverse +Document Frequency” weighting scheme. 
A thorough treatment of TF/IDF is too +ambitious for our present purposes, but in a nutshell, it means that…</p> +<ul> +<li> +<p>in a search for <code>skate park</code>, documents which score well for the +comparatively rare term <code>skate</code> will rank higher than documents which score +well for the more common term <code>park</code>.</p> +</li> +<li> +<p>a 10-word text which has one occurrence each of <code>skate</code> and <code>park</code> will +rank higher than a 1000-word text which also contains one occurrence of each.</p> +</li> +</ul> +<p>A web search for “tf idf” will turn up many excellent explanations of the +algorithm.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,142 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" 
value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Step-by-step introduction to Apache Lucy.</h2> +<p>Explore Apache Lucy’s basic functionality by starting with a minimalist CGI +search app based on Lucy::Simple and transforming it, step by step, +into an “advanced search” interface utilizing more flexible core modules like +<a href="../../Lucy/Index/Indexer.html">Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a>.</p> +<h3>Chapters</h3> +<ul> +<li> +<p><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> - Build a bare-bones search app using +Lucy::Simple.</p> +</li> +<li> +<p><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a> - Rebuild the app using core 
+classes like <a href="../../Lucy/Index/Indexer.html">Indexer</a> and +<a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> in place of Lucy::Simple.</p> +</li> +<li> +<p><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a> - Experiment with different field +characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html">FieldType</a>.</p> +</li> +<li> +<p><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a> - Examine how the choice of +<a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> subclass affects search results.</p> +</li> +<li> +<p><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a> - Augment search results with +highlighted excerpts.</p> +</li> +<li> +<p><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a> - Unlock advanced search features +by using Query objects instead of query strings.</p> +</li> +</ul> +<h3>Source materials</h3> +<p>The source material used by the tutorial app (a multi-text-file presentation +of the United States Constitution) can be found in the <code>sample</code> directory +at the root of the Lucy distribution, along with finished indexing and search +apps.</p> +<pre><code class="language-c">sample/indexer_simple.c # simple indexing executable +sample/search_simple.c # simple search executable +sample/indexer.c # indexing executable +sample/search.c # search executable +sample/us_constitution # corpus +</code></pre> +<h3>Conventions</h3> +<p>The user is expected to be familiar with OO Perl and basic CGI programming.</p> +<p>The code in this tutorial assumes a Unix-flavored operating system and the +Apache webserver, but will work with minor modifications on other setups.</p> +<h3>See also</h3> +<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + 
</div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/AnalysisTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/AnalysisTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/AnalysisTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,152 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::AnalysisTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a 
href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" 
class="grid_9"> + <div class="c-api"> +<h2>How to choose and use Analyzers.</h2> +<p>Try swapping out the EasyAnalyzer in our Schema for a +<a href="../../../Lucy/Analysis/StandardTokenizer.html">StandardTokenizer</a>:</p> +<pre><code class="language-c"> StandardTokenizer *tokenizer = StandardTokenizer_new(); + FullTextType *type = FullTextType_new((Analyzer*)tokenizer); +</code></pre> +<p>Search for <code>senate</code>, <code>Senate</code>, and <code>Senator</code> before and after making the +change and re-indexing.</p> +<p>Under EasyAnalyzer, the results are identical for all three searches, but +under StandardTokenizer, searches are case-sensitive, and the result sets for +<code>Senate</code> and <code>Senator</code> are distinct.</p> +<h3>EasyAnalyzer</h3> +<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> is performing more aggressive +processing than StandardTokenizer. In addition to tokenizing, it’s also +converting all text to lower case so that searches are case-insensitive, and +using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>, +in this case).</p> +<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package. +In this case, it’s three-in-one, since specifying an EasyAnalyzer with +<code>language => 'en'</code> is equivalent to this snippet creating a +<a href="../../../Lucy/Analysis/PolyAnalyzer.html">PolyAnalyzer</a>:</p> +<pre><code class="language-c"> Vector *analyzers = Vec_new(3); + Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new()); + Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false)); + Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language)); + + PolyAnalyzer *analyzer = PolyAnalyzer_new(NULL, analyzers); + DECREF(analyzers); +</code></pre> +<p>You can add or subtract Analyzers from there if you like. 
Try adding a fourth +Analyzer, a SnowballStopFilter for suppressing “stopwords” like “the”, “if”, +and “maybe”.</p> +<pre><code class="language-c"> Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new()); + Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false)); + Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language)); + Vec_Push(analyzers, (Analyzer*)SnowStop_new(language, NULL)); +</code></pre> +<p>Also, try removing the SnowballStemmer.</p> +<pre><code class="language-c"> Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new()); + Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false)); +</code></pre> +<p>The original choice of a stock English EasyAnalyzer probably still yields the +best results for this document collection, but you get the idea: sometimes you +want a different Analyzer.</p> +<h3>When the best Analyzer is no Analyzer</h3> +<p>Sometimes you don’t want an Analyzer at all. That was true for our “url” +field because we didn’t need it to be searchable, but it’s also true for +certain types of searchable fields. 
For instance, “category” fields are often +set up to match exactly or not at all, as are fields like “last_name” (because +you may not want to conflate results for “Humphrey” and “Humphries”).</p> +<p>To specify that there should be no analysis performed at all, use StringType:</p> +<pre><code class="language-c"> String *name = Str_newf("category"); + StringType *type = StringType_new(); + Schema_Spec_Field(schema, name, (FieldType*)type); + DECREF(type); + DECREF(name); +</code></pre> +<h3>Highlighting up next</h3> +<p>In our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a>, +we’ll add highlighted excerpts from the “content” field to our search results.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,296 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" 
id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>A more flexible app structure.</h2> +<h3>Goal</h3> +<p>In this tutorial chapter, we’ll refactor the apps we built in +<a href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> so that they look exactly the same from +the end user’s point of view, but offer the developer greater possibilities for +expansion.</p> +<p>To achieve this, we’ll ditch Lucy::Simple and replace it with the +classes that it uses internally:</p> +<ul> +<li><a 
href="../../../Lucy/Plan/Schema.html">Schema</a> - Plan out your index.</li> +<li><a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> - Field type for full text search.</li> +<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li> +<li><a href="../../../Lucy/Index/Indexer.html">Indexer</a> - Manipulate index content.</li> +<li><a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> - Search an index.</li> +<li><a href="../../../Lucy/Search/Hits.html">Hits</a> - Iterate over hits returned by a Searcher.</li> +</ul> +<h3>Adaptations to indexer.pl</h3> +<p>After we load our modules…</p> +<pre><code class="language-c">#include &lt;dirent.h&gt; +#include &lt;stdio.h&gt; +#include &lt;stdlib.h&gt; +#include &lt;string.h&gt; + +#define CFISH_USE_SHORT_NAMES +#define LUCY_USE_SHORT_NAMES +#include "Clownfish/String.h" +#include "Lucy/Analysis/EasyAnalyzer.h" +#include "Lucy/Document/Doc.h" +#include "Lucy/Index/Indexer.h" +#include "Lucy/Plan/FullTextType.h" +#include "Lucy/Plan/StringType.h" +#include "Lucy/Plan/Schema.h" + +const char path_to_index[] = "/path/to/index"; +const char uscon_source[] = "/usr/local/apache2/htdocs/us_constitution"; +</code></pre> +<p>… the first item we’re going to need is a <a href="../../../Lucy/Plan/Schema.html">Schema</a>.</p> +<p>The primary job of a Schema is to specify what fields are available and how +they’re defined. We’ll start off with three fields: title, content and url.</p> +<pre><code class="language-c">static Schema* +S_create_schema() { + // Create a new schema. + Schema *schema = Schema_new(); + + // Create an analyzer. + String *language = Str_newf("en"); + EasyAnalyzer *analyzer = EasyAnalyzer_new(language); + + // Specify fields. 
+ + FullTextType *type = FullTextType_new((Analyzer*)analyzer); + + { + String *field_str = Str_newf("title"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(field_str); + } + + { + String *field_str = Str_newf("content"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(field_str); + } + + { + String *field_str = Str_newf("url"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(field_str); + } + + DECREF(type); + DECREF(analyzer); + DECREF(language); + return schema; +} +</code></pre> +<p>All of the fields are spec’d out using the <a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> FieldType, +indicating that they will be searchable as “full text”, which means that +they can be searched for individual words. The “analyzer”, which is unique to +FullTextType fields, is what breaks up the text into searchable tokens.</p> +<p>Next, we’ll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html">Indexer</a>. +The substitution will be straightforward because Simple has merely been +serving as a thin wrapper around an inner Indexer, and we’ll just be peeling +away the wrapper.</p> +<p>First, replace the constructor:</p> +<pre><code class="language-c">int +main() { + // Initialize the library. 
+ lucy_bootstrap_parcel(); + + Schema *schema = S_create_schema(); + String *folder = Str_newf("%s", path_to_index); + + Indexer *indexer = Indexer_new(schema, (Obj*)folder, NULL, + Indexer_CREATE | Indexer_TRUNCATE); + +</code></pre> +<p>Next, have the <code>indexer</code> object call <a href="../../../Lucy/Index/Indexer.html#func_Add_Doc">Add_Doc()</a> where the +<code>lucy</code> object was adding the document before:</p> +<pre><code class="language-c"> DIR *dir = opendir(uscon_source); + if (dir == NULL) { + perror(uscon_source); + return 1; + } + + for (struct dirent *entry = readdir(dir); + entry; + entry = readdir(dir)) { + + if (S_ends_with(entry->d_name, ".txt")) { + Doc *doc = S_parse_file(entry->d_name); + Indexer_Add_Doc(indexer, doc, 1.0); + DECREF(doc); + } + } + + closedir(dir); +</code></pre> +<p>There’s only one extra step required: at the end of the app, you must call +commit() explicitly to close the indexing session and commit your changes. +(Lucy::Simple hides this detail, calling commit() implicitly when it needs to.)</p> +<pre><code class="language-c"> Indexer_Commit(indexer); + + DECREF(indexer); + DECREF(folder); + DECREF(schema); + return 0; +} +</code></pre> +<h3>Adaptations to search.cgi</h3> +<p>In our search app as in our indexing app, Lucy::Simple has served as a +thin wrapper, this time around <a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> and +<a href="../../../Lucy/Search/Hits.html">Hits</a>. Swapping out Simple for these two classes is +also straightforward:</p> +<pre><code class="language-c">#include &lt;stdio.h&gt; +#include &lt;stdlib.h&gt; +#include &lt;string.h&gt; + +#define CFISH_USE_SHORT_NAMES +#define LUCY_USE_SHORT_NAMES +#include "Clownfish/String.h" +#include "Lucy/Document/HitDoc.h" +#include "Lucy/Search/Hits.h" +#include "Lucy/Search/IndexSearcher.h" + +const char path_to_index[] = "/path/to/index"; + +int +main(int argc, char *argv[]) { + // Initialize the library. 
+ lucy_bootstrap_parcel(); + + if (argc &lt; 2) { + printf("Usage: %s &lt;querystring&gt;\n", argv[0]); + return 0; + } + + const char *query_c = argv[1]; + + printf("Searching for: %s\n\n", query_c); + + String *folder = Str_newf("%s", path_to_index); + IndexSearcher *searcher = IxSearcher_new((Obj*)folder); + + String *query_str = Str_newf("%s", query_c); + Hits *hits = IxSearcher_Hits(searcher, (Obj*)query_str, 0, 10, NULL); + + String *title_str = Str_newf("title"); + String *url_str = Str_newf("url"); + HitDoc *hit; + int i = 1; + + // Loop over search results. + while (NULL != (hit = Hits_Next(hits))) { + String *title = (String*)HitDoc_Extract(hit, title_str); + char *title_c = Str_To_Utf8(title); + + String *url = (String*)HitDoc_Extract(hit, url_str); + char *url_c = Str_To_Utf8(url); + + printf("Result %d: %s (%s)\n", i, title_c, url_c); + + free(url_c); + free(title_c); + DECREF(url); + DECREF(title); + DECREF(hit); + i++; + } + + DECREF(url_str); + DECREF(title_str); + DECREF(hits); + DECREF(query_str); + DECREF(searcher); + DECREF(folder); + return 0; +} +</code></pre> +<h3>Hooray!</h3> +<p>Congratulations! Your apps do the same thing as before… but now they’ll be +easier to customize.</p> +<p>In our next chapter, <a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a>, we’ll explore +how to assign different behaviors to different fields.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,151 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::FieldTypeTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" 
id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Specify per-field properties and behaviors.</h2> +<p>The Schema we used in the last chapter specifies three fields:</p> +<pre><code class="language-c"> FullTextType *type = FullTextType_new((Analyzer*)analyzer); + + { + String *field_str = Str_newf("title"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(field_str); + } + + { + String *field_str = Str_newf("content"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + 
DECREF(field_str); + } + + { + String *field_str = Str_newf("url"); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(field_str); + } + +</code></pre> +<p>Since they are all defined as “full text” fields, they are all searchable – +including the <code>url</code> field, a dubious choice. Some URLs contain meaningful +information, but these don’t, really:</p> +<pre><code>http://example.com/us_constitution/amend1.txt +</code></pre> +<p>We may as well not bother indexing the URL content. To achieve that we need +to assign the <code>url</code> field to a different FieldType.</p> +<h3>StringType</h3> +<p>Instead of FullTextType, we’ll use a +<a href="../../../Lucy/Plan/StringType.html">StringType</a>, which doesn’t use an +Analyzer to break up text into individual tokens. Furthermore, we’ll mark +this StringType as unindexed, so that its content won’t be searchable at all.</p> +<pre><code class="language-c"> { + String *field_str = Str_newf("url"); + StringType *type = StringType_new(); + StringType_Set_Indexed(type, false); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(type); + DECREF(field_str); + } +</code></pre> +<p>To observe the change in behavior, try searching for <code>us_constitution</code> both +before and after changing the Schema and re-indexing.</p> +<h3>Toggling “stored”</h3> +<p>For a taste of other FieldType possibilities, try turning off <code>stored</code> for +one or more fields.</p> +<pre><code class="language-c"> FullTextType *content_type = FullTextType_new((Analyzer*)analyzer); + FullTextType_Set_Stored(content_type, false); +</code></pre> +<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page, +but since we’re not displaying <code>content</code>, turning it off for <code>content</code> has +no effect – except on index size.</p> +<h3>Analyzers up next</h3> +<p>Analyzers play a crucial role in the behavior of FullTextType fields. 
In our +next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a>, weâll see how +changing up the Analyzer changes search results.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/HighlighterTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/HighlighterTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/HighlighterTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,160 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::HighlighterTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucyâ¢"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache 
Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a 
href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Augment search results with highlighted excerpts.</h2> +<p>Adding relevant excerpts with highlighted search terms to your search results +display makes it much easier for end users to scan the page and assess which +hits look promising, dramatically improving their search experience.</p> +<h3>Adaptations to indexer.pl</h3> +<p><a href="../../../Lucy/Highlight/Highlighter.html">Highlighter</a> uses information generated at index +time. To save resources, highlighting is disabled by default and must be +turned on for individual fields.</p> +<pre><code class="language-c"> { + String *field_str = Str_newf("content"); + FullTextType *type = FullTextType_new((Analyzer*)analyzer); + FullTextType_Set_Highlightable(type, true); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(type); + DECREF(field_str); + } +</code></pre> +<h3>Adaptations to search.cgi</h3> +<p>To add highlighting and excerpting to the search.cgi sample app, create a +<code>highlighter</code> object outside the hit-iterating loop…</p> +<pre><code class="language-c"> String *content_str = Str_newf("content"); + Highlighter *highlighter + = Highlighter_new((Searcher*)searcher, (Obj*)query, + content_str, 200); +</code></pre> +<p>… then modify the loop and the per-hit display to generate and include the +excerpt.</p> +<pre><code class="language-c"> String *title_str = Str_newf("title"); + String *url_str = Str_newf("url"); + HitDoc *hit; + i = 1; + + // Loop over search results. 
+ while (NULL != (hit = Hits_Next(hits))) { + String *title = (String*)HitDoc_Extract(hit, title_str); + char *title_c = Str_To_Utf8(title); + + String *url = (String*)HitDoc_Extract(hit, url_str); + char *url_c = Str_To_Utf8(url); + + String *excerpt = Highlighter_Create_Excerpt(highlighter, hit); + char *excerpt_c = Str_To_Utf8(excerpt); + + printf("Result %d: %s (%s)\n%s\n\n", i, title_c, url_c, excerpt_c); + + free(excerpt_c); + free(url_c); + free(title_c); + DECREF(excerpt); + DECREF(url); + DECREF(title); + DECREF(hit); + i++; + } + + DECREF(url_str); + DECREF(title_str); + DECREF(hits); + DECREF(query_str); + DECREF(highlighter); + DECREF(content_str); + DECREF(searcher); + DECREF(folder); +</code></pre> +<h3>Next chapter: Query objects</h3> +<p>Our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a>, +illustrates how to build an “advanced search” interface using +<a href="../../../Lucy/Search/Query.html">Query</a> objects instead of query strings.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,269 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::QueryObjectsTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" 
id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Use Query objects instead of query strings.</h2> +<p>Until now, our search app has had only a single search box. In this tutorial +chapter, we’ll move towards an “advanced search” interface, by adding a +“category” drop-down menu. 
Three new classes will be required:</p> +<ul> +<li> +<p><a href="../../../Lucy/Search/QueryParser.html">QueryParser</a> - Turn a query string into a +<a href="../../../Lucy/Search/Query.html">Query</a> object.</p> +</li> +<li> +<p><a href="../../../Lucy/Search/TermQuery.html">TermQuery</a> - Query for a specific term within +a specific field.</p> +</li> +<li> +<p><a href="../../../Lucy/Search/ANDQuery.html">ANDQuery</a> - “AND” together multiple Query +objects to produce an intersected result set.</p> +</li> +</ul> +<h3>Adaptations to indexer.pl</h3> +<p>Our new “category” field will be a StringType field rather than a FullTextType +field, because we will only be looking for exact matches. It needs to be +indexed, but since we won’t display its value, it doesn’t need to be stored.</p> +<pre><code class="language-c"> { + String *field_str = Str_newf("category"); + StringType *type = StringType_new(); + StringType_Set_Stored(type, false); + Schema_Spec_Field(schema, field_str, (FieldType*)type); + DECREF(type); + DECREF(field_str); + } +</code></pre> +<p>There will be three possible values: “article”, “amendment”, and “preamble”, +which we’ll hack out of the source file’s name during our <code>S_parse_file</code> +subroutine:</p> +<pre><code class="language-c"> const char *category = NULL; + if (S_starts_with(filename, "art")) { + category = "article"; + } + else if (S_starts_with(filename, "amend")) { + category = "amendment"; + } + else if (S_starts_with(filename, "preamble")) { + category = "preamble"; + } + else { + fprintf(stderr, "Can't derive category for %s", filename); + exit(1); + } + + ... 
+ + { + // Store 'category' field + String *field = Str_newf("category"); + String *value = Str_new_from_utf8(category, strlen(category)); + Doc_Store(doc, field, (Obj*)value); + DECREF(field); + DECREF(value); + } +</code></pre> +<h3>Adaptations to search.cgi</h3> +<p>The “category” constraint will be added to our search interface using an HTML +“select” element (this routine will need to be integrated into the HTML +generation section of search.cgi):</p> +<pre><code class="language-c">static void +S_usage_and_exit(const char *arg0) { + printf("Usage: %s [-c <category>] <querystring>\n", arg0); + exit(1); +} +</code></pre> +<p>We’ll start off by loading our new modules and extracting our new CGI +parameter.</p> +<pre><code class="language-c"> const char *category = NULL; + int i = 1; + + while (i < argc - 1) { + if (strcmp(argv[i], "-c") == 0) { + if (i + 1 >= argc) { + S_usage_and_exit(argv[0]); + } + i += 1; + category = argv[i]; + } + else { + S_usage_and_exit(argv[0]); + } + + i += 1; + } + + if (i + 1 != argc) { + S_usage_and_exit(argv[0]); + } + + const char *query_c = argv[i]; +</code></pre> +<p>QueryParser’s constructor requires a “schema” argument. We can get that from +our IndexSearcher:</p> +<pre><code class="language-c"> IndexSearcher *searcher = IxSearcher_new((Obj*)folder); + Schema *schema = IxSearcher_Get_Schema(searcher); + QueryParser *qparser = QParser_new(schema, NULL, NULL, NULL); +</code></pre> +<p>Previously, we have been handing raw query strings to IndexSearcher. Behind +the scenes, IndexSearcher has been using a QueryParser to turn those query +strings into Query objects. 
Now, we will bring QueryParser into the +foreground and parse the strings explicitly.</p> +<pre><code class="language-c"> Query *query = QParser_Parse(qparser, query_str); +</code></pre> +<p>If the user has specified a category, we’ll use an ANDQuery to join our parsed +query together with a TermQuery representing the category.</p> +<pre><code class="language-c"> if (category) { + String *category_name = Str_newf("category"); + String *category_str = Str_newf("%s", category); + TermQuery *category_query + = TermQuery_new(category_name, (Obj*)category_str); + + Vector *children = Vec_new(2); + Vec_Push(children, (Obj*)query); + Vec_Push(children, (Obj*)category_query); + query = (Query*)ANDQuery_new(children); + + DECREF(children); + DECREF(category_str); + DECREF(category_name); + } +</code></pre> +<p>Now when we execute the query…</p> +<pre><code class="language-c"> Hits *hits = IxSearcher_Hits(searcher, (Obj*)query, 0, 10, NULL); +</code></pre> +<p>… we’ll get a result set which is the intersection of the parsed query and +the category query.</p> +<h3>Using TermQuery with full text fields</h3> +<p>When querying full text fields, the easiest way is to create query objects +using QueryParser. But sometimes you want to create a TermQuery for a single +term in a FullTextType field directly. In this case, we have to run the +search term through the field’s analyzer to make sure it gets normalized in +the same way as the field’s content.</p> +<pre><code class="language-c">Query* +make_term_query(Schema *schema, String *field, String *term) { + FieldType *type = Schema_Fetch_Type(schema, field); + String *token = NULL; + + if (FieldType_is_a(type, FULLTEXTTYPE)) { + // Run the term through the full text analysis chain. 
+ Analyzer *analyzer = FullTextType_Get_Analyzer((FullTextType*)type); + Vector *tokens = Analyzer_Split(analyzer, term); + + if (Vec_Get_Size(tokens) != 1) { + // If the term expands to more than one token, or no + // tokens at all, it will never match a single token in + // the full text field. + DECREF(tokens); + return (Query*)NoMatchQuery_new(); + } + + token = (String*)Vec_Delete(tokens, 0); + DECREF(tokens); + } + else { + // Exact match for other types. + token = (String*)INCREF(term); + } + + TermQuery *term_query = TermQuery_new(field, (Obj*)token); + + DECREF(token); + return (Query*)term_query; +} +</code></pre> +<h3>Congratulations!</h3> +<p>You’ve made it to the end of the tutorial.</p> +<h3>See Also</h3> +<p>For additional thematic documentation, see the Apache Lucy +<a href="../../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p> +<p>ANDQuery has a companion class, <a href="../../../Lucy/Search/ORQuery.html">ORQuery</a>, and a +close relative, <a href="../../../Lucy/Search/RequiredOptionalQuery.html">RequiredOptionalQuery</a>.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/SimpleTutorial.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/SimpleTutorial.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Docs/Tutorial/SimpleTutorial.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,311 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Docs::Tutorial::SimpleTutorial</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Docs/">Docs</a> » <a href="/docs/c/Lucy/Docs/Tutorial/">Tutorial</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Bare-bones search app.</h2> +<h3>Setup</h3> +<p>Copy the text presentation of the US Constitution from the <code>sample</code> directory +of the Apache Lucy distribution to the base level of your web server’s +<code>htdocs</code> directory.</p> +<pre><code>$ cp -R sample/us_constitution /usr/local/apache2/htdocs/ +</code></pre> +<h3>Indexing: indexer.pl</h3> +<p>Our first task will be to create an application called <code>indexer.pl</code> which +builds a searchable “inverted index” 
from a collection of documents.</p> +<p>After we specify some configuration variables and load all necessary +modules…</p> +<pre><code class="language-c">#include <dirent.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#define CFISH_USE_SHORT_NAMES +#define LUCY_USE_SHORT_NAMES +#include "Clownfish/String.h" +#include "Lucy/Simple.h" +#include "Lucy/Document/Doc.h" + +const char path_to_index[] = "lucy_index"; +const char uscon_source[] = "../../common/sample/us_constitution"; +</code></pre> +<p>… we’ll start by creating a <a href="../../../Lucy/Simple.html">Lucy::Simple</a> object, telling it +where we’d like the index to be located and the language of the source +material.</p> +<pre><code class="language-c">int +main() { + // Initialize the library. + lucy_bootstrap_parcel(); + + String *folder = Str_newf("%s", path_to_index); + String *language = Str_newf("en"); + Simple *lucy = Simple_new((Obj*)folder, language); +</code></pre> +<p>Next, we’ll add a subroutine which parses our sample documents.</p> +<pre><code class="language-c">Doc* +S_parse_file(const char *filename) { + size_t bytes = strlen(uscon_source) + 1 + strlen(filename) + 1; + char *path = (char*)malloc(bytes); + path[0] = '\0'; + strcat(path, uscon_source); + strcat(path, "/"); + strcat(path, filename); + + FILE *stream = fopen(path, "r"); + if (stream == NULL) { + perror(path); + exit(1); + } + + char *title = NULL; + char *bodytext = NULL; + if (fscanf(stream, "%m[^\r\n] %m[\x01-\x7F]", &title, &bodytext) != 2) { + fprintf(stderr, "Can't extract title/bodytext from '%s'", path); + exit(1); + } + + Doc *doc = Doc_new(NULL, 0); + + { + // Store 'title' field + String *field = Str_newf("title"); + String *value = Str_new_from_utf8(title, strlen(title)); + Doc_Store(doc, field, (Obj*)value); + DECREF(field); + DECREF(value); + } + + { + // Store 'content' field + String *field = Str_newf("content"); + String *value = Str_new_from_utf8(bodytext, strlen(bodytext)); + Doc_Store(doc, 
field, (Obj*)value); + DECREF(field); + DECREF(value); + } + + { + // Store 'url' field + String *field = Str_newf("url"); + String *value = Str_new_from_utf8(filename, strlen(filename)); + Doc_Store(doc, field, (Obj*)value); + DECREF(field); + DECREF(value); + } + + fclose(stream); + free(bodytext); + free(title); + free(path); + return doc; +} +</code></pre> +<p>Add some elementary directory reading code…</p> +<pre><code class="language-c"> DIR *dir = opendir(uscon_source); + if (dir == NULL) { + perror(uscon_source); + return 1; + } +</code></pre> +<p>… and now we’re ready for the meat of indexer.pl – which occupies exactly +one line of code.</p> +<pre><code class="language-c"> for (struct dirent *entry = readdir(dir); + entry; + entry = readdir(dir)) { + + if (S_ends_with(entry->d_name, ".txt")) { + Doc *doc = S_parse_file(entry->d_name); + Simple_Add_Doc(lucy, doc); // ta-da! + DECREF(doc); + } + } + + closedir(dir); + + DECREF(lucy); + DECREF(language); + DECREF(folder); + return 0; +} +</code></pre> +<h3>Search: search.cgi</h3> +<p>As with our indexing app, the bulk of the code in our search script won’t be +Lucy-specific.</p> +<p>The beginning is dedicated to CGI processing and configuration.</p> +<pre><code class="language-c">#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#define CFISH_USE_SHORT_NAMES +#define LUCY_USE_SHORT_NAMES +#include "Clownfish/String.h" +#include "Lucy/Document/HitDoc.h" +#include "Lucy/Simple.h" + +const char path_to_index[] = "lucy_index"; + +static void +S_usage_and_exit(const char *arg0) { + printf("Usage: %s <querystring>\n", arg0); + exit(1); +} + +int +main(int argc, char *argv[]) { + // Initialize the library. 
+ lucy_bootstrap_parcel(); + + if (argc != 2) { + S_usage_and_exit(argv[0]); + } + + const char *query_c = argv[1]; + + printf("Searching for: %s\n\n", query_c); +</code></pre> +<p>Once that’s out of the way, we create our Lucy::Simple object and feed +it a query string.</p> +<pre><code class="language-c"> String *folder = Str_newf("%s", path_to_index); + String *language = Str_newf("en"); + Simple *lucy = Simple_new((Obj*)folder, language); + + String *query_str = Str_newf("%s", query_c); + Simple_Search(lucy, query_str, 0, 10); +</code></pre> +<p>The value returned by <a href="../../../Lucy/Simple.html#func_Search">Search()</a> is the total number of documents +in the collection which matched the query. We’ll show this hit count to the +user, and also use it in conjunction with the parameters <code>offset</code> and +<code>num_wanted</code> to break up results into “pages” of manageable size.</p> +<p>Calling <a href="../../../Lucy/Simple.html#func_Search">Search()</a> on our Simple object turns it into an iterator. +Invoking <a href="../../../Lucy/Simple.html#func_Next">Next()</a> now returns hits one at a time as <a href="../../../Lucy/Document/HitDoc.html">HitDoc</a> +objects, starting with the most relevant.</p> +<pre><code class="language-c"> String *title_str = Str_newf("title"); + String *url_str = Str_newf("url"); + HitDoc *hit; + int i = 1; + + // Loop over search results. 
+ while (NULL != (hit = Simple_Next(lucy))) { + String *title = (String*)HitDoc_Extract(hit, title_str); + char *title_c = Str_To_Utf8(title); + + String *url = (String*)HitDoc_Extract(hit, url_str); + char *url_c = Str_To_Utf8(url); + + printf("Result %d: %s (%s)\n", i, title_c, url_c); + + free(url_c); + free(title_c); + DECREF(url); + DECREF(title); + DECREF(hit); + i++; + } + + DECREF(url_str); + DECREF(title_str); + DECREF(query_str); + DECREF(lucy); + DECREF(language); + DECREF(folder); + return 0; +} +</code></pre> +<p>The rest of the script is just text wrangling.</p> +<pre><code>Code example for C is missing</code></pre> +<h3>OK… now what?</h3> +<p>Lucy::Simple is perfectly adequate for some tasks, but it’s not very flexible. +Many people find that it doesn’t do at least one or two things they can’t live +without.</p> +<p>In our next tutorial chapter, +<a href="../../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a>, we’ll rewrite our +indexing and search scripts using the classes that Lucy::Simple hides +from view, opening up the possibilities for expansion; then, we’ll spend the +rest of the tutorial chapters exploring these possibilities.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/c/Lucy/Document/Doc.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/c/Lucy/Document/Doc.html (added) +++ websites/staging/lucy/trunk/content/docs/c/Lucy/Document/Doc.html Mon Apr 4 12:55:27 2016 @@ -0,0 +1,268 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Document::Doc - C API Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/c/">C</a> » <a href="/docs/c/Lucy/">Lucy</a> » <a href="/docs/c/Lucy/Document/">Document</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" 
name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div class="c-api"> +<h2>Lucy::Document::Doc</h2> +<table> +<tr> +<td class="label">parcel</td> +<td><a href="../../lucy.html">Lucy</a></td> +</tr> +<tr> +<td class="label">class variable</td> +<td><code><span class="prefix">LUCY_</span>DOC</code></td> +</tr> +<tr> +<td class="label">struct symbol</td> +<td><code><span class="prefix">lucy_</span>Doc</code></td> +</tr> +<tr> +<td class="label">class nickname</td> +<td><code><span class="prefix">lucy_</span>Doc</code></td> +</tr> +<tr> +<td class="label">header file</td> +<td><code>Lucy/Document/Doc.h</code></td> +</tr> +</table> 
+<h3>Name</h3> +<p>Lucy::Document::Doc - A document.</p> +<h3>Description</h3> +<p>A Doc object is akin to a row in a database, in that it is made up of one +or more fields, each of which has a value.</p> +<h3>Functions</h3> +<dl> +<dt id="func_new">new</dt> +<dd> +<pre><code><span class="prefix">lucy_</span>Doc* <span class="comment">// incremented</span> +<span class="prefix">lucy_</span><strong>Doc_new</strong>( + void *<strong>fields</strong>, + int32_t <strong>doc_id</strong> +); +</code></pre> +<p>Create a new Document.</p> +<dl> +<dt>fields</dt> +<dd><p>Field-value pairs.</p> +</dd> +<dt>doc_id</dt> +<dd><p>Internal Lucy document id. Default of 0 (an +invalid doc id).</p> +</dd> +</dl> +</dd> +<dt id="func_init">init</dt> +<dd> +<pre><code><span class="prefix">lucy_</span>Doc* +<span class="prefix">lucy_</span><strong>Doc_init</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong>, + void *<strong>fields</strong>, + int32_t <strong>doc_id</strong> +); +</code></pre> +<p>Initialize a Document.</p> +<dl> +<dt>fields</dt> +<dd><p>Field-value pairs.</p> +</dd> +<dt>doc_id</dt> +<dd><p>Internal Lucy document id. 
Default of 0 (an +invalid doc id).</p> +</dd> +</dl> +</dd> +</dl> +<h3>Methods</h3> +<dl> +<dt id="func_Set_Doc_ID">Set_Doc_ID</dt> +<dd> +<pre><code>void +<span class="prefix">lucy_</span><strong>Doc_Set_Doc_ID</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong>, + int32_t <strong>doc_id</strong> +); +</code></pre> +<p>Set internal Lucy document id.</p> +</dd> +<dt id="func_Get_Doc_ID">Get_Doc_ID</dt> +<dd> +<pre><code>int32_t +<span class="prefix">lucy_</span><strong>Doc_Get_Doc_ID</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong> +); +</code></pre> +<p>Retrieve internal Lucy document id.</p> +</dd> +<dt id="func_Store">Store</dt> +<dd> +<pre><code>void +<span class="prefix">lucy_</span><strong>Doc_Store</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong>, + <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>field</strong>, + <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>value</strong> +); +</code></pre> +<p>Store a field value in the Doc.</p> +<dl> +<dt>field</dt> +<dd><p>The field name</p> +</dd> +<dt>value</dt> +<dd><p>The value</p> +</dd> +</dl> +</dd> +<dt id="func_Get_Fields">Get_Fields</dt> +<dd> +<pre><code>void* +<span class="prefix">lucy_</span><strong>Doc_Get_Fields</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong> +); +</code></pre> +<p>Return the Doc's backing fields hash.</p> +</dd> +<dt id="func_Get_Size">Get_Size</dt> +<dd> +<pre><code>uint32_t +<span class="prefix">lucy_</span><strong>Doc_Get_Size</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong> +); +</code></pre> +<p>Return the number of fields in the Doc.</p> +</dd> +<dt id="func_Extract">Extract</dt> +<dd> +<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span> +<span class="prefix">lucy_</span><strong>Doc_Extract</strong>( + <span 
class="prefix">lucy_</span>Doc *<strong>self</strong>, + <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>field</strong> +); +</code></pre> +<p>Retrieve the field's value, or NULL if the field is not present.</p> +</dd> +<dt id="func_Field_Names">Field_Names</dt> +<dd> +<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Vector.html">Vector</a>* <span class="comment">// incremented</span> +<span class="prefix">lucy_</span><strong>Doc_Field_Names</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong> +); +</code></pre> +<p>Return a list of names of all fields present.</p> +</dd> +<dt id="func_Equals">Equals</dt> +<dd> +<pre><code>bool +<span class="prefix">lucy_</span><strong>Doc_Equals</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong>, + <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>other</strong> +); +</code></pre> +<p>Indicate whether two objects are the same. By default, compares the +memory address.</p> +<dl> +<dt>other</dt> +<dd><p>Another Obj.</p> +</dd> +</dl> +</dd> +<dt id="func_Destroy">Destroy</dt> +<dd> +<pre><code>void +<span class="prefix">lucy_</span><strong>Doc_Destroy</strong>( + <span class="prefix">lucy_</span>Doc *<strong>self</strong> +); +</code></pre> +<p>Generic destructor. Frees the struct itself but not any complex +member elements.</p> +</dd> +</dl> +<h3>Inheritance</h3> +<p>Lucy::Document::Doc is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p> +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
