jeremy 2002/11/23 12:44:18
Modified: src/documentation/xdocs/userdocs/concepts xmlsearching.xml
Log:
added 'extending samples' section, ran a spell checker on it
Revision Changes Path
1.4 +56 -12
xml-cocoon2/src/documentation/xdocs/userdocs/concepts/xmlsearching.xml
Index: xmlsearching.xml
===================================================================
RCS file:
/home/cvs/xml-cocoon2/src/documentation/xdocs/userdocs/concepts/xmlsearching.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- xmlsearching.xml 23 Nov 2002 17:11:17 -0000 1.3
+++ xmlsearching.xml 23 Nov 2002 20:44:18 -0000 1.4
@@ -49,7 +49,7 @@
Specifying the base URL determines the protocol for fetching XML
resources.
The implementation offers to specify <code>http:</code> URLs,
crawling an Apache Cocoon instance deployed in a servlet-engine.
- Alternativly you may specify an URI, e.g.:
<code>/documents/index.html</code>,
+ Alternatively you may specify a URI, e.g.:
<code>/documents/index.html</code>,
offering to crawl the local Apache Cocoon instance only, either
servlet-deployed, or in commandline-mode.
</p>
@@ -139,7 +139,7 @@
<p>
As both Avalon components <code>LuceneXMLIndexer</code>, and
<code>LuceneCocoonSearcher</code> may use the same Lucene index, you
must
- take care of the Lucene index structure in both compoents.
+ take care of the Lucene index structure in both components.
</p>
<p>
The current implementation uses following Lucene index layout
@@ -151,11 +151,11 @@
</li>
<li>Each XML element generates a Lucene field having the same name
as the XML element name.
For example searching for occurrences of <code>Cocoon</code> inside
of an XML abstract
- elemen, use query-string <code>abstact:Cocoon</code>.
+ element, use query-string <code>abstract:Cocoon</code>.
</li>
<li>Each XML attribute generates a Lucene field having the name
<code>element-name@attribute-name</code>.
- For example searching for occurences of <code>Cocoon</code> inside
of an XML title attribute
+ For example searching for occurrences of <code>Cocoon</code>
inside of an XML title attribute
of s1 element, use query-string <code>s1@title:Cocoon</code>.
</li>
<li>
@@ -182,18 +182,18 @@
in the <code>cocoon.xconf</code> file.
</p>
<s2 title="example">
- <p>This would set up the crawler to crawl all
of your site, except pages in the 'search' section, also we are telling the
crawler to use a non-standard cocoon-view for getting the links in documents,
called 'my-search-links'. </p>
+ <p>This sets up the crawler to crawl all
of your site except pages in the 'search' section. It also tells the
crawler to use a non-standard cocoon-view, called
<code>my-search-links</code>, for getting the links in documents.</p>
<source><![CDATA[
<cocoon-crawler logger="core.search.crawler">
- <exclude>.*/search/.*</exclude>
- <link-view-query>cocoon-view=my-search-links</link-view-query>
+ <exclude>.*/search/.*</exclude>
+ <link-view-query>cocoon-view=my-search-links</link-view-query>
</cocoon-crawler>
]]></source>
- <p>This tells the indexer to use the non-standard
'my-search-content' view to retrieve the content for indexing. Also it tells
the indexer that we would like to have any 'title' or 'subtitle' XML elements
in the documant added to the index as stored fields, so they can be retrieved
and displayed to the user with any hits they get.</p>
+ <p>This tells the indexer to use the non-standard
'my-search-content' view to retrieve the content for indexing. Also it tells
the indexer that we would like to have any <code>title</code> or
<code>subtitle</code> XML elements in the document added to the index as stored
fields, so they can be retrieved and displayed to the user with any hits they
get.</p>
<source><![CDATA[
<lucene-xml-indexer logger="core.search.lucene">
- <store-fields>title, subtitle</store-fields>
- <content-view-query>cocoon-view=my-search-content</content-view-query>
+ <store-fields>title, subtitle</store-fields>
+ <content-view-query>cocoon-view=my-search-content</content-view-query>
</lucene-xml-indexer>
]]></source>
</s2>
@@ -209,7 +209,7 @@
<p>This would generate a document from a search, getting the
query from the sitemap parameter '1' and other information from request
parameters.</p>
<source><![CDATA[
<map:generate type="search">
- <map:parameter name="query" value="{1}"/>
+ <map:parameter name="query" value="{1}"/>
</map:generate>
]]></source>
</s2>
@@ -274,7 +274,51 @@
needs.
</p>
</s1>
-
+ <s1 title="Extending the Sample">
+ <p>
+ It is easy to extend the search sample to display more information
about the search hit than just the URL of the resource.</p>
+ <p>In order to show, for example, the title and summary
of a document, these first need to be added to the search index as 'Stored
Fields'. Then, when documents are found during a search, that information is
available for display directly from the search engine.</p>
+ <p>First, decide which fields you want to store.</p>
+ <p>Decide where in your pipeline content is best
extracted for indexing; this might not always be the default view 'content'.</p>
+ <p>Next, decide if you need an XSLT transformation on your documents,
to make them more suitable for indexing. This may include deciding on one of
several titles in your document, what part of your document gets added to the
summary etc. You might want to strip certain tags out because you don't want
their content searched. You might be able to raise hit scores on documents by
re-arranging content, or keeping larger amounts of content in fewer tags.</p>
+ <p>Now tell the search engine (in <code>cocoon.xconf</code>)
which tags you would like stored.</p>
+<source><![CDATA[
+<lucene-xml-indexer logger="core.search.lucene">
+ <store-fields>title, summary</store-fields>
+ <content-view-query>cocoon-view=search-content</content-view-query>
+</lucene-xml-indexer>
+]]></source>
+ <p>This example tells the indexer to store any tags
called 'title' or 'summary' it finds in your documents. It also tells the
indexer to get its content from the view called 'search-content'.</p>
+<source><![CDATA[
+<map:view from-label="search" name="search-content">
+ <map:transform src="search-filter.xsl"/>
+ <map:serialize type="xml"/>
+</map:view>
+]]></source>
+ <p>This is how you might set up that custom view in your
sitemap. You would then add a label attribute <code>label="search"</code> to
the appropriate place in your pipelines. See the section on views for more
information.</p>
+ <p>After you have re-indexed the site, searches will
return the new fields in the XML output by Lucene, in the form of
<code>search:field</code> tags. You will need to modify the XSLT that
displays the hits to show them.</p>
+<source><![CDATA[
+<xsl:template match="search:hit">
+ <tr>
+ <td>
+ <xsl:value-of select="format-number( @search:score, '### %' )"/>
+ </td>
+ <td>
+ <xsl:value-of select="@search:rank"/>
+ </td>
+ <td>
+ <a target="_blank" href="{@search:uri}">
+ <xsl:attribute name="title">
+ <xsl:value-of select="search:field[@search:name='summary']"/>
+ </xsl:attribute>
+ <xsl:value-of select="search:field[@search:name='title']"/>
+ </a>
+ </td>
+ </tr>
+</xsl:template>
+]]></source>
+<p>This is how the search sample's XSLT might be changed. All the fields you
stored for each document are available as <code>search:field</code>
elements inside the <code>search:hit</code> elements. The code above assumes
each document has only one 'title' and one 'summary'.</p>
+ </s1>
<s1 title="Summary">
<p>
This document gives an overview of the components for