On Sat, 15 Dec 2001, Bernhard Huber wrote:

> Hi,
> I'd like to commit Searching XML in Cocoon.
> I must confess that I have not taken the CVS SSH hurdle, yet.

You have the links for it?

> Moreover I'd like to know into which branch I should check in, and whether
> it goes into src or scratchpad.

I'd recommend scratchpad for now.

> As this is not final, I think inserting into scratchpad would be better,
> moreover people may use and try it first.

Yup.

> I think using a sitemap would be okay for using the searching, and
> indexing, and demonstrating the usage of these components.

I'm not sure what you mean. A sub-sitemap for the samples?

> Oops, and I think I have violated the coding conventions by indenting only 2
> spaces; I need to reformat before submitting.
> Is there any tool for that?

I do that with (X)Emacs.

> Any comments?
>
> Some docu about the feature...

Would be cool if you can rewrite these docs using DocBook or
Document-v10 DTDs.

Giacomo

>
> Abstract
> Searching XML in Cocoon using Lucene as search engine.
>
> Overview
> Lucene ( http://jakarta.apache.org/lucene ) is an indexing & searching API.
> Several new Cocoon components utilize this API to provide "Searching
> XML in Cocoon".
>
> There are two services provided by these components:
>   Indexing
>   Searching
>
> Indexing is realized by crawling, starting from a base URI, and
> generating a lucene index.
> Searching uses the generated lucene index. The index is searched for a
> requested query.
>
> The crawling component is packaged in org.apache.cocoon.components.crawler.
> Indexing and searching are packaged in org.apache.cocoon.components.search.
> A Cocoon generator using the searching components is packaged in
> org.apache.cocoon.generation.
>
> A GUI for searching is implemented by using XSP, and as a generator.
> Both implementations can be used independently.
>
> Description
>
> As an existing index is a precondition for searching, crawling and
> indexing are described first; a description of the searching follows.
>
> The crawling component provides all links of a requested URI.
> The links of a URI are requested by using the Cocoon feature of views.
> A URI which is allowed to get crawled must provide a view. By default the
> crawling component requests the view named links.
> A link view must provide a response of content type
> application/x-cocoon-links. Using a serializer of type links with src
> org.apache.cocoon.serialization.LinkSerializer will guarantee the
> correct content type.
>
> The indexing component crawls in-depth, starting from a given base URI.
> The indexing component uses a crawler component to receive all links of
> a page. The indexing component filters the response of a crawler.
> Filtering asserts the following conditions:
>   Index only resources which have not been indexed already.
>   Index only resources which are indexable, like documents; ignore images
>   and non-XML documents.
>
> Indexing parses an XML document, and produces a lucene document. A
> lucene document may have several fields, which act like columns of a
> database table.
>
> Indexing writes the lucene index into a directory; by default the Cocoon
> working directory is used. Moreover a lucene analyzer and the lucene
> writing mode must be defined.
>
> The searching component uses a created lucene index. The index may be
> created by any lucene indexer.
> The searching component must have access to an index directory, and it
> should use the same lucene analyzer as the indexer used at creation time of
> the index directory.
> The searching component returns all hits of a search; the XSP and the
> generator filter these down to the hits displayed on a page.
>
> The search generator searches the lucene index by using the searching
> components, and generates XML content.
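
The step where the XSP and the generator cut the full hit list down to one page could look roughly like the following. This is only a minimal sketch in plain Java: the class PagerSketch and all its method names are hypothetical and not taken from the submitted components, and the ceiling division for the page count is an assumption. The figures in main (125 hits, page length 10) are the ones from the sample output in this mail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not one of the submitted classes.
class PagerSketch {

    // Pages needed for totalCount hits at pageLength hits per page (ceiling division).
    static int countOfPages(int totalCount, int pageLength) {
        requirePositive(pageLength);
        return (totalCount + pageLength - 1) / pageLength;
    }

    // Start index of each page: 0, pageLength, 2 * pageLength, ...
    static List<Integer> pageStartIndexes(int totalCount, int pageLength) {
        requirePositive(pageLength);
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < totalCount; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    // True if at least one hit lies beyond the page that begins at startIndex.
    static boolean hasNext(int startIndex, int pageLength, int totalCount) {
        return startIndex + pageLength < totalCount;
    }

    // True unless the page is the first one.
    static boolean hasPrevious(int startIndex) {
        return startIndex > 0;
    }

    // The hits shown on the page that begins at startIndex; the last page may be short.
    static <T> List<T> page(List<T> allHits, int startIndex, int pageLength) {
        requirePositive(pageLength);
        int from = Math.min(Math.max(startIndex, 0), allHits.size());
        int to = Math.min(from + pageLength, allHits.size());
        return allHits.subList(from, to);
    }

    private static void requirePositive(int pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be > 0");
        }
    }

    public static void main(String[] args) {
        int totalCount = 125; // total-count in the sample output
        int pageLength = 10;  // page-length in the sample output
        System.out.println("count-of-pages = " + countOfPages(totalCount, pageLength));
        System.out.println("start indexes  = " + pageStartIndexes(totalCount, pageLength));
        System.out.println("has-next at 0  = " + hasNext(0, pageLength, totalCount));
    }
}
```

With 125 hits and a page length of 10 this yields 13 pages with start indexes 0, 10, ..., 120, which agrees with the count-of-pages and navigation-page values in the sample. What the real pager reports as next-index on the last page is not stated here, so the sketch leaves that out.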
> As a sample of the XML content produced by the search generator:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <search:results date="1008437081064" query-string="cocoon"
>     start-index="0" page-length="10"
>     xmlns:search="http://apache.org/cocoon/search/1.0"
>     xmlns:xlink="http://www.w3.org/1999/xlink">
>   <search:hits total-count="125" count-of-pages="13">
>     <search:hit rank="0" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="1" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="2" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="3" score="0.93121004"
>       uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
>     <search:hit rank="4" score="0.93121004"
>       uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
>     <search:hit rank="5" score="0.7112235"
>       uri="http://localhost:8080/cocoon/documents/mail-archives.html"/>
>     <search:hit rank="6" score="0.70967746"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/link-serializer.html"/>
>     <search:hit rank="7" score="0.6881721"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/text-serializer.html"/>
>     <search:hit rank="8" score="0.6881721"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/vrml-serializer.html"/>
>     <search:hit rank="9" score="0.6666666"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/svgpng-serializer.html"/>
>   </search:hits>
>   <search:navigation total-count="125" count-of-pages="13"
>       has-next="true" has-previous="false" next-index="10" previous-index="0">
>     <search:navigation-page start-index="0"/>
>     <search:navigation-page start-index="10"/>
>     <search:navigation-page start-index="20"/>
>     <search:navigation-page start-index="30"/>
>     <search:navigation-page start-index="40"/>
>     <search:navigation-page start-index="50"/>
>     <search:navigation-page start-index="60"/>
>     <search:navigation-page start-index="70"/>
>     <search:navigation-page start-index="80"/>
>     <search:navigation-page start-index="90"/>
>     <search:navigation-page start-index="100"/>
>     <search:navigation-page start-index="110"/>
>     <search:navigation-page start-index="120"/>
>   </search:navigation>
> </search:results>
>
> The navigation element is for easy handling of navigation issues in an
> XSLT stylesheet.
>
> Bill Of Material:
>
> New packages:
>   org.apache.cocoon.components.crawler,
>   org.apache.cocoon.components.search
>
> New avalon components:
>   org.apache.cocoon.components.crawler.CocoonCrawler
>   org.apache.cocoon.components.crawler.SimpleCocoonCrawlerImpl:
>     external http crawler for Cocoon. This crawler generates a list of links
>     received from a URI request, enhancing it with a cocoon-view query.
>
>   org.apache.cocoon.components.IndexHelperField
>   org.apache.cocoon.components.LuceneCocoonHelper
>   org.apache.cocoon.components.LuceneCocoonIndexer
>   org.apache.cocoon.components.LuceneCocoonPager
>   org.apache.cocoon.components.LuceneCocoonSearcher
>   org.apache.cocoon.components.LuceneIndexContentHandler
>   org.apache.cocoon.components.LuceneXMLIndexer
>   org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
>   org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
>   org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl
>
> New sitemap components:
>   org.apache.cocoon.generation.SearchGenerator
>
> New JUnit testcase:
>   org.apache.cocoon.generation.test.SearchGeneratorTestCase
>
> New webapp resources:
>   sitemap.xmap
>   search-index.xsp
>   welcome-index.xsp
>   create-index.xsp
>   stylesheets/search2html.xsl
>   lucene_green_300.gif
>
> Compiling & Installing:
>
> For compiling, and at runtime, a lucene.jar is necessary. Changing the
> build.xml is necessary, too, for checking its availability, as is
> modifying the webapp sitemap for including the search demo.
>
> Installing the avalon components requires changing the cocoon.xconf
> file, inserting the avalon components
>   org.apache.cocoon.components.LuceneXMLIndexer
>   org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
>   org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
>   org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl.
>
> A sitemap or sub-sitemap has to be adapted for using the XSP and the
> generator.
>
>
> bye bernhad
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]