Hi,

I'd like to commit "Searching XML in Cocoon". I must confess that I have not cleared the CVS SSH hurdle yet. I would also like to know which branch I should check in to, and whether it belongs in src or in scratchpad. As this is not final, I think scratchpad would be better, so that people can use and try it first.

I think a sitemap would be fine for using the searching and indexing, and for demonstrating the usage of these components.

Oops, and I think I have violated the coding conventions by indenting only 2 spaces. I need to reformat before submitting; is there any tool for that?

Any comments?

Some docu about the feature...

Abstract

Searching XML in Cocoon using Lucene as the search engine.

Overview

Lucene ( http://jakarta.apache.org/lucene ) is an indexing & searching API. Several new Cocoon components utilize this API to provide "Searching XML in Cocoon".

There are two services provided by these components:

  Indexing
  Searching

Indexing is realized by crawling, starting from a base URI, and generating a Lucene index. Searching uses the generated Lucene index: the index is searched for a requested query.

The crawling component is packaged in org.apache.cocoon.components.crawler. Indexing and searching are packaged in org.apache.cocoon.components.search. A Cocoon generator using the searching components is packaged in org.apache.cocoon.generation.

A GUI for searching is implemented both as an XSP and as a generator. Both implementations can be used independently.

Description

As an existing index is a precondition for searching, crawling and indexing are described first; a description of searching follows.

The crawling component provides all links of a requested URI. The links of a URI are requested by using the Cocoon feature of views. A URI which is allowed to be crawled must provide a view. By default the crawling component requests the view named links. A links view must provide a response of content type application/x-cocoon-links. Using a serializer of type links with src org.apache.cocoon.serialization.LinkSerializer will guarantee the correct content type.

The indexing component crawls in depth, starting from a given base URI. It uses a crawler component to receive all links of a page, and it filters the response of the crawler. Filtering enforces the following conditions:

  Index only resources which have not been indexed already.
  Index only resources which are indexable, like documents; ignore images and non-XML documents.

Indexing parses an XML document and produces a Lucene document.

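To illustrate the two filtering conditions above, here is a minimal sketch in plain Java. It is not the code of the indexing component: the class CrawlFilterSketch, the in-memory link and content-type maps, and the choice of text/xml and text/html as indexable content types are all assumptions made for illustration. A real crawl would obtain the links of each URI through its links view instead of a map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Cocoon components.
final class CrawlFilterSketch {

    // Assumption: only these content types count as indexable documents.
    static boolean isIndexable(String contentType) {
        return "text/xml".equals(contentType) || "text/html".equals(contentType);
    }

    // Crawls depth-first from baseUri and returns the URIs that would be indexed.
    // 'links' stands in for the crawler (URI -> outgoing links),
    // 'types' stands in for the content type reported for each URI.
    static List<String> crawl(String baseUri,
                              Map<String, List<String>> links,
                              Map<String, String> types) {
        List<String> indexed = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(baseUri);
        while (!stack.isEmpty()) {
            String uri = stack.pop();
            // Condition 1: handle every resource at most once.
            if (!seen.add(uri)) {
                continue;
            }
            // Condition 2: skip resources that are not indexable, such as images.
            if (!isIndexable(types.get(uri))) {
                continue;
            }
            indexed.add(uri); // a real indexer would build a Lucene document here
            // Push links in reverse so they are visited in their original order.
            List<String> out = links.getOrDefault(uri, List.of());
            for (int i = out.size() - 1; i >= 0; i--) {
                stack.push(out.get(i));
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/index.html", List.of("/a.html", "/logo.gif", "/b.html"),
            "/a.html", List.of("/index.html", "/b.html"));
        Map<String, String> types = Map.of(
            "/index.html", "text/html",
            "/a.html", "text/html",
            "/b.html", "text/xml",
            "/logo.gif", "image/gif");
        // Prints [/index.html, /a.html, /b.html]
        System.out.println(crawl("/index.html", links, types));
    }
}
```

The seen set covers the first condition and the content-type check covers the second; /logo.gif is visited but never indexed, and the link from /a.html back to /index.html is ignored.
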
A Lucene document may have several fields, which act like the columns of a database table.

Indexing writes the Lucene index into a directory; by default the Cocoon working directory is used. Moreover, a Lucene analyzer and the Lucene writing mode must be defined.

The searching component uses an existing Lucene index. The index may have been created by any Lucene indexer. The searching component must have access to the index directory, and it should use the same Lucene analyzer that the indexer used when the index was created.

The searching component returns all hits of a search; the XSP and the generator filter these down to the hits displayed on a page.

The search generator searches the Lucene index by using the searching components, and generates XML content. A sample of the XML content produced by the search generator:

<?xml version="1.0" encoding="UTF-8"?>
<search:results date="1008437081064" query-string="cocoon"
    start-index="0" page-length="10"
    xmlns:search="http://apache.org/cocoon/search/1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <search:hits total-count="125" count-of-pages="13">
    <search:hit rank="0" score="1.0" uri="http://localhost:8080/cocoon/documents/hosting.html"/>
    <search:hit rank="1" score="1.0" uri="http://localhost:8080/cocoon/documents/hosting.html"/>
    <search:hit rank="2" score="1.0" uri="http://localhost:8080/cocoon/documents/hosting.html"/>
    <search:hit rank="3" score="0.93121004" uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
    <search:hit rank="4" score="0.93121004" uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
    <search:hit rank="5" score="0.7112235" uri="http://localhost:8080/cocoon/documents/mail-archives.html"/>
    <search:hit rank="6" score="0.70967746" uri="http://localhost:8080/cocoon/documents/userdocs/serializers/link-serializer.html"/>
    <search:hit rank="7" score="0.6881721" uri="http://localhost:8080/cocoon/documents/userdocs/serializers/text-serializer.html"/>
    <search:hit rank="8" score="0.6881721" uri="http://localhost:8080/cocoon/documents/userdocs/serializers/vrml-serializer.html"/>
    <search:hit rank="9" score="0.6666666" uri="http://localhost:8080/cocoon/documents/userdocs/serializers/svgpng-serializer.html"/>
  </search:hits>
  <search:navigation total-count="125" count-of-pages="13"
      has-next="true" has-previous="false" next-index="10" previous-index="0">
    <search:navigation-page start-index="0"/>
    <search:navigation-page start-index="10"/>
    <search:navigation-page start-index="20"/>
    <search:navigation-page start-index="30"/>
    <search:navigation-page start-index="40"/>
    <search:navigation-page start-index="50"/>
    <search:navigation-page start-index="60"/>
    <search:navigation-page start-index="70"/>
    <search:navigation-page start-index="80"/>
    <search:navigation-page start-index="90"/>
    <search:navigation-page start-index="100"/>
    <search:navigation-page start-index="110"/>
    <search:navigation-page start-index="120"/>
  </search:navigation>
</search:results>

The navigation elements are there for easy handling of navigation issues in an XSLT stylesheet.

Bill Of Material:

New packages:
  org.apache.cocoon.components.crawler
  org.apache.cocoon.components.search

New Avalon components:
  org.apache.cocoon.components.crawler.CocoonCrawler
  org.apache.cocoon.components.crawler.SimpleCocoonCrawlerImpl: external HTTP crawler for Cocoon. This crawler generates a list of links received from a URI request, which it extends with a cocoon-view query.
  org.apache.cocoon.components.IndexHelperField
  org.apache.cocoon.components.LuceneCocoonHelper
  org.apache.cocoon.components.LuceneCocoonIndexer
  org.apache.cocoon.components.LuceneCocoonPager
  org.apache.cocoon.components.LuceneCocoonSearcher
  org.apache.cocoon.components.LuceneIndexContentHandler
  org.apache.cocoon.components.LuceneXMLIndexer
  org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
  org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
  org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl

New sitemap components:
  org.apache.cocoon.generation.SearchGenerator

New JUnit testcase:
  org.apache.cocoon.generation.test.SearchGeneratorTestCase

New webapp resources:
  sitemap.xmap
  search-index.xsp
  welcome-index.xsp
  create-index.xsp
  stylesheets/search2html.xsl
  lucene_green_300.gif

Compiling & Installing:

For compiling, and at runtime, a lucene.jar is necessary. The build.xml must be changed as well, to check for its availability, and the webapp sitemap must be modified to include the search demo.

Installing the Avalon components requires changing the cocoon.xconf file to insert the Avalon components:
  org.apache.cocoon.components.LuceneXMLIndexer
  org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
  org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
  org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl

A sitemap or sub-sitemap must be adapted to use the XSP and the generator.

bye bernhad