Hi,

Using the Avalon components might help speed up searching: I changed the classes to be Recyclable and fixed a bug in the IndexReaderCache that gave me a "too many open files" exception. Since many clients will be searching, search has to be fast. An IndexReader is like a JDBC connection, so pooling it should speed things up. The IndexReader only needs to be recreated when the index changes.
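A minimal sketch of the "reopen only when the index changes" idea. It assumes the index exposes some version number, for instance its last-modification time. The class name VersionedReaderCache and the open/close callbacks are hypothetical and are not the actual IndexReaderCache:

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches one expensive-to-open reader and reopens it
 * only when the index version changes. The version, open and close
 * callbacks are assumptions standing in for the real index access.
 */
final class VersionedReaderCache<R> {
    private final LongSupplier currentVersion; // e.g. index last-modified time (assumption)
    private final Supplier<R> opener;          // opens a fresh reader
    private final Consumer<R> closer;          // releases a stale reader's file handles

    private R reader;
    private long version = Long.MIN_VALUE;

    VersionedReaderCache(LongSupplier currentVersion, Supplier<R> opener, Consumer<R> closer) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns the cached reader, reopening it only if the index changed. */
    synchronized R get() {
        long v = currentVersion.getAsLong();
        if (reader == null || v != version) {
            if (reader != null) {
                // Close the stale reader so its files are released;
                // leaking readers is one way to run out of file handles.
                closer.accept(reader);
            }
            reader = opener.get();
            version = v;
        }
        return reader;
    }

    /** Releases the cached reader, if any. */
    synchronized void close() {
        if (reader != null) {
            closer.accept(reader);
            reader = null;
        }
    }
}
```

This sketch closes the stale reader immediately, which is only safe if no search is still using it. A pooled version would have to track in-flight searches (for example by reference counting) before closing.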
> Why don't you throw in your skeleton ideas here and we discuss then in
> the open?

Okay, perhaps I have misunderstood something.

>> * I will implement some paging for the search result, if there are too
>> much search result for displaying on a single page.
>
> Yep, this is a must do.

I have done this, but still using the old package names. I added a LuceneCocoonPager class (I know, the names...) that does the hits-per-page calculation and wraps the Hits class. You will find it in the attachment, together with the modified searchindex.xsp. If searchindex.xsp stays, I'd like to have an XSP logicsheet for rendering the paging controls. Is there an easy way to avoid declaring the logicsheet in cocoon.xconf? During development I'd like to declare the logicsheet inside the XSP page itself.

This paging should also go into org.apache.cocoon.generator.SearchGenerator. That way the generator can generate only the search results that will actually be displayed.

>> * I will study the Main class for the internal crawling..
>
> Great

Okay, I got an overview using the environment.commandline.* classes. Now I have a question about crawling and indexing. At the moment an XSP page triggers the crawling and indexing, and it uses HTTP URLs to access the XML content for indexing. To speed this up I see the following possibilities.

First, still staying in a servlet-context environment:

* For Servlet 2.3, something like this might work:

    RequestDispatcher rd = servletContext.getRequestDispatcher(
        "/cocoon/documents/index.html?cocoon-view=content" );
    rd.include( new_request_wrapper, new_response_wrapper );

  new_response_wrapper should then hold the XML content.

For Cocoon in Servlet 2.2 and higher: I want to access the Cocoon instance of the current servlet context. I don't want to create another Cocoon instance, for the sake of performance and memory consumption.
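The hits-per-page calculation boils down to a little integer arithmetic. A minimal sketch, assuming 0-based page numbers and a plain hit count; SearchPager is a hypothetical name and not the attached LuceneCocoonPager:

```java
/**
 * Hypothetical sketch of the hits-per-page arithmetic a pager might do.
 * It works on a plain hit count rather than on Lucene's Hits class.
 */
final class SearchPager {
    private final int totalHits;
    private final int hitsPerPage;

    SearchPager(int totalHits, int hitsPerPage) {
        if (totalHits < 0 || hitsPerPage <= 0) {
            throw new IllegalArgumentException("totalHits >= 0 and hitsPerPage > 0 required");
        }
        this.totalHits = totalHits;
        this.hitsPerPage = hitsPerPage;
    }

    /** Number of pages; an empty result still has one (empty) page. */
    int pageCount() {
        return Math.max(1, (totalHits + hitsPerPage - 1) / hitsPerPage);
    }

    /** Index of the first hit on the given 0-based page (inclusive). */
    int firstHit(int page) {
        checkPage(page);
        return page * hitsPerPage;
    }

    /** Index one past the last hit on the given page (exclusive). */
    int endHit(int page) {
        checkPage(page);
        return Math.min(totalHits, (page + 1) * hitsPerPage);
    }

    boolean hasNext(int page) { return page + 1 < pageCount(); }

    boolean hasPrevious(int page) { return page > 0; }

    private void checkPage(int page) {
        if (page < 0 || page >= pageCount()) {
            throw new IndexOutOfBoundsException("page " + page);
        }
    }
}
```

A generator would then only touch hits firstHit(page) through endHit(page) - 1 (with Lucene's Hits, via hits.doc(i) and hits.score(i)), so it generates only the results that are displayed.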
If I have to create a new Cocoon instance, I see the following choices:

* Create a Cocoon instance the way org.apache.cocoon.Main does, and try to grab the same configuration as the servlet-engine Cocoon instance. How can I make sure I get the right configuration?
* Create a Cocoon instance that simulates a servlet environment.

Can you give me some hints on implementing the easiest solution?

For command-line-only crawling and indexing I see the following choices:

* Implement something like org.apache.cocoon.Main for crawling and indexing. Here too I would grab the same configuration as the servlet-engine Cocoon instance.
* Additionally, add an Ant wrapper:

    <taskdef name="cocoon-index"
             classname="org.apache.cocoon.optional.ant.CocoonIndexTask"/>
    <cocoon-index index-directory="/a/c/index"
                  create="yes"
                  analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"
                  uri="index.html"
                  contextDir="${build.context}"
                  destDir="${build.dir}/ant-test/docs"
                  workDir="${build.dir}/ant-test/work"
                  logLevel="INFO"/>

* Should there also be a Cocoon Ant datatype to make creating a Cocoon instance easier? Something like:

    <cocoon-index index-directory="/a/c/index"
                  create="yes"
                  analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"
                  uri="index.html">
      <cocoon contextDir="${build.context}"
              destDir="${build.dir}/ant-test/docs"
              workDir="${build.dir}/ant-test/work"
              logLevel="INFO"/>
    </cocoon-index>

* Apropos Ant wrappers: I implemented an Ant wrapper for the Main class by extending Ant's Java task, and it works fine, calling Main.main() in a forked JVM. This creates the Cocoon documents:

    ...
    <taskdef name="cocoon"
             classname="org.apache.cocoon.optional.ant.CocoonJavaTask">
      <classpath>
        <path refid="classpath"/>
      </classpath>
    </taskdef>

    <cocoon contextDir="${build.context}"
            destDir="${build.dir}/ant-test/docs"
            workDir="${build.dir}/ant-test/work"
            logLevel="INFO"
            uri="index.html">
      <classpath>
        <path refid="classpath"/>
      </classpath>
    </cocoon>
    ...
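For command-line crawling, the core loop is a breadth-first traversal that visits each URI once. A minimal sketch, assuming a caller-supplied linksOf function that renders a page and returns its links. Crawler and linksOf are hypothetical names, and the actual fetching and indexing of each page is left out:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical sketch of a crawl loop for a command-line indexer:
 * breadth-first over URIs, each URI visited at most once, staying below
 * a base prefix. linksOf is an assumption, not an existing Cocoon API.
 */
final class Crawler {
    static List<String> crawl(String startUri, String basePrefix, int maxPages,
                              Function<String, Collection<String>> linksOf) {
        Set<String> seen = new HashSet<>();    // URIs already queued or visited
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(startUri);
        queue.add(startUri);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String uri = queue.poll();
            // A real indexer would fetch the content view of uri here and index it.
            visited.add(uri);
            for (String link : linksOf.apply(uri)) {
                // Follow only links under basePrefix that were not seen before.
                if (link.startsWith(basePrefix) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

The seen set prevents link cycles from looping forever, and maxPages caps the size of a single run.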
But calling it with fork=false failed with a ClassNotFoundException. I wonder how the servlet engine solves this.

* Once crawling and indexing can run from the command line or from Ant, the last open issue is invoking them from some time service. Application servers like WLS offer one, and I think there is a Cron service in the Avalon system. Does it make sense to add the Avalon Cron service to a simple servlet engine?

> searching for 'cocoon' would result in something like:
>
>   <search:results>
>     <search:hit rank="1" score="89%" uri="...">
>       <xhtml:p>
>         <search:highlight>Cocoon</search:highlight> now offers semantic
>         <search:highlight>search</search:highlight>
>       </xhtml:p>
>     </search:hit>
>     ...
>   </search:results>
>
> As you can see, this also includes part of the "context" where the
> textual information is found. This follows the Google model and I think
> it would be a *great* feature to have.

This is possible if you change the Lucene API a bit. There was a posting on the Lucene mailing list about highlighting; I don't know the current state of that work. In any case, highlighting needs some changes to the Lucene API, and I have modified "my" Lucene to support it.

Moreover, if we want something like highlighting, the question is whether the summary should be stored in the index too, or whether we should request the cocoon-view again at search time to get the summary. I implemented the LuceneIndexContentHandler to generate unstored fields: the body and all element and attribute fields are only indexed, not stored. Adding a summary might make it worthwhile to store the body field. But what about <s1 title="Introduction">? The "Introduction" is not stored in the body. How should we summarize this?
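A naive sketch of producing the search:highlight markup from a stored or freshly fetched summary: every word whose lower-cased form is one of the query terms gets wrapped, and everything else is XML-escaped. It compares raw words instead of running Lucene's analyzer, which is a deliberate simplification; Highlighter is a hypothetical name:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: wraps query terms in search:highlight elements.
 * Matching is a plain, case-insensitive word comparison, not Lucene analysis.
 */
final class Highlighter {
    // A "word" is a run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Escapes text for use as XML character data. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Returns text as XML character data, with every word whose lower-cased
     * form is in lowerCaseTerms wrapped in search:highlight.
     */
    static String highlight(String text, Set<String> lowerCaseTerms) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start()))); // text between words
            String word = m.group();
            if (lowerCaseTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<search:highlight>").append(escape(word)).append("</search:highlight>");
            } else {
                out.append(escape(word));
            }
            last = m.end();
        }
        out.append(escape(text.substring(last))); // trailing text after the last word
        return out.toString();
    }
}
```

Because it ignores the analyzer, a query for "search" will not highlight "searching" even if the index reduces both to the same term; matching the analyzer's behaviour would need access to the analyzed tokens and their offsets.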
>
> But this requires more thinking, I'd say let's ignore it for now, so you
> can come up with
>
>   <search:results xmlns:search="http://apache.org/cocoon/search/1.0">
>     <search:hit rank="1" score="89%" uri="..."/>
>     ...
>   </search:results>
>
> which is good enough for now but could be easily improved later on.

bye
bernhard
lucene.zip
lucene-xsp.zip