Bernhard Huber wrote:
>
> Hi,
>
> Using the Avalon components might help to speed up searching, as I
> changed the classes to Recyclable
> and fixed a bug in the IndexReaderCache that was giving me a
> TooManyOpenedFiles exception.
> As there will be a lot of clients searching, it is important to have
> a fast search, hence:
> The IndexReader is like a JDBC connection: pooling would speed it up. Only
> when the index changes
> is it necessary to recreate the IndexReader.
Good point.
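
A minimal sketch of the pooling idea in Java, to make the discussion concrete. It is not the actual IndexReaderCache: the class name ReaderCache, the opener function and the version function are all hypothetical, and the "version" is assumed to be any stamp that changes when the index is rewritten (for example a last-modified time).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: one shared reader per index directory. */
class ReaderCache<R> {
    /** A cached reader plus the index version it was opened against. */
    private static final class Entry<T> {
        final T reader;
        final long version;

        Entry(T reader, long version) {
            this.reader = reader;
            this.version = version;
        }
    }

    private final Map<String, Entry<R>> entries = new HashMap<>();
    private final Function<String, R> opener;        // opens a reader for a directory (assumed)
    private final Function<String, Long> versionOf;  // stamp that changes with the index (assumed)

    ReaderCache(Function<String, R> opener, Function<String, Long> versionOf) {
        this.opener = opener;
        this.versionOf = versionOf;
    }

    /** Returns the pooled reader, reopening it only if the index has changed. */
    synchronized R get(String indexDir) {
        long current = versionOf.apply(indexDir);
        Entry<R> entry = entries.get(indexDir);
        if (entry == null || entry.version != current) {
            // A real cache would also close the stale reader here,
            // otherwise file handles leak.
            entry = new Entry<>(opener.apply(indexDir), current);
            entries.put(indexDir, entry);
        }
        return entry.reader;
    }
}
```

Searches on an unchanged index then share one reader, and only an index update pays the cost of reopening.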
> >Why don't you throw in your skeleton ideas here and we discuss then in
> >the open?
> >
> Okay, perhaps I have misunderstood something.
>
> >>* I will implement some paging for the search results, if there are too
> >>many search results to display on a single page.
> >>
> >
> >Yep, this is a must do.
> >
> I have done this but still using the old package names.
>
> I added a LuceneCocoonPager (I know the names...) class, doing the hits
> per page calculation, and wrapping the Hits class. You will find it in
> the attachment plus the modified searchindex.xsp.
>
> If searchindex.xsp stays I'd like to have some XSP stylesheet for doing
> the rendering of the paging stuff.
> Is there some easy way not having to declare the logicsheet in the
> cocoon.xconf?
not that I know of.
> For the developing I'd like
> to declare the logicsheet inside the xsp itself.
I don't think it's possible on the current system.
> This paging stuff should go into the
> org.apache.cocoon.generator.SearchGenerator, too.
> This way the generator is able to generate only the search results that will be
> displayed.
I agree.
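
For the hits-per-page calculation, something along these lines might do. This is only a sketch, not the attached LuceneCocoonPager: the class name HitPager and its methods are hypothetical, pages are assumed to be numbered from 0, and the wrapped Hits object is reduced to a plain hit count.

```java
/** Hypothetical pager: computes page boundaries over a total hit count. */
class HitPager {
    private final int totalHits;
    private final int hitsPerPage;

    HitPager(int totalHits, int hitsPerPage) {
        if (totalHits < 0 || hitsPerPage <= 0) {
            throw new IllegalArgumentException("need totalHits >= 0 and hitsPerPage > 0");
        }
        this.totalHits = totalHits;
        this.hitsPerPage = hitsPerPage;
    }

    /** Number of pages, rounding up so a partial last page still counts. */
    int pageCount() {
        return (totalHits + hitsPerPage - 1) / hitsPerPage;
    }

    /** Index of the first hit on the given 0-based page (inclusive). */
    int startIndex(int page) {
        if (page < 0) {
            throw new IllegalArgumentException("page must be >= 0");
        }
        return Math.min(page * hitsPerPage, totalHits);
    }

    /** Index just past the last hit on the given 0-based page (exclusive). */
    int endIndex(int page) {
        return Math.min(startIndex(page) + hitsPerPage, totalHits);
    }
}
```

A generator using it would only emit the hits from startIndex(page) up to, but not including, endIndex(page).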
> >>* I will study the Main class for the internal crawling..
> >>
> >
> >Great
> >
> Okay, I got an overview using the environment.commandline.* classes.
> Now I have a question about crawling & indexing:
>
> As it is now I have an XSP to trigger the crawling & indexing. It uses HTTP
> URLs to access the XML content for indexing.
> Now to speed this up I see the following possibilities:
>
> First, still staying in a servlet-context environment:
> * For Servlet 2.3 something like this might work:
> RequestDispatcher rd = servletContext.getRequestDispatcher(
> "/cocoon/documents/index.html?cocoon-view=content" );
> rd.include( new_request_wrapper, new_response_wrapper );
> new_response_wrapper should then hold the XML content.
>
> For Cocoon in Servlet 2.2 and higher:
> I want to access the Cocoon instance of the current servlet context. I
> don't want to create another
> Cocoon instance, for the sake of performance and memory consumption.
>
> If I have to create a new Cocoon instance, I see the following choices:
>
> * create a Cocoon instance like org.apache.cocoon.Main does and try to
> grab the right configs, etc., as used by the servlet-engine Cocoon instance. How
> could I make sure I get the right configs?
> * create a Cocoon instance simulating a servlet environment.
> Can you give some hints about implementing the easiest solution?
Cocoon is an avalon component.
My best choice would be to retrieve Cocoon as a component directly from
the ComponentManager, then call the process(Environment) method
indicating what environment we want, just like the Main class does.
> For command-line-only crawling and indexing I see the following choices:
> * Implement something like org.apache.cocoon.Main for the crawling
> and indexing. Here, too, I would
> grab the same config as the servlet-engine Cocoon instance.
> * Additionally, add an Ant wrapper:
> <taskdef name="cocoon-index"
> classname="org.apache.cocoon.optional.ant.CocoonIndexTask"/>
> <cocoon-index
> index-directory="/a/c/index"
> create="yes"
> analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"
> uri="index.html"
> contextDir="${build.context}"
> destDir="${build.dir}/ant-test/docs"
> workDir="${build.dir}/ant-test/work"
> logLevel="INFO">
> </cocoon-index>
> * Should there be a Cocoon Ant datatype to make it easier
> to create a Cocoon instance? Like:
> <cocoon-index
> index-directory="/a/c/index"
> create="yes"
> analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer"
> uri="index.html">
> <cocoon
> contextDir="${build.context}"
> destDir="${build.dir}/ant-test/docs"
> workDir="${build.dir}/ant-test/work"
> logLevel="INFO"/>
> </cocoon-index>
hmmm, might connect Ant to Cocoon too strongly but I really don't know.
What do others think about this?
> * Apropos the Ant wrapper: I implemented an Ant wrapper for the Main
> class by extending the Ant class Java, and it works fine, calling
> Main.main() from a forked JVM.
> This is how it creates the Cocoon documents:
> ...
> <taskdef name="cocoon"
> classname="org.apache.cocoon.optional.ant.CocoonJavaTask">
> <classpath>
> <path refid="classpath"/>
> </classpath>
> </taskdef>
>
> <cocoon
> contextDir="${build.context}"
> destDir="${build.dir}/ant-test/docs"
> workDir="${build.dir}/ant-test/work"
> logLevel="INFO"
> uri="index.html"
> >
> <classpath>
> <path refid="classpath"/>
> </classpath>
> </cocoon>
> ...
> But I failed to call it with fork=false, getting a
> ClassNotFoundException. Now I wonder how the servlet engine has solved
> this...
Sounds like a classloading containment problem. Ant is not as advanced
in classloading as Tomcat is.
> * With command-line or Ant-wrapped indexing and crawling in place, the last
> open issue is how to invoke it via some timer service. Some
> application servers like WLS offer that, and I think there is a
> cron service in Avalon. Does it make sense to add the
> Avalon cron service to a simple servlet engine?
I think so.
>
> >searching for 'cocoon' would result in something like:
> >
> > <search:results>
> > <search:hit rank="1" score="89%" uri="...">
> > <xhtml:p>
> > <search:highlight>Cocoon</search:highlight> now offers semantic
> > <search:highlight>search</search:highlight>
> > </xhtml:p>
> > </search:hit>
> > ...
> > </search:results>
> >
> >As you can see, this also includes part of the "context" where the
> >textual information is found. This follows the Google model and I think
> >it would be a *great* feature to have.
> >
> This is possible if you change the Lucene API a bit.
> There was a posting on the Lucene mailing list regarding highlighting. I
> don't know the state of that improvement. Anyway, highlighting
> needs some changes in the Lucene API; I have modified "my"
> Lucene to be able to do highlighting.
Hmmm, forking Lucene is not exactly a good way of working with them. I'd
suggest you send the patches to them and see what comes of it.
I would be against having an ad hoc modified version of Lucene in our
CVS.
> Moreover, if you want to have something like highlighting, the question
> is whether the summary should be stored in the
> index, too, or whether we should ask for the cocoon-view again at search time
> to get the summary?
Right, I was thinking the same thing.
Performance-wise, the obvious answer is to store the summary along with
the index.
> I have implemented the LuceneIndexContentHandler to generate only unstored
> fields: the body and all the element and attribute fields are
> indexed but not stored.
> Now adding a summary might make it worthwhile to store the body field.
> But what about
> <s1 title="Introduction">? The "Introduction" is not stored in the body.
> How should we summarize this?
Attributes can appear only once, so what about wrapping them in square
brackets?
[Introduction] This text is something that blah
blah blah [How to blah blah] blah blah blah
but I'm wide open to suggestions here.
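
To make the bracket idea concrete, here is a rough sketch using a plain SAX handler. It is not the LuceneIndexContentHandler: the class name BracketSummaryHandler and its methods are hypothetical, and it assumes every attribute value (not only title) goes into the summary in square brackets, in document order, with whitespace collapsed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: builds a summary with attribute values in brackets. */
class BracketSummaryHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: every attribute value is worth summarizing.
        for (int i = 0; i < atts.getLength(); i++) {
            text.append(" [").append(atts.getValue(i).trim()).append("] ");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Keep the text of adjacent elements from running together.
        text.append(' ');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Appended raw: SAX may deliver one text node in several chunks.
        text.append(ch, start, length);
    }

    /** The collected summary with runs of whitespace collapsed. */
    String summary() {
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    /** Parses an XML string and returns its bracketed summary. */
    static String summarize(String xml) {
        try {
            BracketSummaryHandler handler = new BracketSummaryHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.summary();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

The resulting string could then be stored alongside the index as the summary field, truncated to whatever length we decide on.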
--
Stefano Mazzocchi One must still have chaos in oneself to be
able to give birth to a dancing star.
<[EMAIL PROTECTED]> Friedrich Nietzsche