Johannes D. wrote:
>
> I have looked more closely at the solutions you suggested and at the source
> files for whc. I think I have found an easier solution to our problem than
> integrating stemming.
>
> Would it be possible to modify the search so that, when searching for e.g.
> "PDF", the results would also include
> PDFs
> PDF's
> PDF-files
> PDF-documents
> PDF-template
> PDF-settings, etc.
>
> Another example would be "servlet". In the current version of whc, the
> results include only "servlet", not "servlets".

We are aware of this limitation. We consider it acceptable because
[a] whc targets technical documents, and [b] whc also supports a
real index (i.e. one built from <indexterm> elements).



>
> We have several places in our documentation where the search results are
> less useful because the search function only matches exact words.
>
> If this modification is possible it would make up for the lack of stemming
> in our setup.
>
> Lastly, I apologize for the mix-up in my first mail, where I called
> stemming "spanning".

You are more or less describing solution [1] (see below), which we
consider a quick and dirty hack. As I already said in my previous
email, we do not want to implement solution [1]. Sorry for our
inflexibility, but I'm afraid you'll have to implement what you want
yourself.
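If you do implement solution [1] yourself, the following minimal Python sketch shows both what it buys and why we call it a hack. It assumes a hypothetical word index mapping each lowercase word to the set of pages containing it; whc's real index format is not shown here, and `exact_search` and `substring_search` are illustrative names, not whc functions.

```python
# Hypothetical sketch of solution [1]: substring matching over a word index.
# Assumption: the index maps each lowercase word to the set of pages that
# contain it, and hyphenated compounds (e.g. "pdf-settings") are indexed as
# single words. This is NOT whc's actual index format.


def exact_search(index: dict[str, set[str]], term: str) -> set[str]:
    """Current behavior: only pages containing the exact word match."""
    return set(index.get(term.lower(), set()))


def substring_search(index: dict[str, set[str]], term: str) -> set[str]:
    """Quick and dirty hack: match every indexed word containing `term`."""
    needle = term.lower()
    pages: set[str] = set()
    for word, word_pages in index.items():  # linear scan over the whole index
        if needle in word:
            pages |= word_pages
    return pages


if __name__ == "__main__":
    index = {
        "pdf": {"intro.html"},
        "pdfs": {"output.html"},
        "pdf's": {"faq.html"},
        "pdf-settings": {"config.html"},
        "settings": {"config.html"},
        "reset": {"troubleshooting.html"},
        "asset": {"images.html"},
        "servlet": {"deploy.html"},
    }
    print(sorted(exact_search(index, "PDF")))
    # ['intro.html']
    print(sorted(substring_search(index, "PDF")))
    # ['config.html', 'faq.html', 'intro.html', 'output.html']
    print(sorted(substring_search(index, "set")))
    # ['config.html', 'images.html', 'troubleshooting.html']
```

Searching for "PDF" now also finds "PDFs", "PDF's" and "PDF-settings", but searching for "set" also returns pages that merely mention "reset" or "asset". That kind of noise is what degrades the usability of the search.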




[email protected] wrote:
> There are two ways to implement this:
>
> [1] Use a quick and dirty hack (search for substrings in the current word
> index). This will often work, but even more often it will degrade the
> usability of the current search facility of ditac's webhelp.
>
> [2] Support *stemming* for a number of languages. Example: "Danish
> stemming algorithm" --
> http://snowball.tartarus.org/algorithms/danish/stemmer.html
>
> Stemming -- http://en.wikipedia.org/wiki/Stemming -- is a mandatory
> feature of any general-purpose full-text search engine.
>
> We could have implemented stemming for a limited number of languages in
> ditac's webhelp. However, we have decided not to do so. Rationale:
> ditac's webhelp is expected to be used for technical documents (e.g.
> reference manuals). In this kind of document, you generally search for
> exact words: PDF, FOP, Apache, Tomcat, servlet, configuration, manifest,
> -verbose, etc. For such searches, stemming is not strictly needed.
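To make solution [2] concrete, here is a toy illustration of stemming-based search: both the indexed words and the query terms are reduced to a common stem before lookup. The `toy_stem` rules are a deliberately crude English suffix stripper invented for this sketch; they are not the Snowball algorithms linked above, and the index layout is again a hypothetical one, not whc's.

```python
# Toy illustration of stemming-based search. `toy_stem` is a deliberately
# crude English suffix stripper invented for this sketch; it is NOT one of
# the Snowball stemmers, which implement much richer per-language rules.
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Reduce a word to a (very rough) stem by stripping a few suffixes."""
    w = word.lower()
    if w.endswith("'s"):
        return w[:-2]  # "pdf's" -> "pdf"
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # "libraries" -> "library"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]  # "servlets" -> "servlet"
    return w


def build_stemmed_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index every word of every page under its stem, not its surface form."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for page, words in pages.items():
        for word in words:
            index[toy_stem(word)].add(page)
    return dict(index)


def stemmed_search(index: dict[str, set[str]], term: str) -> set[str]:
    """Stem the query with the same rules used at indexing time."""
    return set(index.get(toy_stem(term), set()))


if __name__ == "__main__":
    pages = {
        "deploy.html": ["servlet", "Tomcat"],
        "tomcat.html": ["servlets", "manifest"],
        "faq.html": ["PDF's"],
        "output.html": ["PDFs"],
    }
    index = build_stemmed_index(pages)
    print(sorted(stemmed_search(index, "servlet")))  # ['deploy.html', 'tomcat.html']
    print(sorted(stemmed_search(index, "PDF")))  # ['faq.html', 'output.html']
```

Because the same normalization is applied at indexing time and at query time, "servlet" finds "servlets" without also matching unrelated words that merely contain the query string. Stemmers still make their own errors, and this toy one makes many (it turns "this" into "thi"), which is why a real, per-language algorithm would be needed.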

 
--
XMLmind DITA Converter Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/ditac-support
