Johannes D. wrote:
>
> I have looked more at the solutions you suggested and at the source files
> for whc. I think I have found an easier solution to our problem than to
> integrate stemming.
>
> Would it be possible to modify the search so that when searching for e.g. "PDF"
> the results would also include
> PDFs
> PDF's
> PDF-files
> PDF-documents
> PDF-template
> PDF-settings etc.
>
> Another example would be servlet. In the current version of whc the
> results would not include servlets, but only servlet.
We are aware of this limitation. We consider it acceptable because [a] whc targets technical documents and [b] whc also supports a real index (i.e. one generated from <indexterm> elements).

>
> We have several places in our documentation where the search results are
> less useful because the search function only searches for the exact words.
>
> If this modification is possible it would make up for the lack of stemming
> in our setup.
>
> Lastly I apologize for the mix-up in the first mail where I called stemming
> "spanning".

You are more or less describing solution [1] (see below), which we consider a quick-and-dirty hack. As I said in my previous email, we do not want to implement solution [1]. Sorry for our inflexibility, but I'm afraid you'll have to implement what you want yourself.

[email protected] wrote:
> There are 2 ways to implement this:
>
> [1] Using a quick-and-dirty hack (search substrings in the current word
> index). This will often work but, even more often, it will degrade the
> usability of the current search facility of ditac's webhelp.
>
> [2] Support *stemming* for a number of languages. Example: "Danish
> stemming algorithm" --
> http://snowball.tartarus.org/algorithms/danish/stemmer.html
>
> Stemming -- http://en.wikipedia.org/wiki/Stemming -- is a mandatory
> feature of any general-purpose full-text search engine.
>
> We could have implemented stemming for a limited number of languages in
> ditac's webhelp. However, we have decided not to do so. Rationale:
> ditac's webhelp is expected to be used for technical documents (e.g.
> reference manuals). In this kind of document, you generally search for
> exact words: PDF, FOP, Apache, Tomcat, servlet, configuration, manifest,
> -verbose, etc. In this case, stemming is not strictly needed.

--
XMLmind DITA Converter Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/ditac-support
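The trade-off between exact-word lookup and solution [1] (substring search in the word index) can be illustrated with a minimal Python sketch. It is not how whc's search is actually implemented: the word index, the page names and the functions `exact_search` and `substring_search` are all hypothetical, and it assumes that hyphenated forms such as "pdf-files" are indexed as single words. Substring matching does find the variants requested above ("PDFs", "servlets"), but a short query such as "port" also pulls in "report", "support" and "export", which is the kind of degraded result described in solution [1].

```python
# Hypothetical word index: each indexed word maps to the pages containing it.
# Words and page names are invented for illustration only; this is not whc's
# real index format.
WORD_INDEX = {
    "pdf": {"output.html"},
    "pdfs": {"batch.html"},
    "pdf-files": {"fonts.html"},
    "pdf-settings": {"options.html"},
    "servlet": {"deploy.html"},
    "servlets": {"tomcat.html"},
    "port": {"network.html"},
    "report": {"errors.html"},
    "support": {"contact.html"},
    "export": {"menus.html"},
}


def exact_search(term: str) -> set[str]:
    """Exact-word lookup: only the indexed word equal to the query matches."""
    return set(WORD_INDEX.get(term.lower(), set()))


def substring_search(term: str) -> set[str]:
    """Quick-and-dirty hack: every indexed word containing the query matches."""
    needle = term.lower()
    pages: set[str] = set()
    for word, word_pages in WORD_INDEX.items():
        if needle in word:
            pages |= word_pages
    return pages


if __name__ == "__main__":
    print(sorted(exact_search("servlet")))      # only the exact word
    print(sorted(substring_search("servlet")))  # also finds "servlets"
    print(sorted(substring_search("port")))     # also finds report/support/export
```

Stemming (solution [2]) avoids these false hits by reducing inflected forms such as "servlets" to a common stem according to per-language rules, rather than matching arbitrary substrings; that is why it needs a separate algorithm for each supported language.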

