-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/102356/#review5974
-----------------------------------------------------------

Commenting on my own request: after a discussion with Jos Vandenoever I understood that each call to addText is indeed supposed to add another fragment of text. Here, a fragment is a set of words. Thus, it is up to the indexer to add white space where appropriate.

- Sebastian


On Aug. 17, 2011, 7:20 p.m., Sebastian Trueg wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/102356/
> -----------------------------------------------------------
>
> (Updated Aug. 17, 2011, 7:20 p.m.)
>
>
> Review request for Nepomuk and Strigi.
>
>
> Summary
> -------
>
> The problem is simple: when indexing the text from the cells in ods documents,
> the analyser currently just calls addText for each cell. This causes the
> backend (indexer) to concatenate all those strings, which in turn means
> invalid tokenization for full-text search.
>
> xmlindexer and rdfindexer work around this by adding a newline after each
> block of text added via addText. This, however, is clearly wrong since 1. the
> API does not suggest that, 2. all other plugins - most prominently the text
> analyser - do not strip away any line feeds, and 3. providing a line-based
> interface would significantly lower the power of the API.
>
> Thus, the only correct approach is to take care of proper text handling in
> the analysers. In this case the simplest way is to add a space after each
> token.
>
>
> Diffs
> -----
>
>   lib/helperanalyzers/odfcontenthelperanalyzer.h 4fbfd45
>   lib/helperanalyzers/odfcontenthelperanalyzer.cpp d2a0a72
>
> Diff: http://git.reviewboard.kde.org/r/102356/diff
>
>
> Testing
> -------
>
> Indexing an ods file results in proper tokenization for cell content.
> Indexing an odt file results in the last word of a line no longer being
> concatenated with the first word of the next line.
>
>
> Thanks,
>
> Sebastian
>
>
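
As an illustration of the fragment semantics described at the top of this message, here is a minimal sketch. Only the name addText comes from the discussion; the ToyIndexer class and its tokens() method are hypothetical stand-ins and not the Strigi or Nepomuk API. The sketch assumes the indexer separates consecutive fragments with a single space and that tokenization splits on whitespace.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an index writer; this is NOT the Strigi API.
class ToyIndexer {
public:
    // Each call adds one fragment (a set of words). The indexer, not the
    // analyser, inserts the separating white space between fragments.
    void addText(const std::string& fragment) {
        if (fragment.empty()) return;
        if (!text_.empty()) text_ += ' ';
        text_ += fragment;
    }

    // Split the accumulated text on whitespace, roughly as a full-text
    // tokenizer might (assumption: real tokenizers are more elaborate).
    std::vector<std::string> tokens() const {
        std::istringstream in(text_);
        std::vector<std::string> out;
        for (std::string word; in >> word;) out.push_back(word);
        return out;
    }

private:
    std::string text_;
};
```

With this behaviour, adding the cell contents "Total" and "42" in two separate calls yields two tokens instead of the single token "Total42" that plain concatenation would produce.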
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
