On Sat, May 4, 2013 at 7:10 PM, Christian Mollekopf <[email protected]> wrote:

> On Saturday 04 May 2013 18.49:05 Vishesh Handa wrote:
> > Hey guys
> >
> > I was thinking of moving all the plain text related to a file into the
> > nie:plainTextContent of the resource. So in the case of music we would
> > have -
> >
> > <res> nie:plainTextContent "title artist album whateverElse" .
> >
> > for the case of files, we would append the file name, and any other plain
> > text that we want searched just in the nie:plainTextContent. So a search
> > for any combination of text will just have to search through the plain
> > text content.
> >
> > Opinions?
>
> Hey Vishesh,
>
> I think that's a good idea. We're also already using it that way to be
> able to search through emails with markup in the email feeder, and I see
> no reason why we can't extend that to other resource types (after all the
> property is exactly for this purpose).
> So that means, in the future all feeders should push all information which
> should be matched by full text searching to nie:plainTextContent, right?
>

I was actually thinking of adding a separate API through which the text is
streamed, instead of the current approach of loading everything into memory
and pushing it. The File Indexers already have a function like that.

> The alternative would of course be to use a separate dedicated fulltext
> index, which may have better performance, some more features (tokenizer,
> stemming etc.), but would obviously complicate the setup again (fulltext
> query => i.e. filter by type in nepomuk => retrieve akonadi item). So not
> necessarily the way to go, but I wanted to bring it on the table anyways
> as it's IMO not conflicting with what nepomuk provides (the semantic
> analysis), and could result in better results (performance and feature
> wise) than letting virtuoso do all the work.
>

I have been thinking about the same thing - we have no support for stemming
or any of the other advanced features we want. I'll talk more about this
later. I have an idea which might be very controversial.

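As a rough illustration of the nie:plainTextContent proposal quoted above, here is a minimal Python sketch. It is not Nepomuk code: `build_plain_text` and `matches` are hypothetical helper names, the sample field values are made up, and the matching is plain case-insensitive substring search rather than whatever Virtuoso's full-text index actually does.

```python
def build_plain_text(fields):
    """Join the non-empty text fields of a resource into one searchable string.

    `fields` stands in for whatever a feeder would collect (title, artist,
    album, file name, ...); the name and shape are assumptions.
    """
    parts = (str(value).strip() for value in fields if value is not None)
    return " ".join(part for part in parts if part)


def matches(plain_text, query):
    """Return True if every whitespace-separated query term occurs in the text.

    This is a naive case-insensitive substring check: no tokenizer and no
    stemming, which is the limitation raised in the thread.
    """
    haystack = plain_text.lower()
    return all(term in haystack for term in query.lower().split())


# Hypothetical music resource: title, artist, album, file name.
track_text = build_plain_text(
    ["Some Title", "Some Artist", "Some Album", "01-some-title.mp3"]
)
found = matches(track_text, "artist title")   # terms may appear in any order
missed = matches(track_text, "composer")      # term absent from the text
```

With everything in one string, a query for any combination of terms only has to look in one place, at the cost of losing which field a term came from.
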
> > We can easily do this for the 4.11 release cause we already need everyone
> > to re-index everything cause of the migration.
>
> Cool.
>
> Cheers,
> Christian
>

--
Vishesh Handa
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
