Hi all,

Today I was demoing KDE at Systems in Munich, and the GNOME presenter across from me in the GNOME booth told me about this discussion [1]. Since I'm the developer of Strigi [2], it caught my interest and I would love to contribute to it. I also believe this discussion is of interest to the KDE developers, since KDE is also in need of good desktop search tools. Therefore, this mail also goes to kde-core-devel.

First off, let me say that I'll be going slightly off topic by discussing not only the inclusion of search engines in GNOME but also cooperation between the current alternatives. Both of these aspects have been talked about in this thread and I'd like to add to them from the point of view of yet another desktop search tool.

But first let me introduce Strigi. Strigi is a desktop search tool that has many similarities to and differences from Beagle and Tracker, and which originates from the unfortunate demise of Kat. The goal of Strigi is quite clear: index user data so that searching for it is fast. The aim is to index not only plain text but also metadata, so that a user may search for e.g. 'ext:png width:128' to find all PNG files with a width of 128 pixels. Strigi has a few features that are not in Tracker or Beagle and lacks a number of features that the other programs have. But the core functionality of Strigi, indexing data, is something that they all share.

One important distinction has to be made straightaway: the difference between indexing metadata and storing metadata. Strigi only indexes metadata. If you think your disk is full, you can just throw away the index, because there is no data of value in there. All that's in there is an index that allows you to find your data quickly. Personally, I think _storing_ metadata in an indexer is not a good idea. (I do think that an index on a metadata store is a good idea, but that's a different matter.) This is a large difference from Tracker, which does act as a metadata store of 'first class objects', whatever that means. Beagle is also mainly an index. (Is any non-redundant data lost if I delete my Beagle index, Joe?)

So if Tracker and Beagle also index data, what's so special about Strigi?

(sorry for the obligatory boasting coming up)

- It is the KISSest of all
- It is the fastest of all (for indexing many small files, parsing alone runs at ~100 docs per second; the speed when writing to the index depends on the index backend)
- It can index files in files in files in files in files
- It has an indexer that can output XML and can thus be used by other indexers (Beagle and Tracker) so that indexing code can be shared. (Having a common metadata standard would be nice for this purpose, but see below)
- It is written in C++
- It has multiple storage backends clearly separated behind an API so that Strigi can always take advantage of the fastest index around (currently clucene)
- It can be used for searching even if there is no index, by using the command line programs 'deepfind' and 'deepgrep' [2]

This is however not a sales talk. Strigi stands on its own. It's GUI independent. Currently, it links to clucene or hyperestraier, to libexpat and to some other common libs like libz and libcrypto. It has a DBus interface and can be called from any language with DBus support. There's a plugin for GNOME Deskbar in the source code.

So if this is not a sales talk, what is it? It's a call for standardization. This discussion between competing programs is a great time to start talking about common functionality. With regard to desktop search there are many things that can be standardized:

- query language
- metadata names and meaning
- test suites
- DBus APIs
- index formats

I won't discuss index formats because, even though Beagle and Strigi both use the Lucene index format, this is an implementation detail that determines performance and disk usage and should not be frozen into a standard.

The query language as used by Beagle and Strigi is very similar (no coincidence) and is a good start for standardization. The largest drawback of the language is the ambiguity of the field specifiers.

Now that DBus v1 is almost upon us, the barriers between GNOME and KDE are diminishing.
Functionality defined by a DBus API can be implemented in any language and, as such, I think GNOME should choose a DBus API to use and share with KDE.

Test suites. I'd love there to be a common test suite that says: if you index this data with these parameters, you should get these results from this query. Strigi will develop such tests naturally. Being able to share them across projects would mean that programs would compete on merit, without the usual prejudices and license and library incompatibilities.

Strigi has a DBus interface for searching, and so does Tracker. We should compare them and find a common interface. Of course the respective GNOME and KDE developers should decide which DBus API should be used by their applications. Freedesktop.org would be a good place to define these interfaces.

Metadata naming and meaning. This is something which is rather hard. Dublin Core is part of it: it names some types of metadata. I've already mailed about this with Jamie in the past. In my opinion, the issue should be separated into smaller definitions that say what metadata fields can be extracted from certain file types. Indexer plugins could then advertise that they implement this functionality. The same metadata names should also be used when searching, and there, for convenience, they should be abbreviated, as is current practice.

So, a rather long mail that can be summarized as: please accept an API for searching and not a suite of programs (indexer + GUIs for it), and start thinking about standardizing _indexable_ metadata (other metadata is a whole different can of worms that I won't touch). This is still possible since neither KDE nor GNOME has agreed on a program for indexing, and by adopting only an API, programs will be forced to collaborate to adhere to the API as well as possible, meaning the user wins.
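
To make the shared test suite idea a bit more concrete, here is a minimal sketch in Python of what "index this data, run this query, expect these results" could look like. None of this is Strigi, Beagle or Tracker code. The Document class, the search function, the sample corpus and the rule that all query terms are ANDed together are assumptions made purely for illustration; a real suite would call each project's own search interface instead.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One indexed item: a path plus extracted metadata fields (hypothetical)."""
    path: str
    fields: dict = field(default_factory=dict)


def parse_query(query):
    """Split 'ext:png width:128 holiday' into field terms and free-text terms."""
    field_terms, text_terms = [], []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name and value:
            field_terms.append((name.lower(), value.lower()))
        else:
            text_terms.append(token.lower())
    return field_terms, text_terms


def search(documents, query):
    """Return sorted paths of documents matching every term (AND is assumed)."""
    field_terms, text_terms = parse_query(query)
    hits = []
    for doc in documents:
        values = {k.lower(): str(v).lower() for k, v in doc.fields.items()}
        fields_ok = all(values.get(name) == value for name, value in field_terms)
        text = values.get("text", "")
        text_ok = all(term in text for term in text_terms)
        if fields_ok and text_ok:
            hits.append(doc.path)
    return sorted(hits)


# Shared test data: any indexer fed these files should produce these fields.
CORPUS = [
    Document("icons/small.png", {"ext": "png", "width": "128", "height": "128"}),
    Document("icons/large.png", {"ext": "png", "width": "512", "height": "512"}),
    Document("notes/todo.txt", {"ext": "txt", "text": "buy milk and index files"}),
]

# Shared expectations: (query, expected result paths).
CASES = [
    ("ext:png width:128", ["icons/small.png"]),
    ("ext:png", ["icons/large.png", "icons/small.png"]),
    ("ext:txt index", ["notes/todo.txt"]),
    ("width:64", []),
]


def run_suite(search_fn):
    """Run every case against a search function; return the list of failures."""
    failures = []
    for query, expected in CASES:
        actual = search_fn(CORPUS, query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures
```

Each project would pass its own search function to run_suite and get back the cases where its results differ from the agreed ones. The corpus and the cases are the part that would be shared; the toy search function here only exists so the sketch can run on its own.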

Cheers,
Jos

[1] http://mail.gnome.org/archives/desktop-devel-list/2006-October/msg00175.html
[2] http://www.vandenoever.info/software/strigi/
[3] http://www.kdedevelopers.org/node/2468
