> >> 4) This python/perl is needed because
> >> some of my e-mails are html-entity, quoted-printable, 8-bit,
> >> iso-8859-2, utf-8 encoded and so on.
> >> I guess this is not a problem for Beagle at all, ie it can search
> >> in any e-mail no matter how it is encoded.
> >> What about the attached files?
> >
> > If you throw email files (containing single emails, maildir style) to
> > beagle, it knows how to index emails. Also beagle will take care of
> > the attachments itself. Sometimes there is a problem in determining
> > the mimetype of email files, instead of message/rfc822 they are
> > recognized as text files by our mime type sniffer. So, if you can
> > somehow ensure that the files that are sent to beagle have the
> > mimetype explicitly set to message/rfc822, beagle will correctly index
> > them for you.
>
> Interesting: does this happen implicitly within the file crawler
> (i.e. not the other backends), or explicitly via the other backends (say
> the KMailQueryable)? Because if it's implicit, might as well back out
> whatever I'm doing for the Gnus backend and work on MIME type
> detection. :)

[Long email warning]

(There is a wiki page which might be helpful:
http://beagle-project.org/Architecture_Overview)

The tedious work of "extracting data" from physical files (or from
files embedded in other files, such as attachments) is done by the
drones, aka Filters. The work of "finding data" to index from
who_knows_which_weird_location, and of maintaining the state of some
application's data (e.g. which mails are deleted in mbox-based mail
apps, where deleted emails are not immediately removed from the disk),
is done by the smart agents, aka Backends. Though it is possible,
Backends rarely extract indexable data from a file themselves. They
merely set up a request to the filters to index a physical file (as you
have done in the Gnus backend).

So, there is a generic Mail filter which can index all message/rfc822
emails. It is used by the Files backend to index any email messages it
finds on the disk, and by all the mail backends. Specialized mail
backends exist because they want to maintain/index some additional
information not handled by the generic Files backend. Typically the
mail backends use some app-specific information.

So, the Files backend can index any email files perfectly well. There
are three general problems with this:

1) Sometimes mimetype detection misses a valid mail file and marks it
   as text. We use xdgmime for mimetype detection, and the issue arises
   because some mail clients put a first line in the mail message that
   differs from what is expected. xdgmime checks the first line to
   detect a mail message.

2) Once an email is found as a search result, what should happen when
   a user clicks on it? If the email comes from the kmail, evolution or
   t-bird backends, beagle-search knows which application to open. For
   emails on the disk, I am not sure what to do.

3) Sometimes additional email-client-specific information could be
   useful. E.g. in the gnus backend you are writing, the folder name is
   something that the Files backend will never be able to give you.
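
To make problem 1 concrete, here is a minimal Python sketch of a
sniffer that looks only at the first line. It is not xdgmime's actual
rule set: the prefix list, the function name sniff_mimetype and the
X-Some-Client-Header header are all made up for illustration. It only
shows how a perfectly valid message can fall through to text/plain
when a client writes an unexpected first line.

```python
from email import message_from_string

# Hypothetical list of first-line prefixes the sniffer recognises as mail.
# This is an assumption for illustration, not xdgmime's real magic data.
EXPECTED_FIRST_LINE_PREFIXES = (
    "Return-Path:",
    "Received:",
    "From:",
    "From ",
    "Subject:",
    "Date:",
    "Message-ID:",
)


def sniff_mimetype(text):
    """Guess a MIME type by inspecting only the first line of text."""
    first_line = text.split("\n", 1)[0]
    if first_line.startswith(EXPECTED_FIRST_LINE_PREFIXES):
        return "message/rfc822"
    return "text/plain"


# A message that starts with a header the sniffer expects.
typical = "Return-Path: <alice@example.org>\nSubject: hi\n\nbody\n"

# The same kind of message, but with a client-specific header first
# (X-Some-Client-Header is a made-up name).
unusual = "X-Some-Client-Header: anything\nSubject: hi\n\nbody\n"

typical_type = sniff_mimetype(typical)
unusual_type = sniff_mimetype(unusual)

# A full parser still reads the second message as mail, so the content
# is fine; only the first-line check was too narrow.
unusual_subject = message_from_string(unusual)["Subject"]
```

With these inputs the first message is sniffed as message/rfc822 and
the second as text/plain, even though the standard email parser reads
its Subject header without trouble.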

Also, if you are able to parse the .overview files (or whatever other
gnus-specific state files exist), they generally contain information
that is useful to report to the end user. The files backend will never
be able to report such information. In a sense, a mail backend is a
specialized files backend.

I would still encourage you to work on a gnus backend. But if you want
a quick working solution, you can just use the files backend to index
those email files. In case beagle is not able to detect the mimetypes
properly, you can get away with writing an empty FilterGnus, subclassed
from FilterMail, with

    AddSupportedFlavor (new FilterFlavor ("file:///path/to/~/Mail/*", null, null, 1));

This will force all files in ~/Mail/* to be indexed using FilterGnus
and thus use FilterMail.

Warning: If there is any non-mail file in ~/Mail, the Mail filter might
crash! You have been warned.

- dBera

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
