Re: [Wasabi] Kicking of the Metadata spec - brainstorm
On Tuesday 20 February 2007, at 23:08, Mikkel Kamstrup Erlandsen wrote:
> 2007/2/20, Vincent Untz [EMAIL PROTECTED]:
> > On Monday 19 February 2007, at 23:35, Mikkel Kamstrup Erlandsen wrote:
> > > Let's get the ball rolling on the metadata spec. This first period will just be *brainstorming*, so let's try and avoid the nitty gritty details for now.
> >
> > Hi Mikkel,
> >
> > Just wondering: would it make sense to move the wasabi discussion to a wasabi mailing list? Having some kind of status reports sent to xdg would be interesting, but I must admit having the whole discussion is a bit too much (at least for me :-)).
>
> Would be fine by me. It might make things easier for everybody. It shouldn't be a problem sending interesting summaries/notifications to xdg. So if you have the power, go ahead :-)

I don't have the power, but you can request a mailing list by opening a bug:
https://bugs.freedesktop.org/enter_bug.cgi?product=freedesktop.org
(mailing lists component)

Thanks,

Vincent

--
Happy people are not in a hurry.
___
xdg mailing list
xdg@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/xdg
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
Hi,

[trimmed CC line]

Mikkel Kamstrup Erlandsen wrote:
> How would you manage locks on the db?

Assuming sqlite, you just let it handle them. You have to be intelligent in your use of transactions, but I don't see how a daemon inherently improves lock contention.

> Were synchronization issues not the reason why leaftag wasn't successful? I mean, if you can't keep file tags synchronized then you are unlikely to succeed in keeping general metadata up to date...

I think that's one reason why it wasn't successful, but more importantly it wasn't pushed by the developers and was ultimately abandoned by them. Synchronization becomes much less of an issue if you're able to attach the metadata to the files themselves.

Joe
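
A minimal sketch of what "let sqlite handle the locks, but be intelligent about transactions" could look like from a library, assuming Python's standard sqlite3 module. The table name file_tags, the helper names, and the 5-second busy timeout are hypothetical choices for illustration; nothing in the thread specifies them.

```python
import sqlite3


def open_store(path):
    """Open the shared metadata database (hypothetical schema)."""
    # timeout: how long to wait when another process holds the lock
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_tags ("
        " uri TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (uri, tag))"
    )
    conn.commit()
    return conn


def set_tags(conn, uri, tags):
    """Replace all tags of one file in a single short transaction."""
    # "with conn" commits on success and rolls back on an exception,
    # so the write lock is held only for these two statements.
    with conn:
        conn.execute("DELETE FROM file_tags WHERE uri = ?", (uri,))
        conn.executemany(
            "INSERT INTO file_tags (uri, tag) VALUES (?, ?)",
            [(uri, t) for t in sorted(set(tags))],
        )


def get_tags(conn, uri):
    """Return the tags of one file in alphabetical order."""
    rows = conn.execute(
        "SELECT tag FROM file_tags WHERE uri = ? ORDER BY tag", (uri,)
    )
    return [r[0] for r in rows]
```

Keeping each write in one short transaction is the part that limits contention; whether that is enough without a daemon is exactly what the thread is debating.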
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
Hi,

jamie wrote:
> (3) is the safest and best way - you get a richer service that allows all apps to share metadata and stay up to date.
>
> (2) is great in theory but falls over in practice for loads of reasons (mainly the need for fallbacks). XMP sidecar is better and really just needs GVFS and KIO slaves to support it transparently (i.e. to auto-copy the hidden sidecar file whenever you copy/move a file).
>
> (1) fails for the same reasons (2) does - lack of compatibility means having to have fallbacks in place for file types that don't have embedded metadata. (If that fallback is going to be XMP sidecar, then why bother with anything else?)

Sorry, I must not have been clear in my previous email. The entire point is to have fallbacks, because it gives you the richest experience. You try (1) [inline metadata] and if it fails or doesn't make sense, you try (2) [extended attributes]. Only then, if it fails, do you do (3).

XMP sidecars are a possibility (probably as step 2.5), but they have the same problem as a centralized database: they don't follow the file around. If you copy a file, rename a file, or send it as an email attachment, and you don't know to do the same with the sidecar, you'll lose it. gnome-vfs/gvfs/kio support for this helps the issue, but it's not as tightly coupled as xattrs.

And you say that (1) and (2) fail in practice, but this just isn't true. F-Spot stores its tags as XMP data in image files quite successfully. Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in MP3s forever. As for xattrs, we've been using them very successfully in Beagle for over 3 years (with a transparent fallback to an sqlite database when it's not possible). So all of this is possible to do and give the user a good experience.

> Also there are notifications, which you need a daemon for - it's why gconf is a daemon!

Notifications are tricky, but I think it might be possible to do them within a library using inotify (or FAM or other appropriate method). gconf is a daemon because (a) mechanisms like these didn't exist and (b) because absolutely everything touches it, and it doesn't make sense to continually reparse its XML files.

With approaches (1) and (2), notification is trivial. You get WRITE_CLOSE and ATTR_CHANGED events on the files themselves. For (3), you can watch the database file for changes. (This means you have to timestamp rows, but once that's done you can trivially SELECT for changes newer than a certain time. This is a good thing to do in any case.)

Joe
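
The fallback order described above, (1) inline metadata, then (2) extended attributes, then (3) a database, can be sketched roughly as follows. This is only an illustration: the Unsupported exception, the backend classes, and get_metadata are hypothetical names, and DictBackend merely stands in for a real inline-metadata parser or database lookup. The xattr backend uses os.getxattr, which Python provides on Linux only, so it reports itself as unsupported elsewhere.

```python
import os


class Unsupported(Exception):
    """Raised by a backend that has no value for this file and key."""


class XattrBackend:
    """Step (2): extended attributes in the 'user.' namespace."""

    def get(self, path, key):
        if not hasattr(os, "getxattr"):
            raise Unsupported("no xattr support on this platform")
        try:
            return os.getxattr(path, "user." + key).decode("utf-8")
        except OSError as exc:
            # Missing attribute, missing file, or a filesystem
            # without xattr support all mean "try the next backend".
            raise Unsupported(str(exc))


class DictBackend:
    """Stand-in for step (1) or step (3); a real backend would parse
    the file itself or query the shared database."""

    def __init__(self, data):
        self.data = data

    def get(self, path, key):
        try:
            return self.data[(path, key)]
        except KeyError:
            raise Unsupported("no value stored")


def get_metadata(path, key, backends):
    """Return the value from the first backend that has one, else None."""
    for backend in backends:
        try:
            return backend.get(path, key)
        except Unsupported:
            continue
    return None
```

A caller would pass the backends in priority order, for example [inline, XattrBackend(), database], and never needs to know which one answered.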
Re: shared wasabi implementation
Hey,

Mikkel Kamstrup Erlandsen wrote:
> I think I mentioned getters and setters for metadata somewhere in this thread, but that's not entirely obvious from the context of the mail you reply to :-) The methods I suggested were only for ingesting into the index. Or are you saying that there should also be a way to set metadata on the indexer and not only via the metadata API?

Maybe, yeah. I wasn't thinking about the metadata spec when I wrote this; I'm still not convinced that we should have metadata getter/setter APIs as part of the spec if we're just focusing on the search side of things.

The possible API I was referring to would be so that additional properties can be set on an already-indexed document. To give an example: Beagle adds Nautilus metadata to files by parsing the XML files while the file system is crawled. It would be much cleaner (and more extensible) to have a separate backend which itself crawled Nautilus metadata and added properties to already-indexed files in the database.

> The mailbox was mainly meant as an example, but let's stick to it anyway. What I meant to say was that you can index the mailbox at any time, but there is no need to keep a watch out for changes, since the only app that is ever going to change it is (say) Evolution. In a metadata-aware desktop, Evolution would be emitting signals (or whatever) when the mailbox changes (perhaps with hints as to what changed).

Sure, this is the ideal situation, but it means that you need to have internal crawling code and an external handler to deal with changes as they come in. The bulk of the code (and the hard part, really) is usually the crawler.

Joe
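
To make the "separate backend adds properties to already-indexed documents" idea concrete, here is a toy in-memory sketch. Everything in it is hypothetical (the Index class, add_properties, crawl_notes); it is not Beagle's API, only an illustration of the separation being argued for.

```python
class Index:
    """Toy in-memory index keyed by URI (hypothetical API)."""

    def __init__(self):
        self._docs = {}

    def add_document(self, uri, properties):
        """Called by the primary crawler, e.g. the file system backend."""
        self._docs[uri] = dict(properties)

    def add_properties(self, uri, properties):
        """Called by a secondary backend. Returns False if the document
        has not been indexed yet, so the caller can retry later."""
        doc = self._docs.get(uri)
        if doc is None:
            return False
        doc.update(properties)
        return True

    def get(self, uri):
        """Return a copy of the stored properties (empty if unknown)."""
        return dict(self._docs.get(uri, {}))


def crawl_notes(index, notes):
    """Secondary backend: 'notes' stands in for already-parsed Nautilus
    metadata, a mapping of URI -> properties. Returns the URIs whose
    documents were not in the index yet."""
    return [uri for uri, props in notes.items()
            if not index.add_properties(uri, props)]
```

The file system crawler no longer needs to know about Nautilus at all; the secondary backend runs on its own schedule and simply retries the URIs it could not attach.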
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
On Wed, 2007-02-21 at 08:54 -0500, Joe Shaw wrote:
> Hi,
>
> jamie wrote:
> > (3) is the safest and best way - you get a richer service that allows all apps to share metadata and stay up to date.
> >
> > (2) is great in theory but falls over in practice for loads of reasons (mainly the need for fallbacks). XMP sidecar is better and really just needs GVFS and KIO slaves to support it transparently (i.e. to auto-copy the hidden sidecar file whenever you copy/move a file).
> >
> > (1) fails for the same reasons (2) does - lack of compatibility means having to have fallbacks in place for file types that don't have embedded metadata. (If that fallback is going to be XMP sidecar, then why bother with anything else?)
>
> Sorry, I must not have been clear in my previous email. The entire point is to have fallbacks, because it gives you the richest experience. You try (1) [inline metadata] and if it fails or doesn't make sense, you try (2) [extended attributes]. Only then, if it fails, do you do (3).

For an app that wants to fetch metadata, that seems to be a lot of hoops to jump through. What would be fine is perhaps (1) + (3) and (2) + (3), so that an app can easily get the metadata regardless of where it's stored (via (3)), though I prefer XMP sidecar + (3), with (3) backing up all user-defined metadata there. That way an app has a single gateway to all metadata.

> XMP sidecars are a possibility (probably as step 2.5), but they have the same problem as a centralized database: they don't follow the file around. If you copy a file, rename a file, or send it as an email attachment, and you don't know to do the same with the sidecar, you'll lose it. gnome-vfs/gvfs/kio support for this helps the issue, but it's not as tightly coupled as xattrs.

I find it the other way round: xattrs fail as soon as you try to copy them off the volume. They simply don't work over the network, in attachments, or on floppy/CD/USB stick, etc. XMP sidecar seems to be the best choice here (if we have VFS support for them).

> And you say that (1) and (2) fail in practice, but this just isn't true. F-Spot stores its tags as XMP data in image files quite successfully. Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in MP3s forever. As for xattrs, we've been using them very successfully in Beagle for over 3 years (with a transparent fallback to an sqlite database when it's not possible). So all of this is possible to do and give the user a good experience.

Yes, with certain file types - my point is it's not a *universal* solution.

> > Also there are notifications, which you need a daemon for - it's why gconf is a daemon!
>
> Notifications are tricky, but I think it might be possible to do them within a library using inotify (or FAM or other appropriate method). gconf is a daemon because (a) mechanisms like these didn't exist and (b) because absolutely everything touches it, and it doesn't make sense to continually reparse its XML files.

Not really: a notification should pass the new value for the specific metadata in the signal, so apps don't need to work out which metadata has changed and aren't forced to refresh everything. I don't think that's possible without a daemon? (You would need a copy of the old metadata to work out what's changed.)

(KConfig has failed to implement notifications for exactly that reason.)

> With approaches (1) and (2), notification is trivial. You get WRITE_CLOSE and ATTR_CHANGED events on the files themselves. For (3), you can watch the database file for changes. (This means you have to timestamp rows, but once that's done you can trivially SELECT for changes newer than a certain time. This is a good thing to do in any case.)

(3) would potentially be an enormous db - it won't be practical to do that, and as above it places too much burden on the apps (again, you need the old metadata to work out what's changed).

To me it's going to extreme lengths (and pain) to avoid having a daemon, but then that's just my opinion!

The other reason for a daemon is that it's safer on broken NFS if only one process ever writes to the db at a time (that of course won't work if a user has multiple sessions with a single shared NFS home directory between them, but people who do that on broken NFS are asking for trouble!).

jamie
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
> > With approaches (1) and (2), notification is trivial. You get WRITE_CLOSE and ATTR_CHANGED events on the files themselves. For (3), you can watch the database file for changes. (This means you have to timestamp rows, but once that's done you can trivially SELECT for changes newer than a certain time. This is a good thing to do in any case.)
>
> (3) would potentially be an enormous db - it won't be practical to do that, and as above it places too much burden on the apps (again, you need the old metadata to work out what's changed).
>
> To me it's going to extreme lengths (and pain) to avoid having a daemon, but then that's just my opinion!

Let me give you a good example of why this is suboptimal. Suppose:

1) nautilus is listening for all metadata changes to files in folder X only
2) F-Spot is listening for metadata changes to photo files only
3) RB is listening for metadata changes to music files only

Let's suppose (2) updates a tag. (1) and (3) would have to re-query the DB using timestamps and get back a tag. They would both have to determine whether this tag is relevant to them, and hence do all the work. Also, those apps would be continually woken whenever anything changes (bad for batteries). This clearly is not a scalable solution once you have more than a few apps.

Now let's look at the above with a daemon:

(1) would use a subscriber interface that only gets notified of metadata changes to files in folder X
(2) and (3) do likewise, but for photos and music

This way, apps only get woken when necessary, and they get the metadata change they need (without any work involved). It's efficient and scalable too.
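
The subscriber interface described above could be sketched like this. It is an in-process stand-in for a daemon, with hypothetical names throughout (MetadataHub, subscribe, set) and glob patterns standing in for whatever "folder X", "photos", and "music" filters a real service would offer. Because the hub keeps the last known value, it can hand subscribers both the old and the new value.

```python
import fnmatch


class MetadataHub:
    """In-process stand-in for the daemon's subscriber interface."""

    def __init__(self):
        self._values = {}        # (path, key) -> last known value
        self._subscribers = []   # (glob pattern, callback) pairs

    def subscribe(self, pattern, callback):
        """Ask to be called only for files whose path matches pattern."""
        self._subscribers.append((pattern, callback))

    def set(self, path, key, value):
        """Store a value and notify matching subscribers if it changed."""
        old = self._values.get((path, key))
        if old == value:
            return  # nothing changed, so nobody is woken up
        self._values[(path, key)] = value
        for pattern, callback in self._subscribers:
            if fnmatch.fnmatchcase(path, pattern):
                callback(path, key, old, value)
```

With subscriptions for "/home/u/X/*", "*.jpg", and "*.mp3", tagging a photo outside folder X calls only the "*.jpg" subscriber, which is the behaviour the three-application example asks for.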
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
Hi,

On Wed, 2007-02-21 at 14:26 +, jamie wrote:
> > Sorry, I must not have been clear in my previous email. The entire point is to have fallbacks, because it gives you the richest experience. You try (1) [inline metadata] and if it fails or doesn't make sense, you try (2) [extended attributes]. Only then, if it fails, do you do (3).
>
> For an app that wants to fetch metadata, that seems to be a lot of hoops to jump through. What would be fine is perhaps (1) + (3) and (2) + (3), so that an app can easily get the metadata regardless of where it's stored (via (3))

You hide it all behind a single API; the consumer doesn't care where it comes from, that's the library's job.

> I find it the other way round: xattrs fail as soon as you try to copy them off the volume. They simply don't work over the network, in attachments, or on floppy/CD/USB stick, etc. XMP sidecar seems to be the best choice here (if we have VFS support for them).

There's no reason why we couldn't do both: xattrs on volumes that support them, then a transition to XMP sidecars if the VFS notices we're trying to copy/move them to a volume that doesn't.

> > And you say that (1) and (2) fail in practice, but this just isn't true. F-Spot stores its tags as XMP data in image files quite successfully. Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in MP3s forever. As for xattrs, we've been using them very successfully in Beagle for over 3 years (with a transparent fallback to an sqlite database when it's not possible). So all of this is possible to do and give the user a good experience.
>
> Yes, with certain file types - my point is it's not a *universal* solution.

There is no universal solution. Storing things in a database breaks down when I want to fix a typo in the artist tag of an mp3 I downloaded.

> Not really: a notification should pass the new value for the specific metadata in the signal, so apps don't need to work out which metadata has changed and aren't forced to refresh everything. I don't think that's possible without a daemon? (You would need a copy of the old metadata to work out what's changed.)

Passing the new value is easy, because that's what's now in the file/database. Reparsing the file might be a pain for some file formats, so I can see how a daemon would improve the caching there.

> (3) would potentially be an enormous db - it won't be practical to do that, and as above it places too much burden on the apps (again, you need the old metadata to work out what's changed). To me it's going to extreme lengths (and pain) to avoid having a daemon, but then that's just my opinion!

I don't think the size of the DB will affect things too much. If you add an index on the changed-time column, updates will be infrequent enough that the overhead at insert time will be small, and the overhead at search time will be nonexistent. I admit I haven't prototyped it though, so it might end up being untenable and a daemon unavoidable.

> The other reason for a daemon is that it's safer on broken NFS if only one process ever writes to the db at a time

Yep, that's a good point.

Joe
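
A rough sketch of the timestamped-rows idea with an index on the changed-time column, using Python's standard sqlite3 module. The schema, the column name changed, and the helper names are assumptions made for illustration; the thread itself notes that this approach was never prototyped. Timestamps are passed in explicitly here; a real caller would probably use time.time().

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the (hypothetical) metadata table and its time index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " uri TEXT NOT NULL, key TEXT NOT NULL, value TEXT,"
        " changed REAL NOT NULL, PRIMARY KEY (uri, key))"
    )
    # The index lets "changes since T" avoid a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS metadata_changed ON metadata (changed)"
    )
    conn.commit()
    return conn


def put(conn, uri, key, value, now):
    """Insert or overwrite one value and stamp it with the time 'now'."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO metadata (uri, key, value, changed)"
            " VALUES (?, ?, ?, ?)",
            (uri, key, value, now),
        )


def changes_since(conn, since):
    """Return rows modified strictly after 'since', oldest first."""
    rows = conn.execute(
        "SELECT uri, key, value, changed FROM metadata"
        " WHERE changed > ? ORDER BY changed",
        (since,),
    )
    return rows.fetchall()
```

An application woken by a change to the database file would call changes_since with the timestamp of the last row it saw. Note that it still has to decide for itself which rows are relevant, which is the cost raised earlier in the thread.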
Re: [Wasabi] Kicking of the Metadata spec - brainstorm
2007/2/21, Fabrice Colin [EMAIL PROTECTED]:
> On 2/21/07, Jos van den Oever [EMAIL PROTECTED] wrote:
> > 2007/2/20, Joe Shaw [EMAIL PROTECTED]:
> > > Heh. Fair enough. Should we keep it simply to the search side of things, then?
> >
> > The intention is to define a way to define metadata fields. This is required for searching and useful for metadata read/write too. So let's come up with the properties we allow metadata fields to have. I think what we have so far is what is required for searching too. So this will not be about an API for reading and writing.
>
> Speaking of searching metadata, would it be useful to also define which metadata fields are to be treated as boolean filters and which allow probabilistic search?

Will this not be implicit in most cases? Do you have an example?

Cheers,
Mikkel