Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread Vincent Untz
On Tuesday 20 February 2007, at 23:08, Mikkel Kamstrup Erlandsen wrote:
2007/2/20, Vincent Untz [EMAIL PROTECTED]:
 
  On Monday 19 February 2007, at 23:35, Mikkel Kamstrup Erlandsen wrote:
  Let's get the ball rolling on the metadata spec. This first period will
  just be *brainstorming*, so let's try and avoid the nitty-gritty details
  for now.
 
  Hi Mikkel,
 
  Just wondering: would it make sense to move the wasabi discussion to a
  wasabi mailing list? Having some kind of status reports sent to xdg
  would be interesting, but I must admit having the whole discussion is a
  bit too much (at least for me :-)).
 
Would be fine by me. It might make things easier for everybody.
 
It shouldn't be a problem sending interesting summaries/notifications to
xdg.
So if you have the power, go ahead :-)

I don't have the power, but you can request a mailing list by opening a
bug:
https://bugs.freedesktop.org/enter_bug.cgi?product=freedesktop.org

(mailing lists component)

Thanks,

Vincent

-- 
Happy people are not in a hurry.
___
xdg mailing list
xdg@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/xdg


Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread Joe Shaw

Hi,

[trimmed CC line]

Mikkel Kamstrup Erlandsen wrote:
How would you manage locks on the db? 


Assuming sqlite, you just let it handle them.  You have to be 
intelligent in your use of transactions, but I don't see how a daemon 
inherently improves lock contention.
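
A minimal sketch of what "let sqlite handle the locks" could look like, using Python's standard sqlite3 module. The table layout and the helper names (open_store, set_many, get_all) are assumptions made for illustration; nothing in this thread specifies a schema. The idea is simply that each batch of writes is one short transaction, so the write lock is held briefly, and that other connections wait up to a timeout instead of failing immediately.

```python
import os
import sqlite3
import tempfile


def open_store(path, timeout=5.0):
    # timeout: how long SQLite waits on a locked database before raising.
    conn = sqlite3.connect(path, timeout=timeout)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "uri TEXT, key TEXT, value TEXT, PRIMARY KEY (uri, key))"
    )
    conn.commit()
    return conn


def set_many(conn, uri, pairs):
    # One short transaction per batch: the write lock is held only for
    # the duration of the with-block and released on commit or rollback.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO metadata (uri, key, value) VALUES (?, ?, ?)",
            [(uri, k, v) for k, v in pairs.items()],
        )


def get_all(conn, uri):
    # Plain read; returns {key: value} for one URI.
    rows = conn.execute(
        "SELECT key, value FROM metadata WHERE uri = ?", (uri,)
    ).fetchall()
    return dict(rows)
```

Two independent connections (standing in for two applications) can then share the same file, with the second one seeing the first one's committed writes.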


Were synchronization issues not the reason why leaftag wasn't 
successful? I mean, if you can't keep file tags synchronized then 
you are unlikely to succeed in keeping general metadata up to date...


I think that's one reason why it wasn't successful, but more importantly 
it wasn't pushed by the developers and was ultimately abandoned by them.


Synchronization becomes much less of an issue if you're able to attach 
the metadata to the files themselves.


Joe


Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread Joe Shaw

Hi,

jamie wrote:

(3) is the safest and best way - you get a richer service that allows
all apps to share metadata and stay up to date.

(2) is great in theory but falls over in practice for loads of reasons
(mainly the need for fallbacks). XMP sidecar is better and really just needs
GVFS and KIO slaves to support it transparently (i.e. to auto-copy the
hidden sidecar file whenever you copy/move a file)

(1) fails for the same reasons (2) does - lack of compatibility means
having to have fallbacks in place for file types that don't have embedded
metadata. (if that fallback is going to be XMP sidecar then why bother
with anything else?)


Sorry, I must not have been clear in my previous email.  The entire 
point is to have fallbacks, because it gives you the richest experience. 
 You try (1) [inline metadata] and if it fails or doesn't make sense, 
you try (2) [extended attributes].  Only then if it fails do you do (3).
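
The fallback order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an existing library: the class names (XattrBackend, SqliteBackend, MetadataStore) and the table layout are hypothetical, and step (1), inline metadata, is left out because it needs a format-specific writer per file type; it would simply be another backend at the front of the list. The xattr calls are the Linux-only os.setxattr/os.getxattr from the Python standard library.

```python
import os
import sqlite3
import tempfile


class XattrBackend:
    """Step (2): extended attributes in the user.* namespace (Linux only)."""
    name = "xattr"

    def set(self, path, key, value):
        if not hasattr(os, "setxattr"):
            raise OSError("xattrs not available on this platform")
        os.setxattr(path, "user." + key, value.encode("utf-8"))

    def get(self, path, key):
        if not hasattr(os, "getxattr"):
            raise OSError("xattrs not available on this platform")
        return os.getxattr(path, "user." + key).decode("utf-8")


class SqliteBackend:
    """Step (3): a central database keyed by absolute path."""
    name = "sqlite"

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key))"
        )

    def set(self, path, key, value):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
                (os.path.abspath(path), key, value),
            )

    def get(self, path, key):
        row = self.conn.execute(
            "SELECT value FROM metadata WHERE path = ? AND key = ?",
            (os.path.abspath(path), key),
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


class MetadataStore:
    def __init__(self, backends):
        self.backends = backends  # ordered from most to least preferred

    def set(self, path, key, value):
        # Try each backend in turn; report which one accepted the write.
        for backend in self.backends:
            try:
                backend.set(path, key, value)
                return backend.name
            except OSError:
                continue
        raise OSError("no backend could store the metadata")

    def get(self, path, key):
        # Same order on read; None if no backend knows the key.
        for backend in self.backends:
            try:
                return backend.get(path, key)
            except (OSError, KeyError):
                continue
        return None
```

The caller only ever talks to MetadataStore; which backend ends up holding the value depends on what the volume supports.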


XMP sidecars are a possibility (probably as step 2.5), but they have the 
same problem as a centralized database: they don't follow the file 
around.  If you copy a file, rename a file, or send it as an email 
attachment, if you don't know to do the same with the sidecar you'll 
lose it.  gnome-vfs/gvfs/kio support for this helps the issue, but it's 
not as tightly coupled as xattrs.


And you say that (1) and (2) fail in practice, but this just isn't true. 
 F-Spot stores its tags as XMP data in image files quite successfully. 
 Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in 
MP3s forever.  As for xattrs, we've been using them very successfully in 
Beagle for over 3 years (with a transparent fallback to an sqlite 
database when it's not possible).  So all of this is possible to do and 
give the user a good experience.


 Also there are notifications, which you need a daemon for - it's why
 gconf is a daemon!

Notifications are tricky, but I think it might be possible to do them 
within a library using inotify (or FAM or other appropriate method). 
gconf is a daemon because (a) mechanisms like these didn't exist and (b) 
because absolutely everything touches it, and it doesn't make sense to 
continually reparse its XML files.


With approaches (1) and (2), notification is trivial.  You get 
WRITE_CLOSE event and ATTR_CHANGED events on the files themselves.  For 
(3), you can watch the database file for changes.  (This means you have 
to timestamp rows, but once that's done you can trivially SELECT for 
changes newer than a certain time.  This is a good thing to do in any case.)
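
As a rough sketch of the timestamped-rows idea for (3), assuming a hypothetical metadata table with a changed_at column (the schema and function names are illustrative only). The timestamp is passed in explicitly to keep the example deterministic; a real writer would presumably pass time.time().

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "uri TEXT, key TEXT, value TEXT, changed_at REAL, "
        "PRIMARY KEY (uri, key))"
    )
    return conn


def set_value(conn, uri, key, value, now):
    # Every write stamps the row, so readers can ask "what changed since T?"
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)",
            (uri, key, value, now),
        )


def changes_since(conn, last_seen):
    # Rows changed strictly after last_seen, oldest first.
    return conn.execute(
        "SELECT uri, key, value, changed_at FROM metadata "
        "WHERE changed_at > ? ORDER BY changed_at",
        (last_seen,),
    ).fetchall()
```

A watcher would remember the largest changed_at it has seen and pass it as last_seen the next time the database file changes. Note that deleted rows never show up in such a query, so removals would need separate handling.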


Joe


Re: shared wasabi implementation

2007-02-21 Thread Joe Shaw

Hey,

Mikkel Kamstrup Erlandsen wrote:
I think I mentioned getters and setters for metadata somewhere in this 
thread, but that's not entirely obvious from the context of the mail you 
reply to :-) The methods I suggested were only for ingesting into the 
index. Or are you saying that there should also be a way to set metadata 
on the indexer and not only via the metadata api?


Maybe, yeah.  I wasn't thinking about the metadata spec when I wrote 
this; I'm still not convinced that we should have metadata getter/setter 
APIs as part of the spec if we're just focusing on the search side of 
things.


The possible API I was referring to would be so that additional 
properties can be set on an already-indexed document.  To give an 
example: Beagle adds Nautilus metadata to files by parsing the XML files 
while the file system is crawled.  It would be much cleaner (and more 
extensible) to have a separate backend which itself crawled Nautilus 
metadata and added properties to already-indexed files in the database.


The mailbox was mainly meant as an example, but let's stick to it 
anyway. What I meant to say was that you can index the mailbox at any 
time, but there is no need to keep a watch out for changes since the only 
app that is ever going to change it is (say) Evolution. In a metadata 
aware desktop Evolution would be emitting signals (or whatever) when the 
mailbox changes (perhaps with hints as to what changed).


Sure, this is the ideal situation, but it means that you need to have 
internal crawling code and an external handler to deal with changes as 
they come in.  The bulk of the code (and the hard part, really) is 
usually the crawler.


Joe


Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread jamie
On Wed, 2007-02-21 at 08:54 -0500, Joe Shaw wrote:
 Hi,
 
 jamie wrote:
  (3) is the safest and best way - you get a richer service that allows
  all apps to share metadata and stay up to date.
  
  (2) is great in theory but falls over in practice for loads of reasons
  (mainly the need for fallbacks). XMP sidecar is better and really just needs
  GVFS and KIO slaves to support it transparently (i.e. to auto-copy the
  hidden sidecar file whenever you copy/move a file)
  
  (1) fails for the same reasons (2) does - lack of compatibility means
  having to have fallbacks in place for file types that don't have embedded
  metadata. (if that fallback is going to be XMP sidecar then why bother
  with anything else?)
 
 Sorry, I must not have been clear in my previous email.  The entire 
 point is to have fallbacks, because it gives you the richest experience. 
   You try (1) [inline metadata] and if it fails or doesn't make sense, 
 you try (2) [extended attributes].  Only then if it fails do you do (3).

for an app that wants to fetch metadata that seems to be a lot of hoops to
jump through.

What would be fine is perhaps (1) + (3) and (2) + (3), so that an app can
easily get the metadata regardless of where it's stored (via (3))

though I prefer XMP sidecar + (3), with (3) backing up all user-defined
metadata there; that way an app has a single gateway to all metadata.

 
 XMP sidecars are a possibility (probably as step 2.5), but they have the 
 same problem as a centralized database: they don't follow the file 
 around.  If you copy a file, rename a file, or send it as an email 
 attachment, if you don't know to do the same with the sidecar you'll 
 lose it.  gnome-vfs/gvfs/kio support for this helps the issue, but it's 
 not as tightly coupled as xattrs.

I find it the other way round: xattrs fail as soon as you try to copy
them off the volume. They simply don't work on the network, in
attachments, to floppy/CD or USB stick etc. XMP sidecar seems to be the
best choice here (if we have vfs support for them)

 
 And you say that (1) and (2) fail in practice, but this just isn't true. 
   F-Spot stores its tags as XMP data in image files quite successfully. 
   Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in 
 MP3s forever.  As for xattrs, we've been using them very successfully in 
 Beagle for over 3 years (with a transparent fallback to an sqlite 
 database when it's not possible).  So all of this is possible to do and 
 give the user a good experience.

yes with certain file types - my point is it's not a *universal*
solution.

 
   Also there are notifications, which you need a daemon for - it's why
   gconf is a daemon!
 
 Notifications are tricky, but I think it might be possible to do them 
 within a library using inotify (or FAM or other appropriate method). 
 gconf is a daemon because (a) mechanisms like these didn't exist and (b) 
 because absolutely everything touches it, and it doesn't make sense to 
 continually reparse its XML files.

not really: a notification should pass the new value for the specific
metadata in the signal so apps don't need to work out which metadata has
changed or be forced to refresh everything. I don't think that's possible
without a daemon? (you would need a copy of the old metadata to work out
what's changed)

(KConfig has failed to implement notifications for exactly that reason)
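
To make the "you need a copy of the old metadata" point concrete, here is a small sketch of the comparison that whoever emits the notification would have to perform. The function name diff_metadata and the dict representation are assumptions for illustration; the thread does not define a data model.

```python
def diff_metadata(old, new):
    """Return {key: (old_value, new_value)} for every key that differs.

    A missing key is reported as None on the side where it is absent.
    """
    changes = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

Whichever component holds the previous snapshot (a daemon, or a library with its own cache) is the one able to say which keys changed and what their new values are.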

 
 With approaches (1) and (2), notification is trivial.  You get 
 WRITE_CLOSE event and ATTR_CHANGED events on the files themselves.  For 
 (3), you can watch the database file for changes.  (This means you have 
 to timestamp rows, but once that's done you can trivially SELECT for 
 changes newer than a certain time.  This is a good thing to do in any case.)

(3) would potentially be an enormous db - it won't be practical to do
that and, as above, it places too much burden on the apps (again you need the
old metadata to work out what's changed). It's kind of going to extreme
lengths (and pain) to avoid having a daemon, to me, but then that's just my
opinion!

the other reason for a daemon is it's safer on broken nfs if only one
process ever writes to the db at a time (that of course won't work if a
user has multiple sessions with a single shared nfs home directory
between them, but people who do that on broken nfs are asking for
trouble!).

jamie




Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread jamie

  
  With approaches (1) and (2), notification is trivial.  You get 
  WRITE_CLOSE event and ATTR_CHANGED events on the files themselves.  For 
  (3), you can watch the database file for changes.  (This means you have 
  to timestamp rows, but once that's done you can trivially SELECT for 
  changes newer than a certain time.  This is a good thing to do in any case.)
 
 (3) would potentially be an enormous db - it won't be practical to do
 that and, as above, it places too much burden on the apps (again you need the
 old metadata to work out what's changed). It's kind of going to extreme
 lengths (and pain) to avoid having a daemon, to me, but then that's just my
 opinion!

let me give you a good example of why this is suboptimal

suppose :

1) nautilus is listening for all metadata changes in files in folder X
only

2) F-spot is listening for metadata changes to photo files only

3) RB is listening for metadata changes to music files only

let's suppose (2) updates a tag: (1) and (2) would have to re-query the
DB using timestamps and get back a tag. They would both have to
determine if this tag is relevant to them and hence do all the
work. Also those apps would be continually awoken whenever anything
changes (bad for batteries). This clearly is not a scalable solution
when you start having more than a few apps.

now let's look at the above with a daemon:

(1) would use a subscriber interface that only gets notified for
metadata changes to files in folder X


(2) and (3) do likewise but for photos and music

this way, apps only get awoken when necessary and they get the
metadata change they need (without any work involved). It's efficient
and scalable too.
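
The subscriber interface described above could be sketched like this. It is an in-process stand-in only: a real daemon would deliver these calls over some IPC mechanism, and the names (MetadataBus, in_folder, mime_prefix) are hypothetical. The point it illustrates is that filtering happens on the publishing side and the new value travels with the notification.

```python
class MetadataBus:
    """In-process stand-in for the daemon's subscriber interface."""

    def __init__(self):
        self._subscribers = []  # list of (predicate, callback)

    def subscribe(self, predicate, callback):
        self._subscribers.append((predicate, callback))

    def publish(self, path, mime_type, key, value):
        # Only subscribers whose filter matches are called; the new
        # value is passed along, so nobody has to re-query.
        delivered = 0
        for predicate, callback in self._subscribers:
            if predicate(path, mime_type):
                callback(path, key, value)
                delivered += 1
        return delivered


def in_folder(folder):
    # Filter: files located under the given folder.
    prefix = folder.rstrip("/") + "/"
    return lambda path, mime_type: path.startswith(prefix)


def mime_prefix(prefix):
    # Filter: files whose MIME type starts with e.g. "image/".
    return lambda path, mime_type: mime_type.startswith(prefix)
```

With a file manager subscribed via in_folder("/home/u/X"), a photo app via mime_prefix("image/") and a music player via mime_prefix("audio/"), a tag change on a JPEG outside folder X wakes only the photo app.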






Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread Joe Shaw
Hi,

On Wed, 2007-02-21 at 14:26 +, jamie wrote:
  Sorry, I must not have been clear in my previous email.  The entire 
  point is to have fallbacks, because it gives you the richest experience. 
You try (1) [inline metadata] and if it fails or doesn't make sense, 
  you try (2) [extended attributes].  Only then if it fails do you do (3).
 
 for an app that wants to fetch metadata it seems to be a lot of hoops to
 jump through.
 
 What would be fine is perhaps (1) + (3) and (2) + (3) so that an app can
 easily get the metadata regardless of where it's stored (via (3))

You hide it all behind a single API; the consumer doesn't care where it
comes from, that's the library's job.

 I find it the other way round: xattrs fail as soon as you try to copy
 them off the volume. They simply don't work on the network, in
 attachments, to floppy/CD or USB stick etc. XMP sidecar seems to be the
 best choice here (if we have vfs support for them)

There's no reason why we couldn't do both.  xattrs on volumes that
support them, then transition to XMP sidecars if the VFS notices we're
trying to copy/move them to a volume that doesn't.
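
A rough sketch of that "do both" copy path, under loud assumptions: copy_with_metadata and read_user_xattrs are hypothetical helpers, the hidden sidecar name is made up, and a JSON file stands in for a real XMP packet (actual XMP is RDF/XML). The xattr calls are the Linux-only ones from Python's os module.

```python
import json
import os
import shutil
import tempfile


def read_user_xattrs(path):
    # Collect user.* attributes; empty dict if the platform or volume
    # does not support them.
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = [n for n in os.listxattr(path) if n.startswith("user.")]
        return {n: os.getxattr(path, n).decode("utf-8") for n in names}
    except OSError:
        return {}


def copy_with_metadata(src, dst, metadata=None):
    """Copy src to dst, keeping metadata as xattrs or, failing that, a sidecar.

    Returns "xattr", "sidecar", or "none".
    """
    if metadata is None:
        metadata = read_user_xattrs(src)
    shutil.copyfile(src, dst)  # copies file contents only
    if not metadata:
        return "none"
    try:
        for name, value in metadata.items():
            os.setxattr(dst, name, value.encode("utf-8"))
        return "xattr"
    except (OSError, AttributeError):
        # Destination cannot hold xattrs: write a hidden sidecar next to it.
        sidecar = os.path.join(
            os.path.dirname(dst), "." + os.path.basename(dst) + ".meta.json"
        )
        with open(sidecar, "w", encoding="utf-8") as fh:
            json.dump(metadata, fh)
        return "sidecar"
```

In a real VFS layer this decision would sit inside the copy/move operation itself, so applications would not need to know which form the metadata took on the destination volume.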

  And you say that (1) and (2) fail in practice, but this just isn't true. 
F-Spot stores its tags as XMP data in image files quite successfully. 
Rhythmbox, sound-juicer, and Banshee have been putting id3 tags in 
  MP3s forever.  As for xattrs, we've been using them very successfully in 
  Beagle for over 3 years (with a transparent fallback to an sqlite 
  database when it's not possible).  So all of this is possible to do and 
  give the user a good experience.
 
 yes with certain file types - my point is it's not a *universal*
 solution.

There is no universal solution.  Storing things in a database breaks
down when I want to fix a typo in the artist tag of an mp3 I downloaded.

 not really: a notification should pass the new value for the specific
 metadata in the signal so apps don't need to work out which metadata has
 changed or be forced to refresh everything. I don't think that's possible
 without a daemon? (you would need a copy of the old metadata to work out
 what's changed)

Passing the new value is easy, because that's what's now in the
file/database.  Reparsing the file might be a pain for some file
formats, so I can see how a daemon would improve the caching there.

 (3) would potentially be an enormous db - it won't be practical to do
 that and, as above, it places too much burden on the apps (again you need the
 old metadata to work out what's changed). It's kind of going to extreme
 lengths (and pain) to avoid having a daemon, to me, but then that's just my
 opinion!

I don't think the size of the DB will affect things too much.  If you
add an index on the changed-time column, updates will be infrequent
enough that the overhead at insert time will be small, and the overhead
at search time will be nonexistent.

I admit I haven't prototyped it though, so it might end up being
untenable and a daemon unavoidable.
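
One cheap way to check the indexing part of this, sketched with Python's sqlite3 module: create the index and ask SQLite how it would run the "newer than" query. The table, column and index names are assumptions for illustration, and this only shows the query plan; it says nothing about real-world performance on a large database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (uri TEXT, key TEXT, value TEXT, changed_at REAL)"
)
conn.execute("CREATE INDEX idx_metadata_changed_at ON metadata (changed_at)")


def query_plan(conn, sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable
    # description of one step of the plan.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


plan = query_plan(
    conn, "SELECT uri, key, value FROM metadata WHERE changed_at > ?", (0.0,)
)
```

If the plan mentions idx_metadata_changed_at, the lookup goes through the index rather than scanning every row; the exact wording of the plan text differs between SQLite versions.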

 the other reason for a daemon is its safer on broken nfs if only one
 process ever writes to the db at a time

Yep, that's a good point.

Joe



Re: [Wasabi] Kicking of the Metadata spec - brainstorm

2007-02-21 Thread Mikkel Kamstrup Erlandsen

2007/2/21, Fabrice Colin [EMAIL PROTECTED]:


On 2/21/07, Jos van den Oever [EMAIL PROTECTED] wrote:
 2007/2/20, Joe Shaw [EMAIL PROTECTED]:
  Heh.  Fair enough.  Should we keep it simply to the search-side of
  things, then?

 The intention is to define a way to define metadata fields. This is
 required for searching and useful for metadata read/write too. So let's
 come up with the properties we allow metadata fields to have. I think
 what we have so far is what is required for searching too. So this
 will not be about an API for reading and writing.



Speaking of searching metadata, would it be useful to also define
which metadata fields are to be treated as boolean filters and which allow
probabilistic search?



Will this not be implicit in most cases? Do you have an example?

Cheers,
Mikkel