Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-12 Thread Ivan Frade
Hi,

On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:
 On 10-10-11 10:44, Ivan Frade wrote:
 On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:

 What about setting an attribute for a property? Is that an option?
 I.e. just one title property, set with either info from a file or an
 external resource, with a way to determine where it came from afterwards.

In RDF you cannot put attributes on properties. You could do it by
creating a new subclass of xsd:string with the attributes you consider
convenient (e.g. a boolean "guessed"). This would probably require
deeper changes in Tracker, because xsd:string is a basic datatype...

  Note that probably these new miners would need some UI (to ask the
 user what movie from a list is the one in their filesystem). This can
 be tricky (no miner has specific UI so far).


 I'd rather not go there. It would be very confusing for a user.
 We would be better off just doing it all in the background. Accuracy can
 be increased by using multiple resources for the same thing.

 It is even more confusing when you find movies in your library that you
didn't even know existed :) XBMC (my reference on this topic) does a
brilliant job of guessing the movies and it still needs manual input.
Multiple resources won't help to classify a resource called hc-p01.avi.

 Regards,

Ivan
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list


Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-12 Thread Sam Thursfield
Hi

On Wed, Oct 12, 2011 at 9:42 AM, Ivan Frade ivan.fr...@gmail.com wrote:
 On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:
 What about setting an attribute for a property? Is that an option?
 I.e. just one title property, set with either info from a file or an
 external resource, with a way to determine where it came from afterwards.

 In RDF you cannot put attributes on properties. You could do it by
 creating a new subclass of xsd:string with the attributes you consider
 convenient (e.g. a boolean "guessed"). This would probably require
 deeper changes in Tracker, because xsd:string is a basic datatype...

Don't graphs give us that ability already? Data mined from the FS goes
in the fs miner's graph, data from online metadata services can go in
a separate graph, data entered by user-driven applications goes in
another ...

  Note that probably these new miners would need some UI (to ask the
 user what movie from a list is the one in their filesystem). This can
 be tricky (no miner has specific UI so far).


 I'd rather not go there. It would be very confusing for a user.
 We would be better off just doing it all in the background. Accuracy can
 be increased by using multiple resources for the same thing.

  It is even more confusing when you find movies in your library that you
 didn't even know existed :) XBMC (my reference on this topic) does a
 brilliant job of guessing the movies and it still needs manual input.
 Multiple resources won't help to classify a resource called hc-p01.avi.

I agree that it's impossible to guarantee metadata, but I think the
correct solution is to allow the user to correct the data when it's
being viewed in applications rather than to pop up a dialog during
extraction. Of course, this is hard :)

Sam


Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-12 Thread Adrien Bustany

On 12.10.2011 12:11, Sam Thursfield wrote:

Hi

On Wed, Oct 12, 2011 at 9:42 AM, Ivan Frade ivan.fr...@gmail.com wrote:
On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:

What about setting an attribute for a property? Is that an option?
I.e. just one title property, set with either info from a file or an
external resource, with a way to determine where it came from afterwards.

In RDF you cannot put attributes on properties. You could do it by
creating a new subclass of xsd:string with the attributes you consider
convenient (e.g. a boolean "guessed"). This would probably require
deeper changes in Tracker, because xsd:string is a basic datatype...

Don't graphs give us that ability already? Data mined from the FS goes
in the fs miner's graph, data from online metadata services can go in
a separate graph, data entered by user-driven applications goes in
another ...


That works for multi-valued properties, not single-valued ones. A statement
can only be in one graph in Tracker, so you can't, for example, have two
nie:title values (even in different graphs) for the same subject.
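
A toy model may make this constraint easier to see. The sketch below is
plain Python, not Tracker code: the ToyStore class, the graph names and
the ex:alias property are all hypothetical, and raising an error on a
second value is only an assumption about how such a constraint could be
enforced. It illustrates the rule as stated above: a statement lives in
exactly one graph, and a single-valued property such as nie:title cannot
hold a second value even when that value comes from another graph.

```python
class ToyStore:
    """Toy quad store: every (subject, predicate, object) lives in one graph."""

    def __init__(self, single_valued):
        # Predicates that may hold at most one value per subject.
        self.single_valued = set(single_valued)
        # (subject, predicate, object) -> name of the graph that holds it
        self.graph_of = {}

    def insert(self, graph, subject, predicate, obj):
        if predicate in self.single_valued:
            for (s, p, o) in self.graph_of:
                if s == subject and p == predicate and o != obj:
                    # The graph is not part of the check: a different graph
                    # does not make room for a second value.
                    raise ValueError(
                        f"{predicate} is single-valued for {subject}")
        # A statement belongs to exactly one graph; the first writer keeps it
        # (an assumption made for this sketch).
        self.graph_of.setdefault((subject, predicate, obj), graph)

    def values(self, subject, predicate):
        """Return {object: graph} for the given subject and predicate."""
        return {o: g for (s, p, o), g in self.graph_of.items()
                if s == subject and p == predicate}


store = ToyStore(single_valued={"nie:title"})
movie = "file:///videos/hc-p01.avi"

# Multi-valued (hypothetical) property: one value per source graph works.
store.insert("graph:miner-fs", movie, "ex:alias", "hc-p01")
store.insert("graph:online", movie, "ex:alias", "Some Movie")

# Single-valued property: the second title is rejected.
store.insert("graph:miner-fs", movie, "nie:title", "hc-p01")
try:
    store.insert("graph:online", movie, "nie:title", "Some Movie")
    second_title_accepted = True
except ValueError:
    second_title_accepted = False
```

So separate graphs can record where each value of a multi-valued property
came from, but they do not give a second slot for a single-valued one.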



 Note that probably these new miners would need some UI (to ask the
user what movie from a list is the one in their filesystem). This can
be tricky (no miner has specific UI so far).

I'd rather not go there. It would be very confusing for a user.
We would be better off just doing it all in the background. Accuracy can
be increased by using multiple resources for the same thing.

 It is even more confusing when you find movies in your library that you
didn't even know existed :) XBMC (my reference on this topic) does a
brilliant job of guessing the movies and it still needs manual input.
Multiple resources won't help to classify a resource called hc-p01.avi.

I agree that it's impossible to guarantee metadata, but I think the
correct solution is to allow the user to correct the data when it's
being viewed in applications rather than to pop up a dialog during
extraction. Of course, this is hard :)

Sam


Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-11 Thread Age Bosma
On 10-10-11 10:44, Ivan Frade wrote:
 On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:
 It is then up to an application to decide what to use. I.e. normal title
 present? Use it. No title present but an external title present? Use
 that one if you like.
 
 That is nice in theory, but in practice it means the application must do
 multiple queries just for the title. Also, the application needs to
 know how many different sources of information are available.
 
  I would say that those scraping miners should override the values
 of the properties they know. In some cases we could add new properties
 to the ontologies and the application could use tracker:coalesce in
 the query.
 

First of all, please don't call it a scraper ;-) Web scraping should
really be avoided.
Ideally only web services should be used to obtain additional data.
Websites change relatively often compared to web services, as Daniel
O'Connor pointed out in the blog post he referred to. Having to alter a
website parser for each website change will become an endless task.
Then there's also the rights issue. While it will be hard to detect, big
resource websites do not allow scraping.
Using web services will prove to be more stable in the end.

I agree that multiple queries for one piece of info should be prevented.
From an application point of view you just want the title, no matter
where it came from.
Yet I do not feel completely comfortable with overwriting a property
without being able to determine its source (from file or elsewhere). The
info from an external source could have been determined wrongly, and I
can imagine an app wanting to indicate this to a user somehow.

Having multiple title properties is not an option, as discussed on IRC.
There is no way for Tracker to automatically fall back on an external
property and you don't want to start using tracker:coalesce in an app for
each possible property.
What about setting an attribute for a property? Is that an option?
I.e. just one title property, set with either info from a file or an
external resource, with a way to determine where it came from afterwards.

  Note that probably these new miners would need some UI (to ask the
 user what movie from a list is the one in their filesystem). This can
 be tricky (no miner has specific UI so far).
 

I'd rather not go there. It would be very confusing for a user. You
create or put a file somewhere and all of a sudden a UI pops up out of
nowhere asking you what you've been doing. I can see a lot of people
getting paranoid (even more) ;-)
It can also become overkill, especially at initial indexing.
We would be better off just doing it all in the background. Accuracy can
be increased by using multiple resources for the same thing.

Yours,

Age (Forage)





Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-11 Thread Age Bosma
On 10-10-11 12:01, Martyn Russell wrote:
 On 09/10/11 23:33, Age Bosma wrote:

 Why would one want this? Often more info than can be extracted from
 files is appreciated. It will prevent applications from having to
 reinvent the wheel, deviating from Tracker as their meta-data source
 because it does not have the information.
 E.g. Rygel could start listing movies on a TV with the actual movie
 title instead of using file names or list them by director even though
 no tags were present. Banshee (if/when they start using Tracker) does
 not have to maintain their own MusicBrainz query service because Tracker
 already provides the information.
 
 There are a number of issues here. What springs to mind is:
 
  - Do we write back the data to the file itself (I would like to see
 that, but support there is limited right now by file type)?
 

Personally I don't think we should go there:
- Tracker is the metadata resource apps will use, no need to include it
in the files again.
- Apps, including Tracker, should not touch original files, unless
specifically requested to do so. It would introduce unexpected/unwanted
behaviour otherwise.
- And there's of course the difficulty of actually being able to store
the info in a specific container, as Jens Georg pointed out.

  - Guessing metadata based on filename, etc. is currently a build-time
 option. Part of me wonders if this should be in the
 tracker-preferences dialog somewhere so users can configure this more
 dynamically. Part of me thinks it's not useful though. Perhaps a silent
 configuration option not in the UI is preferable.
 

It is? How can I enable it and where is it located in the source?

 
 Does tracker allow extending functionality as described above?
 
 Yes and no. You could write a miner as suggested, but I feel this is not
 the right approach. While the name miner makes sense, what we're doing
 here is more post-processing and we've considered having some daemon
 to go around cleaning up classes and information which can be derived
 from content inserted by miner-fs or applications. A couple of examples
 here are:
 

Has an attempt actually been made since it was considered? Is there an
alternative currently available?

  - You insert a contact for an email, you delete the email, the contact
 then stays around. Really, shouldn't the contact be removed? It does
 depend on who uses it (the graph) but if it is just there for the email,
 it should be removed ideally. If some gnome-contacts or other
 application makes use of it using their graph to insert the data, we
 wouldn't clean it up.
 

What is happening now?

 
 I guess you could write a miner to do this. It would listen to graph
 update signals to know when to find out about new music/videos and
 update the store.
 
 You could also write this into tracker-extract/libtracker-extract and
 have some common functions to get this information. 

Neither a miner nor an extractor is really meant for post-processing, as
you've mentioned.

What advantage or disadvantage does an extractor have over a miner?
Are there more reasons why a miner would not be the right approach
besides its name?

Yours,

Age (Forage)





Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-10 Thread Ivan Frade
Hi,

On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:
 Hi,

 Would it be an idea to extend this concept by supplementing the metadata
 which could not be determined from a file with info from external
 resources? And/or intelligent guessing for that matter?

Definitely! We had that idea in mind when we designed Tracker. You
just need to implement a miner to calculate that information or to
fetch it from somewhere else.

 The external and guessed meta-data should be stored separate from the
 normal meta-data stored by Tracker, marked as an external
 title/director/... tag.

We have limited support for graphs in Tracker that can be used for
that. We already use it to know whether a certain triple came from the
file or from an application.

 It is then up to an application to decide what to use. I.e. normal title
 present? Use it. No title present but an external title present? Use
 that one if you like.

That is nice in theory, but in practice it means the application must do
multiple queries just for the title. Also, the application needs to
know how many different sources of information are available.

 I would say that those scraping miners should override the values
of the properties they know. In some cases we could add new properties
to the ontologies and the application could use tracker:coalesce in
the query.
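
As a rough illustration of the fallback that tracker:coalesce would give
an application, here is a minimal Python sketch. It is not Tracker code:
the ex:externalTitle property is hypothetical (it stands for one of the
"new properties" that could be added to the ontologies), the rows are
made-up data, and the query text is an untested sketch that has not been
run against a Tracker store. The Python function mirrors the idea of
returning the first value that is actually present.

```python
# Untested sketch of a single query using tracker:coalesce.
# ex:externalTitle is a hypothetical property, not part of any ontology.
QUERY = """
SELECT ?f tracker:coalesce(?title, ?external, ?name)
WHERE {
  ?f a nfo:Video ;
     nfo:fileName ?name .
  OPTIONAL { ?f nie:title ?title }
  OPTIONAL { ?f ex:externalTitle ?external }
}
"""


def coalesce(*values):
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None


def display_title(row):
    """Pick a title for one result row: file metadata, then external, then name."""
    return coalesce(row.get("nie:title"),
                    row.get("ex:externalTitle"),
                    row.get("nfo:fileName"))


# Made-up rows standing in for query results.
tagged = {"nie:title": "Tagged Title", "nfo:fileName": "a.avi"}
external_only = {"ex:externalTitle": "Looked-up Title", "nfo:fileName": "b.avi"}
nothing_known = {"nfo:fileName": "hc-p01.avi"}
```

With this shape the application issues one query and does not need to
know how many sources exist, but it also cannot tell which source won
unless the query returns that separately.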

 Does functionality as described above fit within the goals/scope of tracker?
 Would there be any objections against going in this direction?
 Does tracker allow extending functionality as described above?

 It fits very nicely with our vision of Tracker. You just need to
implement a miner (check libtracker-miner) and use the graphs wisely
to put the information into Tracker. Of course this is the first time
anybody has written a scraping miner, so maybe we need to fix details
here and there. Business as usual.

 Note that probably these new miners would need some UI (to ask the
user what movie from a list is the one in their filesystem). This can
be tricky (no miner has specific UI so far).

 Does the current shared-filemetadata-spec provide a way to store
 information as external/additional?

 That spec is completely out-of-date. The model we are actually using is here:
http://developer.gnome.org/ontology/unstable/

 The documentation about libtracker-miner is here:
http://developer.gnome.org/libtracker-miner/unstable/

 You will also need to use libtracker-sparql:
http://developer.gnome.org/libtracker-sparql/unstable/

 If you need more interactive help, feel free to come to IRC
(#tracker on GIMPnet) and ask!

 Regards,

Ivan


Re: [Tracker] Guessing metadata and retrieval from external resources

2011-10-10 Thread Daniel O'Connor
On Mon, Oct 10, 2011 at 9:03 AM, Age Bosma agebo...@gmail.com wrote:

 Hi,

 As far as I understand, Tracker currently sticks to collecting only
 meta-data which can be retrieved from the actual files.

 Would it be an idea to extend this concept by supplementing the metadata
 which could not be determined from a file with info from external
 resources? And/or intelligent guessing for that matter?

 E.g. we have a movie with no tags like title, director, etc. We do have
 a file name though.
 In a lot of cases the movie title can be extracted from it. This could
 be added to the tracker metadata, followed by requesting the director of
 a movie from an external resource like IMDB.
 The same would go for the language of a file like a movie or subtitle,
 where a language code is included in a file name.
 A different approach can be taken with music. An audio fingerprint can be
 determined, followed by using that to retrieve the additional meta-data
 from MusicBrainz.
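
The filename guessing described above could look roughly like the Python
sketch below. It is only an illustration under stated assumptions: the
function name, the short LANGUAGE_CODES list and the rule that a language
code is the last dot-, dash- or underscore-separated token before the
extension are all hypothetical, and nothing here comes from Tracker.

```python
import re
from pathlib import PurePath

# Hypothetical sample of language codes; a real list would be much longer.
LANGUAGE_CODES = {"en", "nl", "de", "fr", "es"}


def guess_from_filename(filename):
    """Guess a title and an optional language code from a file name."""
    # Drop the final extension, e.g. ".avi" or ".srt".
    stem = PurePath(filename).stem
    # Split on the separators commonly used instead of spaces.
    tokens = [t for t in re.split(r"[._\-\s]+", stem) if t]

    language = None
    # Assumption: a language code, if any, is the last token and there is
    # at least one other token left over for the title.
    if len(tokens) > 1 and tokens[-1].lower() in LANGUAGE_CODES:
        language = tokens.pop().lower()

    title = " ".join(tokens) if tokens else None
    return {"title": title, "language": language}


subtitle_guess = guess_from_filename("The.Big.Sleep.en.srt")
opaque_guess = guess_from_filename("hc-p01.avi")
```

For a descriptive name this yields a usable title and language, while for
a name like hc-p01.avi the "title" is just the file name with the
separator replaced, so a guess of this kind would still need a lookup or
a correction afterwards.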

 The external and guessed meta-data should be stored separate from the
 normal meta-data stored by Tracker, marked as an external
 title/director/... tag.
 It is then up to an application to decide what to use. I.e. normal title
 present? Use it. No title present but an external title present? Use
 that one if you like.

 Why would one want this? Often more info than can be extracted from
 files is appreciated. It will prevent applications from having to
 reinvent the wheel, deviating from Tracker as their meta-data source
 because it does not have the information.


I did some tinkering to basically interact with XBMC web services and push
metadata into tracker along these lines:
http://clockwerx.blogspot.com/2011/01/xbmc-vs-boxee-vs-file-browsers.html

I don't think I bothered publishing the code, as it was just a shell script
to explore an idea.
Unfortunately, there were few areas in GNOME at the time which could use
the additional data, so there wasn't much utility in it.