Re: [Tracker] Guessing metadata and retrieval from external resources
Hi,

On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:
> On 10-10-11 10:44, Ivan Frade wrote:
>> On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:
>
> What about setting an attribute for a property? Is that an option? I.e. just one title property, set with either info from a file or an external resource, with a way to determine where it came from afterwards.

In RDF you cannot put attributes on properties. You could do it by creating a new subclass of xsd:string with the attributes you consider convenient (e.g. a boolean "guessed"). Probably this would require deeper changes in Tracker, because xsd:string is a basic datatype...

>> Note that probably these new miners would need some UI (to ask the user what movie from a list is the one in their filesystem). This can be tricky (no miner has specific UI so far).
>
> I'd rather not go there. It would be very confusing for a user. We would be better off just doing it all in the background. Accuracy can be increased by using multiple resources for the same thing.

It is even more confusing when you find movies you didn't even know existed in your movie library :) XBMC (my reference on this topic) does a brilliant job guessing the movies and still it needs manual input. Multiple resources won't help to classify a resource called hc-p01.avi.

Regards,
Ivan
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
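A rough Python analogy for Ivan's "subclass of xsd:string with a boolean guessed" idea. This is not RDF and not a Tracker API; `GuessedString` is a hypothetical name. It only shows what it means for a value to stay a string while carrying a provenance flag.

```python
class GuessedString(str):
    """A str that also records whether its value was guessed.

    Hypothetical analogy for "a subclass of xsd:string with a boolean
    'guessed'"; Tracker has no such datatype.
    """

    def __new__(cls, value, guessed=False):
        obj = super().__new__(cls, value)
        obj.guessed = guessed  # provenance flag rides along with the value
        return obj


title_from_file = GuessedString("Alien")
title_from_web = GuessedString("Alien", guessed=True)

# Code that only cares about the text sees two equal strings...
print(title_from_file == title_from_web)
# ...while code that knows about the flag can still tell them apart.
print(title_from_file.guessed, title_from_web.guessed)
# Ordinary string operations return a plain str, so the flag is lost.
print(type(title_from_web.upper()).__name__)
```

The last line hints, by analogy only, at the cost Ivan mentions: everything that handles the basic type has to learn to preserve the extra attribute.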
Re: [Tracker] Guessing metadata and retrieval from external resources
Hi

On Wed, Oct 12, 2011 at 9:42 AM, Ivan Frade ivan.fr...@gmail.com wrote:
> On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:
>> What about setting an attribute for a property? Is that an option? I.e. just one title property, set with either info from a file or an external resource, with a way to determine where it came from afterwards.
>
> In RDF you cannot put attributes on properties. You could do it by creating a new subclass of xsd:string with the attributes you consider convenient (e.g. a boolean "guessed"). Probably this would require deeper changes in Tracker, because xsd:string is a basic datatype...

Don't graphs give us that ability already? Data mined from the FS goes in the fs miner's graph, data from online metadata services can go in a separate graph, data entered by user-driven applications goes in another...

>>> Note that probably these new miners would need some UI (to ask the user what movie from a list is the one in their filesystem). This can be tricky (no miner has specific UI so far).
>>
>> I'd rather not go there. It would be very confusing for a user. We would be better off just doing it all in the background. Accuracy can be increased by using multiple resources for the same thing.
>
> It is even more confusing when you find movies you didn't even know existed in your movie library :) XBMC (my reference on this topic) does a brilliant job guessing the movies and still it needs manual input. Multiple resources won't help to classify a resource called hc-p01.avi.

I agree that it's impossible to guarantee metadata, but I think the correct solution is to allow the user to correct the data when it's being viewed in applications rather than to pop up a dialog during extraction. Of course, this is hard :)

Sam
Re: [Tracker] Guessing metadata and retrieval from external resources
On 12.10.2011 12:11, Sam Thursfield wrote:
> Hi
>
> On Wed, Oct 12, 2011 at 9:42 AM, Ivan Frade ivan.fr...@gmail.com wrote:
>> On Tue, Oct 11, 2011 at 6:10 PM, Age Bosma agebo...@gmail.com wrote:
>>> What about setting an attribute for a property? Is that an option? I.e. just one title property, set with either info from a file or an external resource, with a way to determine where it came from afterwards.
>>
>> In RDF you cannot put attributes on properties. You could do it by creating a new subclass of xsd:string with the attributes you consider convenient (e.g. a boolean "guessed"). Probably this would require deeper changes in Tracker, because xsd:string is a basic datatype...
>
> Don't graphs give us that ability already? Data mined from the FS goes in the fs miner's graph, data from online metadata services can go in a separate graph, data entered by user-driven applications goes in another...

That works for multi-valued properties, not single-valued ones. A statement can only be in one graph in Tracker, so you can't, for example, have two nie:title values (even in different graphs) for the same subject.

>>>> Note that probably these new miners would need some UI (to ask the user what movie from a list is the one in their filesystem). This can be tricky (no miner has specific UI so far).
>>>
>>> I'd rather not go there. It would be very confusing for a user. We would be better off just doing it all in the background. Accuracy can be increased by using multiple resources for the same thing.
>>
>> It is even more confusing when you find movies you didn't even know existed in your movie library :) XBMC (my reference on this topic) does a brilliant job guessing the movies and still it needs manual input. Multiple resources won't help to classify a resource called hc-p01.avi.
>
> I agree that it's impossible to guarantee metadata, but I think the correct solution is to allow the user to correct the data when it's being viewed in applications rather than to pop up a dialog during extraction. Of course, this is hard :)
>
> Sam
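A toy Python model of the rule described in this reply, assuming (for illustration) that nie:title is single-valued and nao:hasTag is multi-valued. It is not Tracker's storage code, and the graph URIs are hypothetical. Each statement is keyed by (subject, predicate, object), so it belongs to exactly one graph; the toy simply refuses a second value for a single-valued property, which mirrors the rule but not necessarily how Tracker reports it.

```python
from dataclasses import dataclass, field

# Assumption for this toy: only nie:title is treated as single-valued.
SINGLE_VALUED = {"nie:title"}


class SingleValueConflict(Exception):
    """A second value was offered for a single-valued property."""


@dataclass
class ToyStore:
    # Keyed by (subject, predicate, object): each statement has one graph.
    statements: dict = field(default_factory=dict)

    def insert(self, graph, subject, predicate, obj):
        if predicate in SINGLE_VALUED:
            for (s, p, o), g in self.statements.items():
                if s == subject and p == predicate and o != obj:
                    raise SingleValueConflict(
                        f"{subject} already has {predicate} {o!r} (graph {g})")
        self.statements[(subject, predicate, obj)] = graph

    def values(self, subject, predicate):
        """Return sorted (object, graph) pairs for one subject/predicate."""
        return sorted((o, g) for (s, p, o), g in self.statements.items()
                      if s == subject and p == predicate)


store = ToyStore()
movie = "file:///videos/hc-p01.avi"

# Multi-valued: one value per graph is fine.
store.insert("urn:graph:miner-fs", movie, "nao:hasTag", "urn:tag:home")
store.insert("urn:graph:web", movie, "nao:hasTag", "urn:tag:comedy")
print(store.values(movie, "nao:hasTag"))

# Single-valued: a second title is rejected even in another graph.
store.insert("urn:graph:miner-fs", movie, "nie:title", "hc-p01")
try:
    store.insert("urn:graph:web", movie, "nie:title", "Some Movie")
except SingleValueConflict as err:
    print("rejected:", err)
```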
Re: [Tracker] Guessing metadata and retrieval from external resources
On 10-10-11 10:44, Ivan Frade wrote:
> On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:
>> It is then up to an application to decide what to use. I.e. normal title present? Use it. No title present but an external title present? Use that one if you like.
>
> That is nice in theory, but in practice means the application must do multiple queries just for the title. Also, the application needs to know how many different sources of information are available. I would say that those scraping miners should override the values of the properties they know. In some cases we could add new properties to the ontologies and the application could use tracker:coalesce in the query.

First of all, please don't call it a scraper ;-) Web scraping should really be avoided; ideally only web services should be used to obtain additional data. Compared to web services, websites change relatively often, as Daniel O'Connor pointed out in the blog post he referred to. Having to alter a website parser for each website change would become an endless task. Then there's also the rights issue: while it would be hard to detect, big resource websites do not allow scraping. Using web services will prove to be more stable in the end.

I agree that multiple queries for one piece of info should be avoided. From an application's point of view you just want the title, no matter where it came from. Yet I do not feel completely comfortable with overwriting a property without being able to determine its source (from the file or elsewhere). The info from an external source could have been determined wrongly, and I can imagine an app wanting to indicate this to the user somehow.

Having multiple title properties is not an option, as discussed on IRC. There is no way for Tracker to automatically fall back on an external property, and you don't want to start using tracker:coalesce in an app for each possible property.

What about setting an attribute for a property? Is that an option? I.e. just one title property, set with either info from a file or an external resource, with a way to determine where it came from afterwards.

> Note that probably these new miners would need some UI (to ask the user what movie from a list is the one in their filesystem). This can be tricky (no miner has specific UI so far).

I'd rather not go there. It would be very confusing for a user. You create or put a file somewhere and all of a sudden a UI pops up out of nowhere asking you what you've been doing. I can see a lot of people getting paranoid (even more) ;-) It can also become quite overkill, especially at initial indexing. We would be better off just doing it all in the background. Accuracy can be increased by using multiple resources for the same thing.

Yours,
Age (Forage)
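One way to read "accuracy can be increased by using multiple resources" is a simple majority vote over the titles several services suggest. The sketch below assumes hypothetical source names and does not model any real web service API. Ivan's counterexample still applies: if the services disagree or find nothing for hc-p01.avi, it returns None and the title stays unknown.

```python
from collections import Counter


def pick_title(candidates, min_agreement=2):
    """Return the title that at least `min_agreement` sources agree on.

    `candidates` maps a (hypothetical) source name to the title that source
    suggested, or None if it found nothing. Comparison ignores case and
    surrounding whitespace. Returns None when the sources do not agree enough.
    """
    suggestions = [t.strip() for t in candidates.values() if t and t.strip()]
    votes = Counter(t.casefold() for t in suggestions)
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    if count < min_agreement:
        return None  # too little agreement: leave the title unset
    # Return the first original spelling of the winning title.
    return next(t for t in suggestions if t.casefold() == best)


print(pick_title({"service_a": "Alien", "service_b": "alien ", "service_c": "Aliens"}))
print(pick_title({"service_a": "Alien", "service_b": "Aliens"}))
```

Requiring agreement trades coverage for precision: fewer files get a title, but fewer get a wrong one.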
Re: [Tracker] Guessing metadata and retrieval from external resources
On 10-10-11 12:01, Martyn Russell wrote:
> On 09/10/11 23:33, Age Bosma wrote:
>> Why would one want this? Often more info than can be extracted from files is appreciated. It will prevent applications from having to reinvent the wheel, deviating from Tracker as their meta-data source because it does not have the information. E.g. Rygel could start listing movies on a TV with the actual movie title instead of using file names, or list them by director even though no tags were present. Banshee (if/when they start using Tracker) does not have to maintain their own MusicBrainz query service because Tracker already provides the information.
>
> There are a number of issues here. What springs to mind is:
> - Do we write back the data to the file itself (I would like to see that, but support there is limited right now by file type)?

Personally I don't think we should go there:
- Tracker is the metadata resource apps will use, no need to include it in the files again.
- Apps, including Tracker, should not touch original files unless specifically requested to do so. It would introduce unexpected/unwanted behaviour otherwise.
- And there's of course the difficulty of actually being able to store the info in a specific container, as Jens Georg pointed out.

> - Guessing metadata based on filename, etc. is currently a build-time option. Part of me wonders if this should be in the tracker-preferences dialog somewhere so users can configure this more dynamically. Part of me thinks it's not useful though. Perhaps a silent configuration not in the UI is preferable.

It is? How can I enable it and where is it located in the source?

>> Does Tracker allow extending functionality as described above?
>
> Yes and no. You could write a miner as suggested, but I feel this is not the right approach. While the name miner makes sense, what we're doing here is more post-processing, and we've considered having some daemon to go around cleaning up classes and information which can be derived from content inserted by miner-fs or applications. A couple of examples here are:

Has an attempt been made since this was considered? Is there an alternative currently available?

> - You insert a contact for an email, you delete the email, and the contact then stays around. Really shouldn't the contact be removed? It does depend on who uses it (the graph), but if it is just there for the email, it should ideally be removed. If gnome-contacts or some other application makes use of it, using its own graph to insert the data, we wouldn't clean it up.

What is happening now?

> I guess you could write a miner to do this. It would listen to graph update signals to know when to find out about new music/videos and update the store. You could also write this into tracker-extract/libtracker-extract and have some common functions to get this information.

Neither a miner nor an extractor is really meant for post-processing, as you've mentioned. What advantage or disadvantage does an extractor have over a miner? Are there more reasons why a miner would not be the right approach besides its name?

Yours,
Age (Forage)
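Martyn's contact example can be sketched as a toy post-processing pass in plain Python. Everything here is an assumption for illustration: the `Resource` layout, the `contact:`/`email:` URI prefixes and the graph names are hypothetical, and no Tracker API is used. The graph check is what keeps a contact that another application (his gnome-contacts case) inserted into its own graph.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str
    graph: str                               # graph the resource was inserted into
    refs: set = field(default_factory=set)   # URIs this resource points to


def orphaned_contacts(resources, owner_graph):
    """Return contacts in `owner_graph` that no remaining resource refers to."""
    referenced = set().union(*(r.refs for r in resources.values()))
    return sorted(
        uri for uri, r in resources.items()
        if uri.startswith("contact:")        # hypothetical URI convention
        and r.graph == owner_graph           # only clean up our own data
        and uri not in referenced
    )


def delete_and_clean(resources, uri, owner_graph):
    """Delete `uri`, then drop contacts that only existed because of it."""
    resources.pop(uri, None)
    for contact in orphaned_contacts(resources, owner_graph):
        del resources[contact]


store = {
    "email:1": Resource("email:1", "urn:graph:email", {"contact:alice"}),
    "contact:alice": Resource("contact:alice", "urn:graph:email"),
    "email:2": Resource("email:2", "urn:graph:email", {"contact:bob"}),
    "contact:bob": Resource("contact:bob", "urn:graph:contacts-app"),
}
delete_and_clean(store, "email:1", "urn:graph:email")
delete_and_clean(store, "email:2", "urn:graph:email")
print(sorted(store))  # alice is removed; bob belongs to another app's graph
```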
Re: [Tracker] Guessing metadata and retrieval from external resources
Hi,

On Mon, Oct 10, 2011 at 1:33 AM, Age Bosma agebo...@gmail.com wrote:
> Hi,
> Would it be an idea to extend this concept by supplementing the metadata which could not be determined from a file with info from external resources? And/or intelligent guessing for that matter?

Definitely! We had that idea in mind when we designed Tracker. You just need to implement a miner to calculate that information or bring it from somewhere else.

> The external and guessed meta-data should be stored separately from the normal meta-data stored by Tracker, marked as an external title/director/... tag.

We have limited support for graphs in Tracker that can be used for that. We use it already to know whether a certain triple came from the file or from an application.

> It is then up to an application to decide what to use. I.e. normal title present? Use it. No title present but an external title present? Use that one if you like.

That is nice in theory, but in practice means the application must do multiple queries just for the title. Also, the application needs to know how many different sources of information are available. I would say that those scraping miners should override the values of the properties they know. In some cases we could add new properties to the ontologies and the application could use tracker:coalesce in the query.

> Does functionality as described above fit within the goals/scope of Tracker? Would there be any objections against going in this direction? Does Tracker allow extending functionality as described above?

It fits very nicely with our vision of Tracker. You just need to implement a miner (check libtracker-miner) and use the graphs wisely to put the information into Tracker. Of course it is the first time anybody writes a scraping miner, so maybe we need to fix details here and there. Business as usual.

Note that probably these new miners would need some UI (to ask the user what movie from a list is the one in their filesystem). This can be tricky (no miner has specific UI so far).

> Does the current shared-filemetadata-spec provide a way to store information as external/additional?

That spec is completely out-of-date. The model we are actually using is here:
http://developer.gnome.org/ontology/unstable/

The documentation about libtracker-miner is here:
http://developer.gnome.org/libtracker-miner/unstable/

You will also need libtracker-sparql:
http://developer.gnome.org/libtracker-sparql/unstable/

If you need more interactive help, feel free to come to IRC (#tracker on GIMPnet) and ask!

Regards,
Ivan
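A sketch of Ivan's tracker:coalesce suggestion, under stated assumptions. The query uses `ex:guessedTitle`, an invented property standing in for "new properties in the ontologies"; only tracker:coalesce comes from the thread, and the query is shown for illustration, not run against Tracker. The Python below only emulates, on made-up rows, the fallback the query asks for: the first non-null value wins.

```python
# Hypothetical query: ex:guessedTitle does not exist in Tracker's ontology.
QUERY = """
PREFIX ex: <http://example.org/guessed#>
SELECT ?video (tracker:coalesce(?title, ?guessedTitle) AS ?displayTitle)
WHERE {
  ?video a nmm:Video .
  OPTIONAL { ?video nie:title ?title }
  OPTIONAL { ?video ex:guessedTitle ?guessedTitle }
}
"""


def coalesce(*values):
    """Return the first argument that is not None (None if all are)."""
    return next((v for v in values if v is not None), None)


# Plain-Python stand-in for rows such a query might return (made-up data).
rows = [
    {"video": "file:///videos/alien.avi", "title": "Alien", "guessedTitle": "Alien (1979)"},
    {"video": "file:///videos/hc-p01.avi", "title": None, "guessedTitle": "Some Guess"},
    {"video": "file:///videos/x.avi", "title": None, "guessedTitle": None},
]

for row in rows:
    print(row["video"], coalesce(row["title"], row["guessedTitle"]))
```

This keeps both values and their origin, at the cost Age raises: every application has to spell out the fallback for every property it cares about.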
Re: [Tracker] Guessing metadata and retrieval from external resources
On Mon, Oct 10, 2011 at 9:03 AM, Age Bosma agebo...@gmail.com wrote:
> Hi,
> As far as I understand, Tracker currently only sticks to collecting meta-data which can be retrieved from the actual files. Would it be an idea to extend this concept by supplementing the metadata which could not be determined from a file with info from external resources? And/or intelligent guessing for that matter?
> E.g. we have a movie with no tags like title, director, etc. We do have a file name though. In a lot of cases the movie title can be extracted from it. This could be added to the Tracker metadata, followed by requesting the director of the movie from an external resource like IMDB. The same would go for the language of a file like a movie or subtitle, where a language code is included in the file name.
> A different approach can be taken with music. An audio fingerprint can be determined, followed by using that to retrieve the additional meta-data from MusicBrainz.
> The external and guessed meta-data should be stored separately from the normal meta-data stored by Tracker, marked as an external title/director/... tag. It is then up to an application to decide what to use. I.e. normal title present? Use it. No title present but an external title present? Use that one if you like.
> Why would one want this? Often more info than can be extracted from files is appreciated. It will prevent applications from having to reinvent the wheel, deviating from Tracker as their meta-data source because it does not have the information.

I did some tinkering to interact with XBMC web services and push metadata into Tracker along these lines: http://clockwerx.blogspot.com/2011/01/xbmc-vs-boxee-vs-file-browsers.html

I don't think I bothered publishing the code, as it was just a shell script to explore an idea. Unfortunately, there were few areas in GNOME at the time which could use the additional data, so there wasn't much utility in it.
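Age's filename idea (title, year and language code guessed from names like "Title.Year.Tags.ext" or "name.lang.srt") could look roughly like the heuristics below. This is an illustration only, not tracker-extract's code; the junk-word and language-code lists are assumptions. It is deliberately naive: a film called "2001" would be read as a year, and hc-p01.avi yields the meaningless title "hc p01", which is exactly Ivan's point.

```python
import re
from pathlib import PurePath

# Assumed naming habits; both word lists are illustrative, not exhaustive.
YEAR = re.compile(r"[\[(]?((?:19|20)\d{2})[\])]?")
JUNK = {"dvdrip", "xvid", "720p", "1080p", "bluray", "x264"}
LANG_CODES = {"en", "eng", "nl", "nld", "dut", "de", "deu", "ger", "fr", "fra", "fre"}


def guess_from_filename(filename):
    """Return (title, year, language) guessed from a file name; parts may be None."""
    stem = PurePath(filename).stem
    parts = [p for p in re.split(r"[.\s_-]+", stem) if p]

    # A trailing language code, as in "movie.nl.srt".
    language = None
    if len(parts) > 1 and parts[-1].lower() in LANG_CODES:
        language = parts.pop().lower()

    title_words, year = [], None
    for part in parts:
        match = YEAR.fullmatch(part)
        if match:
            year = int(match.group(1))
            break  # assume everything after the year is release junk
        if part.lower() in JUNK:
            break
        title_words.append(part)

    title = " ".join(title_words) or None
    return title, year, language


for name in ["The.Big.Lebowski.1998.720p.mkv", "Alien (1979).avi",
             "Alien.1979.en.srt", "hc-p01.avi"]:
    print(name, guess_from_filename(name))
```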