Thanks for the response, links and the info, Ugo. There's some good code scattered around there and quite a few sensible strategies for correctly identifying content. I think my code currently identifies around 90-95% of content correctly, and if I incorporated the other strategies/heuristics that should reach 95-99%, so I'm quite happy to have a shot at integrating that over the next week or two. If I can do that and create a good set of test cases and expanded tests, I'll submit the code. That said, my free time is sadly very erratic (hence why I'm replying to this post nearly 2 weeks after receiving it), so I never really know when I'll get time to work on these sorts of projects.

In reference to an expanded media scanner - I wish I had the time, although I would question whether it's an area where that level of effort would be best spent. There are some basic strategies that can be used to identify the vast majority of content, and from there you are talking about ever-increasing effort for minimal improvements. Indeed, looking at where I currently have content mismatches or holes, they mainly come down to:

- cases where the naming of the content is so poor that no automated parsing would ever work without sophisticated AI actually watching the content.
- cases where the content is not in thetvdb or themoviedb.
- cases where searching either of those sources returns invalid results.
- cases where the underlying database structure cannot handle the "file format", e.g. multiple VOBs or CD1, CD2.

It is actually the last case which is far more serious than any of the others, since I have many items appearing in the menus multiple times due to being spread over multiple CDs/DVDs. Is there something in the works to allow better "grouping" of this sort of content?

Anyway, touch wood I'll have a few more days to look at this soon. I had a look at the Fluendo agreement, though, and I must admit I have some reservations about signing. While it's great that the project is open source, it is quite off-putting, to say the least....

Lee

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Ugo Riboni
Sent: 27 July 2009 10:52
To: Moovida developers list
Subject: Re: A few questions

> The above notwithstanding, I've spent a few days hacking the
> video_parser.py in order to get my library recognised. It's an
> assortment of anime, TV and movies coming from various sources and
> **probably** typical of most users' content.
> I've gone from the 1.05
> version having 10-20% of the content being recognised and added to the
> movie/tv show libraries (with 80% uncategorised), to around 90% being
> categorised, with most of the content that is being problematic being
> so for understandable reasons. I'd like to submit a patch at some
> point (I need to tidy my code and I wouldn't mind knocking a few of
> the outlying cases on the head first) - is it acceptable to submit
> patches to this list rather than going the bzr route?

Hi Lee,

I'm answering only this part of your email as I see other people have already addressed most of your points in other emails.

I briefly worked on the media scanner myself a few weeks ago, here at Fluendo. We recognized that the media scanner could be improved and started to do some work on it. However, we quickly realized that any change in the media scanner carries a very high risk of introducing regressions, i.e. improving the recognition of some files while making it fail to recognize some files that were recognized before. Because of that, we decided to put all work on the media scanner on hold until we can write a comprehensive battery of unit tests to ensure that we can test the scanner for regressions. I think there's some time scheduled to create these unit tests somewhere in the coming few months, but at the moment I'm not sure exactly when.

It would actually help if you could send me (even in private if you prefer it like that) a list of the file names that you tested the media scanner on. It will help us build the unit tests when we eventually get to doing that.

That said, we already have another community member (in CC) who did some work on the media scanner as well [1][2]. It would be interesting to try and merge the stuff that I did, the stuff that he did and the stuff that you did, then see if any of you can contribute the unit tests as well. After that we will be more than happy to get that stuff reviewed and committed.
Clearly it's not exactly a trivial effort we're talking about here, so obviously I understand if you want to just wait for us to do it. Still, you can find the code from the other contributor in the links at the bottom, and my code is attached as well, just in case you want to do something with it or just look at it for ideas.

Speaking of which, if you look at my code you will see it's basically stand-alone. I ripped the scanner out of the Moovida code so I could test it more easily (it's all very crude, of course, as it's just a work in progress that got shelved temporarily). But that got me (and other people here) thinking a bit about actually having the media scanner as a totally separate Python library, so that it can be used not only by Moovida but by other projects as well (besides the obvious advantages of being much more easily testable, etc.).

Mind you, we didn't go much further on this than brainstorming some ideas (e.g. pluggable web-based helpers to search imdb or tmdb or similar sites, pluggable extra filters and recognizers, etc.). So having some more discussion about this may be helpful for everyone interested in improving the media scanner, and maybe someone will want to pick up the idea and run with it for a while.

To me a well-designed and reusable media scanner library seems like an interesting "summer" project. Hell, if we were participating in Google's Summer of Code this year I would have put it out as a student project for sure. But maybe even without that there's someone interested in looking at it (when they're not spending their time at the beach ;)).

Cheers,
--
Ugo

[1] https://www.moovida.com/quality/review/request/%3ce1mn8ed-0005lh...@myth .home.mattb.net.nz%3E
[2] https://code.launchpad.net/~mattbrown/elisa/bugfixes
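
As a rough illustration of the "grouping" question raised in the reply above (content split over CD1/CD2 showing up several times in the menus), here is a minimal Python sketch of one way a scanner could collapse multi-part files into a single entry. This is not Moovida's code: the names split_part and group_files, and the regular expression itself, are hypothetical and only cover the simplest "title + part marker + number" naming style. The same (file name, expected result) pairs could also feed the kind of regression tests discussed in the quoted message.

```python
import os
import re
from collections import defaultdict

# Hypothetical pattern (an assumption, not taken from video_parser.py):
# a separator, then "cd", "disc", "disk", "dvd" or "part", then a number,
# all at the very end of the file name without its extension.
_PART_RE = re.compile(
    r"(?:^|[\s._-]+)(?:cd|dis[ck]|dvd|part)[\s._-]*(\d+)$",
    re.IGNORECASE,
)


def split_part(filename):
    """Return (title_key, part_number); part_number is None for single files."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    match = _PART_RE.search(stem)
    if match is None:
        return stem.lower(), None
    # Everything before the part marker is treated as the title.
    return stem[:match.start()].lower(), int(match.group(1))


def group_files(filenames):
    """Map each title key to its files, with multi-part files in part order."""
    groups = defaultdict(list)
    for name in filenames:
        key, part = split_part(name)
        groups[key].append((part or 0, name))
    return {
        key: [name for _part, name in sorted(parts)]
        for key, parts in groups.items()
    }
```

With a grouping step like this, the library would store one item per title key and keep the ordered file list as its parts, rather than one item per file. Whether the underlying database can hold such a list is exactly the open question in the thread.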
