RE: A few questions

Lee Jackson Tue, 11 Aug 2009 04:35:00 -0700

Thanks for the response, links and the info Ugo.

There's some good code scattered around there and quite a few sensible
strategies for correctly identifying content. I think my code currently
identifies around 90-95% of content correctly, and incorporating the
other strategies/heuristics should bring that to 95-99%, so I'm quite
happy to have a shot at integrating them over the next week or two. If
I can do that and create a good set of test cases and expanded tests,
I'll submit the code. That said, my free time is sadly very erratic
(hence my replying to this post nearly 2 weeks after receiving it), so
I never really know when I'll get time to work on this sort of
project.


In reference to an expanded media scanner - I wish I had the time,
although I would question if it's an area where that level of effort
would be best spent. There are some basic strategies that can be used to
identify the vast majority of content, and from there you are talking
about ever-increasing effort for minimal improvement. Indeed, looking
at where I currently have content mismatches or holes, they mainly come
down to:

- cases where the naming of the content is so poor no automated parsing
would ever work without sophisticated AI actually watching the content.
- cases where the content is not in thetvdb or themoviedb.
- cases where searching either of those sources returns invalid results.
- cases where the underlying database structure cannot handle the "file
format", e.g. multiple VOBs or CD1, CD2.

It is actually the last case which is far more serious than any of the
others since I have many items appearing in the menus multiple times due
to being spread over multiple CDs/DVDs. Is there something in the works
to allow better "grouping" of this sort of content?
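
A minimal sketch of what such grouping could look like at the filename
level, assuming parts are marked with a trailing "CD1"/"disc 2"/"part3"
style token. This is not Moovida code: PART_RE, split_part and
group_parts are hypothetical names, and VOB sets (e.g. VTS_01_1.VOB)
would need a separate rule.

```python
import os
import re
from collections import defaultdict

# Hypothetical pattern: a separator, then cd/disc/disk/part, then a number,
# at the very end of the name (extension already removed).
PART_RE = re.compile(r"[\s._-]+(?:cd|disc|disk|part)[\s._-]*(\d+)$",
                     re.IGNORECASE)


def split_part(filename):
    """Return (base_title, part_number); part_number is None if unmarked."""
    stem, _ext = os.path.splitext(filename)
    match = PART_RE.search(stem)
    if match is None:
        return stem, None
    return stem[:match.start()], int(match.group(1))


def group_parts(filenames):
    """Map a lower-cased base title to its files, ordered by part number."""
    groups = defaultdict(list)
    for name in filenames:
        base, part = split_part(name)
        groups[base.lower()].append((part or 0, name))
    return {base: [name for _, name in sorted(parts)]
            for base, parts in groups.items()}
```

With something like this, "Some.Movie.CD1.avi" and "Some.Movie.CD2.avi"
would end up under one key and could be shown as a single menu entry.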

Anyway, touch wood I'll have a few more days to look at this soon. I
had a look at the Fluendo agreement, though, and I must admit I have some
reservations about signing. While it's great that the project is open
source, the agreement is quite off-putting, to say the least.

Lee


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ugo Riboni
Sent: 27 July 2009 10:52
To: Moovida developers list
Subject: Re: A few questions

> The above notwithstanding, I've spent a few days hacking the
> video_parser.py in order to get my library recognised. It's an
> assortment of anime, TV and movies coming from various sources and
> **probably** typical of most users' content. I've gone from the 1.05
> version having 10-20% of the content being recognised and added to the
> movie/tv show libraries (with 80% uncategorised), to ~90% being
> categorised, with most of the remaining problematic content being
> so for understandable reasons. I'd like to submit a patch at some
> point (I need to tidy my code and I wouldn't mind knocking a few of
> the outlying cases on the head first) - is it acceptable to submit
> patches to this list rather than going the bzr route?

Hi Lee,
I'm answering only this part of your email as I see other people have
already addressed most of your points in other emails.

I briefly worked on the media scanner myself a few weeks ago, here at
Fluendo. We recognized that the media scanner could be improved and
started to do some work on it. However, we quickly recognized that any
changes in the media scanner have a very high risk of introducing
regressions, i.e. improving the recognition of some files but making
it fail to recognize some files that were recognized before.

Because of that, we decided to put on hold all work on the media scanner
until we can write a comprehensive battery of unit tests to ensure that
we can test the scanner for regressions.
I think there's some time scheduled to create these unit tests somewhere
in the coming few months, but at the moment I'm not sure exactly when.
It would actually help if you could send me (even in private if you
prefer) a list of the file names that you tested the media scanner on.
It will help us build the unit tests when we eventually get to doing
that.
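
A rough sketch of how a list of file names could drive such regression
tests: each known-good name is paired with the result the scanner is
expected to produce, and the whole table is re-checked after every
change. parse_episode below is a hypothetical stand-in, not the actual
Moovida parser, and the file names are made up for illustration.

```python
import re
import unittest

# Hypothetical stand-in for the real parser: "<show> SxxEyy" style names only.
EPISODE_RE = re.compile(
    r"^(?P<show>.+?)[\s._-]+s(?P<season>\d{1,2})e(?P<episode>\d{1,2})",
    re.IGNORECASE)


def parse_episode(filename):
    """Return (show, season, episode), or None if the name is not recognised."""
    match = EPISODE_RE.match(filename)
    if match is None:
        return None
    show = re.sub(r"[._]+", " ", match.group("show")).strip()
    return show, int(match.group("season")), int(match.group("episode"))


# The regression table: grows as users report file names.
KNOWN_GOOD = [
    ("Some.Show.S01E02.avi", ("Some Show", 1, 2)),
    ("another_show-s10e03.mkv", ("another show", 10, 3)),
    ("holiday video.avi", None),  # must stay unrecognised
]


class ScannerRegressionTest(unittest.TestCase):
    def test_known_filenames(self):
        for filename, expected in KNOWN_GOOD:
            # subTest reports every failing name, not just the first one.
            with self.subTest(filename=filename):
                self.assertEqual(parse_episode(filename), expected)
```

Saved as a module, this can be run with "python -m unittest"; a change
that breaks a previously recognised name shows up as a failing sub-test.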

That said, we already have another community member (in CC) that did
some work on the media scanner as well [1][2].
It would be interesting to try and merge the stuff that I did, the stuff
that he did and the stuff that you did, then see if any of you can
contribute the unit tests as well.
After that we will be more than happy to get that stuff reviewed and
committed.

Clearly it's not exactly a trivial effort we're talking about here, so
obviously I understand if you want to just wait for us to do it.
Still, you can find the code from the other contributor in the links at
the bottom, and my code is attached as well just in case you want to do
something with it or just look at it for ideas.

Speaking of which, if you look at my code you will see it's basically
stand-alone. I ripped the scanner out of the Moovida code so I could
test it more easily (it's all very crude, of course, as it's just a work
in progress that got shelved temporarily).

But that got me (and other people here) thinking a bit about actually
having the media scanner as a totally separate Python library, so
that it can be used not only by Moovida but by other projects as well
(besides the obvious advantages of being much more easily testable,
etc).

Mind you, we didn't go much further on this other than brainstorming
some ideas (e.g. pluggable web-based helpers to search imdb or tmdb or
similar sites, pluggable extra filters and recognizers, etc).
So having some more discussion about this may be helpful here for
everyone interested in improving the media scanner, and maybe someone
will want to pick up the idea and run with it for a while.
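
To make the "pluggable recognizers" idea a little more concrete, here is
one possible shape for it, purely as a sketch: recognizers are plain
functions registered in a list and tried in order. All names here
(RECOGNIZERS, recognizer, identify) are hypothetical and not part of any
existing Moovida API; web-based helpers could be registered the same way.

```python
import re

# Registry of recognizer functions, tried in registration order.
RECOGNIZERS = []


def recognizer(func):
    """Decorator that plugs a recognizer function into the registry."""
    RECOGNIZERS.append(func)
    return func


@recognizer
def tv_episode(name):
    match = re.search(r"s(\d{1,2})e(\d{1,2})", name, re.IGNORECASE)
    if match is None:
        return None
    return {"kind": "tv",
            "season": int(match.group(1)),
            "episode": int(match.group(2))}


@recognizer
def movie_with_year(name):
    match = re.search(r"\b((?:19|20)\d{2})\b", name)
    if match is None:
        return None
    return {"kind": "movie", "year": int(match.group(1))}


def identify(name, recognizers=RECOGNIZERS):
    """Return the first recognizer's result, or an 'unknown' marker."""
    for rec in recognizers:
        result = rec(name)
        if result is not None:
            return result
    return {"kind": "unknown"}
```

Because each recognizer is an independent function, it can be unit
tested on its own and added or removed without touching the others.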

To me a well designed and reusable media scanner library seems like an
interesting "summer" project. Hell, if we were participating in Google's
Summer of Code this year I would have put it out as a student project
for sure. But maybe even without that there's someone interested in
looking at it (when they're not spending their time at the beach ;)).

Cheers,
--
Ugo

[1] https://www.moovida.com/quality/review/request/%3ce1mn8ed-0005lh...@myth.home.mattb.net.nz%3E

[2] https://code.launchpad.net/~mattbrown/elisa/bugfixes
