On Wed, Jun 4, 2008 at 10:55 AM, Brian Butterworth <[EMAIL PROTECTED]> wrote:
> I thought they were trying to do OCR on the captions from the DVB-T stream.
>
> What I was saying was that the old Freeview version of BBC Parliament used
> to have a quarter-screen picture and the information that is now in the
> Astons was provided using MHEG5. This was clear text (to keep the bandwidth
> down) not bitmap graphics.

Forgive my ignorance, but what is an Aston?

> OCRing is never going to be brilliant, given the semi-transparent nature of
> the captions on BBC Parliament.
>
> However, a clear text feed of the data would keep the data pure, surely?

The machines that put the captions up on the screen have internal text-based logs, to which we have access. However, since this is basically just pulling logfiles off a set of operational machines, this access isn't 100% reliable.

The data in the log files is also of variable quality. Some speeches are not captioned, and at other times the captions don't correspond to actual speeches (e.g. a reaction shot of the previous speaker during a long speech can prompt a back-and-forth of captions, even though the same person is speaking throughout the changeover).

So although we use the logfiles to get an approximate fix, we had to resort to the timestamping game for accuracy.

Hope that helps,

--
etienne
-
Sent via the backstage.bbc.co.uk discussion group. To unsubscribe, please visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.
Unofficial list archive: http://www.mail-archive.com/[email protected]/

