Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Steve Jolly
Liam S Docherty wrote: the current format of news articles does not parse well at all, not to mention is rather difficult to extract from the surrounding mark-up code. The simplified version suffers from the same problem, so I was wondering are there any nice html versions or even plain text

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread ~:'' ありがとうございました 。
Liam, I'm having a similar issue in that I wish to parse to SVG, and this could be easier... in fact in large part it's a problem with the specifications... cheers Jonathan Chetwynd On 27 Jul 2007, at 09:48, Liam S Docherty wrote: The low graphics version suffers from the same problem,

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Liam S Docherty
The low graphics version suffers from the same problem, in that the html is not considered well formed by the standard Java parsers. I suppose I could try to tidy up the html before parsing =) Thanks Liam Have you looked at the low graphics version? eg

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Matthew Somerville
Liam S Docherty wrote: The low graphics version suffers from the same problem, in that the html is not considered well formed by the standard Java parsers. I suppose I could try to tidy up the html before parsing =) In this situation, I'd always suggest BeautifulSoup, but I'm afraid that's
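
For anyone following the "lenient parser" suggestion above, here is a rough sketch of the idea: pull the visible text out of HTML that is not well formed, without tidying it first. It does not use BeautifulSoup itself. To stay dependency-free it swaps in Python's standard-library html.parser, which also tolerates unclosed tags. The names TextExtractor and extract_text and the sample markup are made up for illustration and are not taken from the BBC pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML that may not be well formed."""

    # Content of these elements is not article text, so it is skipped.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text runs that are outside script/style blocks.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text runs of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered at end of input
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    # Hypothetical sample: unclosed <p> and <b>, no closing </html>.
    sample = "<html><body><p>First paragraph<p>Second <b>bold<br>text</body>"
    print(extract_text(sample))
```

The parser never raises on the unclosed tags; it just reports start tags, end tags and text as it meets them, which is usually enough for scraping article text. Picking out the article body from the surrounding navigation would still need page-specific rules.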

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Kim Plowright
I understand that the BBC tracks external links in order to provide stats to respond to the Graf report's requirement for the BBC to link externally more often, and become part of the web. That's done with something in the footer, which automagically rewrites external links to have go tracking

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Jason Cartwright
I'd imagine stats on which story is clicked is quite valuable, particularly when moreover are ranking the stories. I understand that the BBC tracks external links in order to provide stats to respond to the Graf report's requirement for the BBC to link externally more often, and become part of

RE: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Jeremy Stone
Is anyone aware of any reason why they do not link directly to the story on the relevant site instead? The journalists working on the relevant news story pick the related link to publish alongside their piece. However they use a tool to help them in this task where stories are suggested to

Re: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Sean Dillon
http://news.bbc.co.uk/1/low/world/americas/6918490.stm I know this is totally off topic but I notice that the links to external stories are actually being redirected through moreover.com rather than link directly to the site in question (even if it does go through the internal Beeb redirect

RE: [backstage] Plain text or easy-to-parse news articles

2007-07-27 Thread Chris Yanda
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Steve Jolly Sent: 27 July 2007 08:42 To: backstage@lists.bbc.co.uk Subject: Re: [backstage] Plain text or easy-to-parse news articles Liam S Docherty wrote: the current format of news articles do