Liam S Docherty wrote:
the current format of news articles does not
parse well at all, not to mention that it is rather difficult to extract
from the surrounding mark-up code. The simplified version suffers from the
same problem, so I was wondering: are there any nice HTML versions or even
plain text
Liam,
I'm having a similar issue in that I wish to parse to SVG, and this
could be easier...
in fact in large part it's a problem with the specifications...
cheers
Jonathan Chetwynd
On 27 Jul 2007, at 09:48, Liam S Docherty wrote:
The low graphics version suffers from the same problem, in that the HTML
is not considered well formed by the standard Java parsers. I suppose I
could try to tidy up the HTML before parsing =)
Thanks
Liam
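One way to approach the tidy-up step mentioned above is to skip strict parsing and use a lenient parser instead. The sketch below is only an illustration and is not taken from the thread: it uses Python's standard-library html.parser (not the Java parsers or BeautifulSoup discussed here), and the names TextExtractor and extract_text are hypothetical. It pulls the visible text out of markup that is not well formed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, tolerating unclosed or mismatched tags."""

    # Content inside these elements is not article text, so it is skipped.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    # Deliberately malformed: unclosed <p> and <b>, no closing </html>.
    sample = "<html><body><p>First paragraph<p>Second <b>bold<br>text</body>"
    print(extract_text(sample))
```

This does not repair the markup; it simply ignores tag structure and keeps the text, which may be enough when plain text is all that is wanted.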
Have you looked at the low graphics version? eg
Liam S Docherty wrote:
The low graphics version suffers from the same problem, in that the HTML
is not considered well formed by the standard Java parsers. I suppose I
could try to tidy up the HTML before parsing =)
In this situation, I'd always suggest BeautifulSoup, but I'm afraid that's
I understand that the BBC tracks external links in order to provide stats to
respond to the Graf report's requirement for the BBC to link externally more
often, and become part of the web.
That's done with something in the footer, which automagically rewrites
external links to have go tracking
I'd imagine stats on which story is clicked are quite valuable, particularly
when Moreover is ranking the stories.
I understand that the BBC tracks external links in order to provide stats to
respond to the Graf report's requirement for the BBC to link externally more
often, and become part of
Is anyone aware of any reason why they do not link directly to the story
on the relevant site instead?
The journalists working on the relevant news story pick the related link to
publish alongside their piece. However they use a tool to help them in this
task where stories are suggested to
http://news.bbc.co.uk/1/low/world/americas/6918490.stm
I know this is totally off topic, but I notice that the links to external
stories are actually being redirected through moreover.com rather than
linking directly to the site in question (even if it does go through the
internal Beeb redirect
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve Jolly
Sent: 27 July 2007 08:42
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Plain text or easy-to-parse news articles
Liam S Docherty wrote:
the current format of news articles does