Hi again,

I'm noticing the same thing to a lesser extent on other pages: no
infinite loop that kills the parser, just a pile of lines like:

2010-11-10T04:43:20 advtree.warn >> fixTagNodes, unknowntagnode
TagNode tagname=u'abbr' vlist={u'title': u'http://[url]'}->u'abbr'
or
2010-11-10T04:43:20 xmlwriter >> SKIPPED
2010-11-10T04:43:20 xmlwriter >> TagNode
2010-11-10T04:43:20 xmlwriter >> ["parent => Reference tagname='ref'
vlist={u'name': u'JP3'}->'ref'", "tagname => u'abbr'", "caption =>
u'abbr'", "type => 'complex_tag'", "vlist => {u'title': u'http://
[url]'}"]


where the [url]s are actual URLs. There are <ref> tags in the
original that don't seem to be transformed properly. Can anyone
advise on what to do about that? I suspect the problem with the
French page is just a more severe case of this. It isn't happening
with the English-language wiki, but it is with others; could that be
because <ref> is defined in the English siteinfo but not in theirs?
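
To test that hypothesis, I've been comparing what each wiki's siteinfo
actually reports. This is just a sketch of the idea, not mwlib code:
`action=query&meta=siteinfo&siprop=extensiontags` is the standard
MediaWiki API call, and the hostnames are only examples.

```python
# Sketch: ask each wiki which extension tags it registers, to check
# whether <ref> is present on the non-English editions.
import json
import urllib.request

def tags_from_siteinfo(payload):
    """Extract the extension-tag list from a siteinfo API response."""
    return set(payload["query"]["extensiontags"])

def fetch_extension_tags(host):
    """Fetch the extension tags a MediaWiki site reports (needs network)."""
    url = ("https://%s/w/api.php?action=query&meta=siteinfo"
           "&siprop=extensiontags&format=json" % host)
    with urllib.request.urlopen(url) as resp:
        return tags_from_siteinfo(json.load(resp))

if __name__ == "__main__":
    for host in ("en.wikipedia.org", "fr.wikipedia.org"):
        tags = fetch_extension_tags(host)
        print(host, "<ref>" in tags)
```

If `<ref>` really is absent from one wiki's list, that would at least
localize the problem to the siteinfo rather than to the parser.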

If anyone could advise on what this problem might be and how to solve
it, I'd be most grateful.
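
One more data point on the recursion error mentioned in my earlier
message below: the template-redirect check I tried amounts to simple
cycle detection with a visited set, roughly like this sketch (the
names are illustrative, not my actual code, and resolve() stands in
for however redirects get looked up):

```python
# Sketch: follow template redirects, remembering every name seen,
# and report a cycle as soon as a name repeats.
def follow_redirects(name, resolve):
    """Follow redirects from `name`; return (final_name, saw_cycle)."""
    seen = set()
    while True:
        if name in seen:
            return name, True      # redirect loop detected
        seen.add(name)
        target = resolve(name)     # None means `name` is not a redirect
        if target is None:
            return name, False
        name = target
```

If my real check does something equivalent to this and the recursion
error still appears, then the loop is presumably elsewhere.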

Thanks again,

Peter


On Nov 4, 5:07 pm, Peter W <[email protected]> wrote:
> Hi there,
>
> As mentioned in the other post, I'm having some sort of problem
> parsing a French Wikipedia page. The code I've used works well on
> pages in 40 other languages, but not on the French.
>
> Specifically, while it's parsing one of those pages, the console
> starts printing endless warning lines of things like:
>
> 2010-11-04T21:41:59 advtree.warn >> fixTagNodes, unknowntagnode
> TagNode tagname=u'abbr' vlist={u'class': u'abbr', u'title': u'Langue :
> anglais'}->u'abbr
>
> as well as things like
>
> 2010-11-04T21:42:00 xmlwriter >> SKIPPED
> 2010-11-04T21:42:00 xmlwriter >> TagNode
> 2010-11-04T21:42:00 xmlwriter >> ["parent => ArticleLink target=u'1er
> janvier' ns=0", "tagname => u'abbr'", "caption => u'abbr'", "type =>
> 'complex_tag'", "vlist => {u'class': u'abbr', u'title': u'Premier'}"]
>
> The volume of warnings makes the console dump so massive that I
> can't scroll back through it, but at one point it reported that the
> "maximum recursion depth for Python objects" had been exceeded, so I
> suspect there is some sort of infinite loop. I do try to catch
> templates that redirect to each other, so I don't think that's the
> problem -- if you think it is, I'll spend more time testing the code
> I use to prevent it.
>
> Does someone have an idea of what's going on in this case? Is there
> any more information I could provide that would be of help?
>
> Thanks so much,
>
> Peter

-- 
You received this message because you are subscribed to the Google Groups 
"mwlib" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/mwlib?hl=en.