[mwlib] Re: Problems parsing French Wikipedia page

Peter W Wed, 10 Nov 2010 16:52:56 -0800

Hi there,

I may have posted the previous message prematurely; apologies. I've looked
into it substantially more, and there appears to be a bug in the handling
of the "abbr" tag. It's defined as valid in MediaWiki
(http://meta.wikimedia.org/wiki/Help:HTML_in_wikitext) and is recognized
in uparser (line 238), but it is not present in advtree.


I tried adding it (line 549 of advtree) as:

class Abbreviation(TagNode, AdvancedNode):
    _tag = "abbr"

_tagNodeMap = dict((k._tag, k) for k in [
    Source, Code, BreakingReturn, HorizontalRule, Index, Teletyped,
    Reference, ReferenceList, Gallery, Center, Div, Span, Strike,
    ImageMap, Ruby, RubyBase, RubyText, Deleted, Inserted, TableCaption,
    Font, DefinitionList, DefinitionTerm, DefinitionDescription,
    Abbreviation])

but as far as I could tell, that didn't fix the problem. At this stage
I'm stuck and don't know how to debug it further. I'm pretty sure
this is a bug, but I don't know how to open a ticket (otherwise I would
have done that instead of posting here).
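
To make the registration pattern above concrete, here is a minimal,
self-contained sketch. It is not mwlib code: the classes are stand-ins,
and fix_tag_node is a hypothetical helper that only assumes a generic
node gets re-classed by looking its tag name up in the map. It shows
that a class listed in the map is found by its _tag, and that an
unlisted tag falls through to the "unknown tag" branch.

```python
# Stand-in classes for illustration only; these are NOT mwlib's classes.
class TagNode(object):
    _tag = None

    def __init__(self, tagname):
        self.tagname = tagname


class Span(TagNode):
    _tag = "span"


class Abbreviation(TagNode):
    _tag = "abbr"


# Build the tag-name -> class map the same way as the snippet above.
tag_node_map = dict((k._tag, k) for k in [Span, Abbreviation])


def fix_tag_node(node):
    """Hypothetical helper: re-class a generic node if its tag is registered."""
    cls = tag_node_map.get(node.tagname)
    if cls is None:
        # Unknown tag: the point where a warning would be logged.
        return False
    node.__class__ = cls
    return True


node = TagNode("abbr")
print(fix_tag_node(node), type(node).__name__)
```

If a check like this passes inside the real module (that is, "abbr" is a
key of the map at runtime) and the warning still appears, it may be
worth confirming that the edited copy of advtree is the one actually
being imported, or that the lookup uses the same key as _tag.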

Cheers,

Peter

On Nov 9, 11:53 pm, Peter W <[email protected]> wrote:
> Hi again,
>
> I'm noticing it to a lesser extent on other pages, without an infinite
> loop that kills the parser; just a pile of lines like:
>
> 2010-11-10T04:43:20 advtree.warn >> fixTagNodes, unknowntagnode
> TagNode tagname=u'abbr' vlist={u'title': u'http://[url]'}->u'abbr'
> or
> 2010-11-10T04:43:20 xmlwriter >> SKIPPED
> 2010-11-10T04:43:20 xmlwriter >> TagNode
> 2010-11-10T04:43:20 xmlwriter >> ["parent => Reference tagname='ref'
> vlist={u'name': u'JP3'}->'ref'", "tagname => u'abbr'", "caption =>
> u'abbr'", "type => 'complex_tag'", "vlist => {u'title':
> u'http://[url]'}"]
>
> where the [url]s are actually URLs. There are <ref> tags in the
> original that seem not to be transformed properly. Can anyone advise
> on what to do about that? I suspect the problem on the French page
> may just be an extension of this. This isn't happening with the
> English-language pages, but it is with other languages; could this be
> because <ref> is defined in the English site-info but not in the others?
>
> If anyone could advise on what this problem might be and how to solve
> it, I'd be most grateful.
>
> Thanks again,
>
> Peter
>
> On Nov 4, 5:07 pm, Peter W <[email protected]> wrote:
>
> > Hi there,
>
> > As mentioned in the other post, I'm having some sort of problem
> > parsing a French Wikipedia page. The code I've used works well on
> > pages in 40 other languages, but not on the French.
>
> > Specifically, while it's parsing one of those pages, the console
> > starts printing endless warning lines of things like:
>
> > 2010-11-04T21:41:59 advtree.warn >> fixTagNodes, unknowntagnode
> > TagNode tagname=u'abbr' vlist={u'class': u'abbr', u'title': u'Langue :
> > anglais'}->u'abbr'
>
> > as well as things like
>
> > 2010-11-04T21:42:00 xmlwriter >> SKIPPED
> > 2010-11-04T21:42:00 xmlwriter >> TagNode
> > 2010-11-04T21:42:00 xmlwriter >> ["parent => ArticleLink target=u'1er
> > janvier' ns=0", "tagname => u'abbr'", "caption => u'abbr'", "type =>
> > 'complex_tag'", "vlist => {u'class': u'abbr', u'title': u'Premier'}"]
>
> > At one point it even reported that the "maximum recursion depth for
> > Python objects" had been exceeded, though the console dump is so
> > large that I can't scroll back to find the exact message. I therefore
> > suspect some sort of infinite loop. I tried to catch templates that
> > redirect to each other, so I don't think that's the problem; if you
> > think it is, I'll spend more time testing the code I used to prevent
> > that.
>
> > Does someone have an idea of what's going on in this case? Is there
> > any more information I could provide that would be of help?
>
> > Thanks so much,
>
> > Peter
>
>

