"Joel Nothman" <[email protected]> writes:
>
> Thanks for clarifying the nature of the parser change. There seem to be a
> number of regressions in link parsing, some related to things I patched
> last year (should I add tests?):
>
> (1) WAS:
>>>> uparser.simpleparse("[[Donkey]]")
> Article 'unknown': 1 children
> Paragraph '': 1 children
> Link '': 1 children
> 'Donkey'
> (1) IS:
>>>> uparser.simpleparse("[[Donkey]]")
> Article
> ArticleLink target=u"Donkey" ns=0
>
> I.e. no caption underneath link node which we added for consistency with
> [[aaa|bbb]].
Yes, I removed that one. It caused too many problems. I think someone
even complained that he couldn't distinguish between [[Donkey|Donkey]] and
[[Donkey]].
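
To illustrate the ambiguity: if every link node gets a caption child, both
spellings end up with the same tree. A minimal sketch of keeping them apart,
using a hypothetical split_link helper (not part of mwlib) that returns no
caption for the unpiped form:

```python
def split_link(wikitext):
    """Split '[[target|caption]]' into (target, caption).

    Hypothetical helper for illustration only. caption is None when
    the link has no pipe, so [[Donkey]] and [[Donkey|Donkey]] remain
    distinguishable.
    """
    if not (wikitext.startswith("[[") and wikitext.endswith("]]")):
        raise ValueError("not a wiki link: %r" % (wikitext,))
    inner = wikitext[2:-2]
    # partition() splits on the first '|' only; sep is '' if none was found.
    target, sep, caption = inner.partition("|")
    return target, (caption if sep else None)
```

A consumer that wants display text can still fall back to the target when
the caption is None.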
>
> (2) WAS:
>>>> uparser.simpleparse("[[''Donkey'']]")
> Article 'unknown': 1 children
> Paragraph '': 1 children
> Link '': 1 children
> Style "''": 1 children
> 'Donkey'
> (2) IS:
>>>> uparser.simpleparse("[[''Donkey'']]")
> Article
> ArticleLink target=u"''Donkey''" ns=0
>
Strictly speaking, [[''Donkey'']] should not even create a
link. MediaWiki outputs
[[Donkey]]
for that input, which seems totally stupid.
> (3) WAS:
>>>> print uparser.simpleparse("[[en:Donkey]]").children[0].children[0].target
> Article 'unknown': 1 children
> Paragraph '': 1 children
> LangLink '': 1 children
> 'en:Donkey'
> Donkey
> (3) IS:
>>>> print uparser.simpleparse("[[en:Donkey]]").children[0].target
> Article
> LangLink target=u'en:Donkey' interwiki='en' langlink='English'
> en:Donkey
>
> I.e. we had distinguished between a full_target and the stripped target,
> which just had the title of the target page.
That also didn't work very well, though I don't remember the exact reasons...
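
For reference, the stripped title can be derived from the full target when
a consumer needs it. A minimal sketch, assuming a hypothetical split_target
helper (not mwlib's API) that does not check whether the prefix is a known
language or namespace:

```python
def split_target(full_target):
    """Return (prefix, title) for a link target such as 'en:Donkey'.

    Hypothetical helper for illustration only. prefix is None when the
    target has no non-empty 'prefix:' part.
    """
    prefix, sep, title = full_target.partition(":")
    if not sep or not prefix:
        # No colon, or a leading colon: treat the whole string as the title.
        return None, full_target
    return prefix, title
```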
>
> (4) WAS:
>>>> print uparser.simpleparse("[[Donkey]]s")
> Article 'unknown': 1 children
> Paragraph '': 2 children
> ArticleLink '': 1 children
> 'Donkeys'
> (4) IS:
>>>> print uparser.simpleparse("[[Donkey]]s")
> Article
> ArticleLink target=u'Donkey' ns=0
> u's'
>
> This *is* considered in an existing test case, though that test is marked
> to fail.
>
That was intentional. I cannot pull in that "s" character because of
(1).
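
If someone needs the old behaviour, the trailing letters can be merged in a
later pass instead of in the parser. A rough sketch, assuming the trail is
the run of lowercase ASCII letters directly after "]]" (my understanding of
MediaWiki's English default; other languages differ), with a hypothetical
split_trail helper:

```python
import re

# Assumption: only lowercase ASCII letters directly after "]]" count as trail.
_TRAIL = re.compile(r"[a-z]+")


def split_trail(following_text):
    """Return (trail, rest) for the text right after a link's ']]'.

    Hypothetical helper for illustration only, not part of mwlib.
    """
    m = _TRAIL.match(following_text)
    if m is None:
        return "", following_text
    return m.group(0), following_text[m.end():]
```

For [[Donkey]]s the trail is "s", so the displayed text would be the target
plus the trail, "Donkeys".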
>
> Are these regressions intentional? Or are they side-effects of some other
> change? Should I reimplement the features?
>
I don't miss them....
> Also note that previously parsing "Some string" would automatically create
> a paragraph node. Now in order to create a paragraph node, "\n\n" needs to
> be present (and "\n\nText" erroneously creates two paragraph nodes).
>
That's a side effect of the new implementation. Paragraphs are only
created on \n\n or around block nodes. I'm not quite sure if I consider
the second case an error.
MediaWiki renders "<div>\n\nbla</div>" as:
<div>
<p>bla</p>
</div>
whereas it renders
<div>bla</div>
as
<div>bla</div>
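
To show where a second paragraph can come from, here is a naive split on
blank lines. This is an illustration only, not how mwlib builds its tree;
split_paragraphs is a hypothetical name:

```python
def split_paragraphs(text):
    """Naively split text into paragraph chunks on blank lines.

    Hypothetical helper for illustration only. A leading "\n\n" yields
    an empty first chunk, which is one way to end up with an extra
    paragraph node.
    """
    return text.split("\n\n")
```

With this rule "Some string" stays a single chunk with no paragraph break,
while "\n\nText" splits into an empty chunk followed by "Text".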
> It would seem that more of a test-driven development might be helpful.
>
We have around 600 tests, but could probably use more.
Regards,
- Ralf