I'm getting the impression there is a fundamental misunderstanding here.

On 18.05.2014 04:28, Subramanya Sastry wrote:
> So, consider this wikitext for page P.
> 
> == Foo ==
> {{wikitext-transclusion}}
>   *a1
> <map ..> ... </map>
>   *a2
> {{T}} (the html-content-model-transclusion)
>   *a3
> 
> Parsoid gets wikitext from the API for {{wikitext-transclusion}}, parses it
> and injects the tokens into the P's content. Parsoid gets HTML from the API
> for <map..>...</map> and injects the HTML into the not-fully-processed
> wikitext of P (by adding an appropriate token wrapper). So, if {{T}} returns
> HTML (i.e. the MW API lets Parsoid know that it is HTML), Parsoid can inject
> the HTML into the not-fully-processed wikitext and ensure that the final
> output comes out right (in this case, the HTML from both the map extension
> and {{T}} would not get sanitized as it should be).
> 
> Does that help explain why we said we don't need the html wrapper?

No, it actually misses my point completely. My point is that this may work with
the way Parsoid uses expandtemplates, but it does not work for expandtemplates
in general, because expandtemplates takes full wikitext as input and replaces
only parts of it.

So, let me phrase it this way:

If expandtemplates is called with text=

   == Foo ==
   {{T}}

   [[Category:Bla]]

What should it return, and what content type should be declared in the HTTP
header?

Note that I'm not talking about how parsoid processes this text. That's not my
point - my point is that expandtemplates can be and is used on full wikitext. In
that context, the return type cannot be HTML.

> All that said, if you want to provide the wrapper with <html model="whatever"
> ....>fully-expanded-HTML</html>, we can handle that as well. We'll use the
> model attribute of the wrapper, discard the wrapper and use the contents in
> our pipeline.

Why use the model attribute? Why would you care about the original model? All
you need to know is that you'll get HTML. Exposing the original model in this
context seems useless if not misleading. <html transclude="{{T}}"></html> would
give the backend parser a way to discard the HTML (as unsafe) and execute the
transclusion instead (generating trusted HTML). In fact, we could just omit the
content of the <html> tag.

> So, model information either as an attribute on the wrapper, api response
> header, or a property in the JSON/XML response structure would all work for
> us.

As explained above, the return type cannot be HTML for the full text, because
any "plain" wikitext would stay unprocessed. There needs to be a marker for
"html transclusion *here*" in the text.
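
To make the idea of an inline marker concrete, here is a rough sketch of how a
consumer could split such output into plain wikitext and marked transclusions.
The marker syntax is only the <html transclude="{{T}}"> idea from this mail, not
an existing MediaWiki feature, and the names MARKER and split_segments are made
up for illustration.

```python
import re

# Hypothetical marker syntax, following the <html transclude="{{T}}"> idea
# from this thread; it is an assumption, not an existing MediaWiki feature.
MARKER = re.compile(r'<html transclude="(\{\{.*?\}\})">(.*?)</html>', re.DOTALL)


def split_segments(expanded):
    """Split expandtemplates-style output into (kind, payload) pairs.

    Plain wikitext is kept as "wikitext"; each marker becomes an
    "html-transclusion" segment that carries the transclusion to re-run.
    Any content inside the marker is ignored on purpose.
    """
    segments = []
    pos = 0
    for match in MARKER.finditer(expanded):
        if match.start() > pos:
            segments.append(("wikitext", expanded[pos:match.start()]))
        segments.append(("html-transclusion", match.group(1)))
        pos = match.end()
    if pos < len(expanded):
        segments.append(("wikitext", expanded[pos:]))
    return segments


text = '== Foo ==\n<html transclude="{{T}}"></html>\n\n[[Category:Bla]]\n'
for kind, payload in split_segments(text):
    print(kind, repr(payload))
```

The heading and the category link stay wikitext, and only the marked position is
treated as an HTML transclusion, which is why no single content type fits the
whole response.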

On 18.05.2014 16:29, Gabriel Wicke wrote:
> The difference between wrapper and property is actually that using inline
> wrappers in the returned wikitext would force us to escape similar wrappers
> from normal template content to avoid opening a gaping XSS hole.

Please explain, I do not see the hole you mention.

If the input contained <html>evil stuff</html>, it would just get escaped by the
preprocessor (unless $wgRawHtml is enabled), as it is now:
https://de.wikipedia.org/w/api.php?action=expandtemplates&text=%3Chtml%3E%3Cscript%3Ealert%28%27evil%27%29%3C/script%3E%3C/html%3E
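
For anyone who wants to repeat this check with a different payload, the query
above can be rebuilt as in the sketch below. It only constructs the URL and
sends no request; the variable names are mine, and keeping "/" unescaped is
just to match the URL as written above.

```python
from urllib.parse import urlencode

# Endpoint and parameters are the ones from the URL quoted in this mail.
API = "https://de.wikipedia.org/w/api.php"
payload = "<html><script>alert('evil')</script></html>"

# safe="/" leaves slashes unescaped so the result matches the URL above.
query = urlencode({"action": "expandtemplates", "text": payload}, safe="/")
url = API + "?" + query
print(url)
```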

If <html transclude="{{T}}"> was passed, the parser/preprocessor would treat it
like it would treat {{T}} - it would get trusted, backend generated HTML from
the respective Content object.

I see no change, and no opportunity to inject anything. Am I missing something?

> A separate property in the JSON/XML structure avoids the need for escaping
> (and associated security risks if not done thoroughly), and should be
> relatively straightforward to implement and consume.

As explained above, I do not see how this would work except for the very special
case of using expandtemplates to expand just a single template. This could be
solved by introducing a new, single template mode for expandtemplates, e.g.
using expand="Foo|x|y|z" instead of text="{{Foo|x|y|z}}".
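
A rough sketch of what such a single template mode could look like follows. The
expand parameter is only the proposal from this mail; the function name, the
response keys and the html_templates lookup are invented for illustration and
stand in for whatever the backend would really use to determine the content
model.

```python
def expand_single(expand, html_templates):
    """Model the proposed expand="Foo|x|y|z" single template mode.

    `html_templates` is a hypothetical stand-in for a content model lookup:
    a set of template titles whose content model is HTML.
    """
    title, *args = expand.split("|")
    title = title.strip()
    if not title:
        raise ValueError("template title is empty")
    # Equivalent of what would today be passed as text="{{Foo|x|y|z}}".
    transclusion = "{{" + "|".join([title, *args]) + "}}"
    # The input is exactly one transclusion, so one content type can
    # describe the whole response.
    model = "html" if title in html_templates else "wikitext"
    return {"transclusion": transclusion, "contentmodel": model}


print(expand_single("Foo|x|y|z", html_templates={"T"}))
print(expand_single("T", html_templates={"T"}))
```

Because the request names exactly one template, the content type question from
above has a single answer here, which is not the case for arbitrary text input.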

Another way would be to use hints in the structure returned by generatexml.
There, we have an opportunity to declare a content type for a *part* of the
output (or rather, input).

-- daniel

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
