Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

2011-12-15 Thread Henri Sivonen
On Tue, Dec 13, 2011 at 4:23 AM, Adam Barth <w...@adambarth.com> wrote:
> I'm trying to understand how the HTML parsing spec handles the following case:
>
> <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
>
> According to the html5lib test data, we should parse that as follows:
>
> | <!DOCTYPE html>
> | <html>
> |   <head>
> |   <body>
> |     <math math>
> |       <math mi>
> |         "foo"
> |     <table>

The expectation of the test case makes sense.

> However, I'm not sure whether that's what the spec actually does.

I think that's a spec bug.

> The net result of which is popping the stack of open elements, but not
> flushing out the pending table character tokens list.

The reason why Gecko does what makes sense is that Gecko uses a text
accumulation buffer for non-table cases, too, and any tag token
flushes the buffer. (Not quite optimal for ignored tags, sure.)
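
A minimal sketch of the buffering idea described above, assuming a toy tree builder; this is not Gecko's code, and the class and method names are hypothetical. Character data is accumulated in a buffer, and every tag token flushes that buffer into the current node before the tag itself is handled, so the text cannot be left pending past the element it belongs to.

```python
class BufferingTreeBuilder:
    """Toy tree builder: text is buffered and flushed whenever a tag token arrives."""

    def __init__(self):
        self.root = ("#root", [])   # each node is a (name, children) pair
        self.stack = [self.root]    # stack of open elements
        self.buffer = []            # text accumulation buffer

    def characters(self, data):
        # Accumulate only; nothing is inserted into the tree yet.
        self.buffer.append(data)

    def flush(self):
        # Insert the buffered text as one text node in the current node.
        if self.buffer:
            self.stack[-1][1].append("".join(self.buffer))
            self.buffer.clear()

    def start_tag(self, name):
        self.flush()                # any tag token flushes the buffer first
        node = (name, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def end_tag(self, name):
        self.flush()                # text lands in the element being closed
        if len(self.stack) > 1 and self.stack[-1][0] == name:
            self.stack.pop()


builder = BufferingTreeBuilder()
builder.start_tag("math")
builder.start_tag("mi")
for ch in "foo":
    builder.characters(ch)
builder.end_tag("mi")
builder.end_tag("math")
```

In this toy model the text ends up as a child of mi because the end tag flushes before popping.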

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table> and other parser questions

2011-12-14 Thread Adam Barth
On Tue, Dec 13, 2011 at 2:32 PM, Ian Hickson i...@hixie.ch wrote:
> I think the real problem is that there's no need to go into the table
> text mode if the current node is not a table model element. So I've
> changed the spec at that point.
>
> Please let me know if that doesn't fix the test case or causes any other
> regressions.

That fix seems to work great.

Thanks!
Adam


[whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table> and other parser questions

2011-12-13 Thread Ian Hickson
On Fri, 14 Oct 2011, David Flanagan wrote:

> The "Anything else" case of the in_table insertion mode of the HTML parsing
> spec reads:
>
>   Process the token using the rules for the "in body" insertion mode, except
>   that if the current node is a table, tbody, tfoot, thead, or tr element,
>   then, whenever a node would be inserted into the current node, it must
>   instead be foster parented.
>
> I think that this is actually incorrect (or at least very misleading) as it is
> worded.  In order to get correct parsing results, it appears that you have to
> do this:
>
>   Process the token using the rules for the "in body" insertion mode, except
>   that whenever a node would be inserted into the current node and the current
>   node is a table, tbody, tfoot, thead, or tr element, then the node to be
>   inserted must instead be foster parented.
>
> As the spec is currently worded, we are directed to check once whether the
> current node is a table, table section or table row, and then proceed to use
> the rules for the "in body" mode.  In fact, however, it is necessary to check
> whether the current node is a table, section or row each time a node is to be
> inserted.  This came up for me when a text node is being inserted into a table
> when there is an active formatting element that gets reconstructed and foster
> parented.  My reading of the current spec text said that the text node should
> also be foster parented (because I only checked whether the current node was a
> table once), and the text node ended up as a sibling of the active formatting
> element rather than a child of that element.

Agreed that the previous wording was misleading. I've adjusted it. Let me 
know if you think it's still bad.
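
The per-insertion check David describes can be sketched as follows. This is a toy model under stated assumptions, not spec text: `Node`, `insert_node` and `text_in_table` are hypothetical names, foster parenting is simplified to "insert before the table in the table's parent", and reconstruction of the active formatting elements is reduced to a list of element names to reopen.

```python
TABLE_MODEL = {"table", "tbody", "tfoot", "thead", "tr"}


class Node:
    def __init__(self, name, text=None):
        self.name, self.text = name, text
        self.parent, self.children = None, []

    def append(self, child, before=None):
        child.parent = self
        idx = len(self.children) if before is None else self.children.index(before)
        self.children.insert(idx, child)


def insert_node(stack, node):
    """Insert at the current node, re-checking the foster-parenting condition on every call."""
    current = stack[-1]
    if current.name in TABLE_MODEL:
        # Simplified foster parenting: place the node just before the table.
        table = next(n for n in reversed(stack) if n.name == "table")
        table.parent.append(node, before=table)
    else:
        current.append(node)


def text_in_table(stack, formatting_to_reopen, data):
    """Handle text in table context using 'in body' rules with foster parenting."""
    for name in formatting_to_reopen:   # reconstruct the active formatting elements
        element = Node(name)
        insert_node(stack, element)     # current node is the table: foster parented
        stack.append(element)           # the new element becomes the current node
    insert_node(stack, Node("#text", data))  # current node is now e.g. b: normal insert


body, table = Node("body"), Node("table")
body.append(table)
stack = [body, table]
text_in_table(stack, ["b"], "x")
```

Because the condition is evaluated again for the text node, the text becomes a child of the reconstructed b element rather than its sibling. Checking only once, before reconstruction, would foster parent both nodes.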


On Mon, 12 Dec 2011, Adam Barth wrote:

> I'm trying to understand how the HTML parsing spec handles the following case:
>
> <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>
>
> According to the html5lib test data, we should parse that as follows:
>
> | <!DOCTYPE html>
> | <html>
> |   <head>
> |   <body>
> |     <math math>
> |       <math mi>
> |         "foo"
> |     <table>
>
> However, I'm not sure whether that's what the spec actually does.
>
> Consider the point at which we parse the "f" character token (from "foo").
> The insertion mode will be "in table".  The spec will execute as
> follows:
>
> - If the current node is a MathML text integration point and the
>   token is a character token
>   * Process the token according to the rules given in the section
>     corresponding to the current insertion mode in HTML content.
>
> - A character token
>   * Let the pending table character tokens be an empty list of tokens.
>   * Let the original insertion mode be the current insertion mode.
>   * Switch the insertion mode to "in table text" and reprocess the token.
>
> - Any other character token
>   * Append the character token to the pending table character tokens list.
>
> ... the "o" and "o" will be processed similarly and end up in the
> pending table character tokens list.
>
> Now, consider the </mi> token.  We're still at a MathML text
> integration point, but the current token is neither a start token
> (with certain names) nor a character token, so we process the token
> according to the rules given in the section for parsing tokens in
> foreign content.
>
> - Any other end tag
>   * Run these steps:
>     ...
>
> The net result of which is popping the stack of open elements, but not
> flushing out the pending table character tokens list.  The list will
> eventually be flushed when we process the </table> token, resulting
> in these character tokens getting foster parented:
>
> | <!DOCTYPE html>
> | <html>
> |   <head>
> |   <body>
> |     <math math>
> |       <math mi>
> |     "foo"
> |     <table>

On Tue, 18 Oct 2011, David Flanagan wrote:

> Here's my current workaround:
>
> In 13.2.5, in the rules for whether to use the current insertion mode or
> to insert the token as foreign content, if the token is being inserted
> because the current node is a math (or HTML, but I'm not sure about
> that) integration point, then first set a text_integration_mode flag,
> then invoke the current insertion mode, then clear the flag.
>
> And in the "in table" insertion mode, when a character token is inserted,
> and the text_integration_mode flag is set, then just process the token
> using "in body" mode, and otherwise follow the directions that are there
> now.
>
> I'm not sure that is the best way to fix the spec, but it works for me,
> in the sense that my parser now passes the tests.

I think the real problem is that there's no need to go into the table 
text mode if the current node is not a table model element. So I've 
changed the spec at that point.

Please let me know if that doesn't fix the test case or causes any other 
regressions.
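
A rough sketch of the check described above, not the spec text itself. It assumes that "table model element" means the element list quoted earlier in this message (table, tbody, tfoot, thead, tr); the function name and the dictionary-based node are hypothetical.

```python
# Assumption: the "table model elements" are the ones listed earlier in the thread.
TABLE_MODEL_ELEMENTS = {"table", "tbody", "tfoot", "thead", "tr"}


def handle_character_in_table(current_node, pending, ch):
    """Character token while the insertion mode is "in table"."""
    if current_node["name"] in TABLE_MODEL_ELEMENTS:
        # Table text path: collect the character in the pending list.
        pending.append(ch)
    else:
        # Current node is not a table model element (for example a MathML mi):
        # treat the character as ordinary content and insert it right away.
        current_node["text"] += ch


mi = {"name": "mi", "text": ""}
pending = []
for ch in "foo":
    handle_character_in_table(mi, pending, ch)
```

With the current node being mi, nothing is added to the pending list, so no text is left over to be foster parented when the table is closed.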

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


[whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

2011-12-12 Thread Adam Barth
I'm trying to understand how the HTML parsing spec handles the following case:

<!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

According to the html5lib test data, we should parse that as follows:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|         "foo"
|     <table>

However, I'm not sure whether that's what the spec actually does.

Consider the point at which we parse the "f" character token (from "foo").
The insertion mode will be "in table".  The spec will execute as
follows:

- If the current node is a MathML text integration point and the
  token is a character token
  * Process the token according to the rules given in the section
    corresponding to the current insertion mode in HTML content.

- A character token
  * Let the pending table character tokens be an empty list of tokens.
  * Let the original insertion mode be the current insertion mode.
  * Switch the insertion mode to "in table text" and reprocess the token.

- Any other character token
  * Append the character token to the pending table character tokens list.

... the "o" and "o" will be processed similarly and end up in the
pending table character tokens list.

Now, consider the </mi> token.  We're still at a MathML text
integration point, but the current token is neither a start token
(with certain names) nor a character token, so we process the token
according to the rules given in the section for parsing tokens in
foreign content.

- Any other end tag
  * Run these steps:
    ...

The net result of which is popping the stack of open elements, but not
flushing out the pending table character tokens list.  The list will
eventually be flushed when we process the </table> token, resulting
in these character tokens getting foster parented:

| <!DOCTYPE html>
| <html>
|   <head>
|   <body>
|     <math math>
|       <math mi>
|     "foo"
|     <table>
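
The walkthrough above can be replayed with a hand-rolled trace. This is a sketch under stated assumptions, not a parser: each step is hard-coded in the order described, the function name is hypothetical, nodes are plain [name, children] lists, and foster parenting is simplified to inserting before the table inside body.

```python
def simulate_pre_fix(text="foo"):
    """Replay the steps above for <table><math><mi>foo</mi></math></table>."""
    body = ["body", []]
    table = ["table", []]
    body[1].append(table)
    stack = [body, table]            # stack of open elements
    pending = []                     # pending table character tokens

    def foster_parent(node):
        # Simplified: insert just before the table in body.
        body[1].insert(body[1].index(table), node)

    # <math>: handled with "in body" rules from "in table", so foster parented.
    math = ["math math", []]
    foster_parent(math)
    stack.append(math)

    # <mi>: current node is foreign content, so a normal insertion into math.
    mi = ["math mi", []]
    math[1].append(mi)
    stack.append(mi)

    # "f", "o", "o": the text integration point defers to "in table",
    # which only appends each character to the pending list.
    for ch in text:
        pending.append(ch)

    stack.pop()                      # </mi>: pops the stack, pending list untouched
    stack.pop()                      # </math>: same

    # </table>: the pending characters are finally flushed. The current node is
    # the table, so the text is foster parented instead of landing in mi.
    if pending:
        foster_parent('"' + "".join(pending) + '"')
    stack.pop()                      # table is closed
    return body


tree = simulate_pre_fix()
```

The returned structure has the text as a child of body, between math and table, with mi left empty, which matches the second tree above rather than the html5lib expectation.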

Thanks,
Adam


Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

2011-12-12 Thread David Flanagan
I think this is the same problem I reported here: 
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-October/033533.html
See Hixie's response to that message.  I think this is a known problem, though 
I don't know if a bug has been filed on it.

David



Re: [whatwg] <!DOCTYPE html><body><table><math><mi>foo</mi></math></table>

2011-12-12 Thread Adam Barth
Yes, that's the same issue.  It appears to be fallout from removing
the "in foreign content" insertion mode.

Adam


On Mon, Dec 12, 2011 at 7:36 PM, David Flanagan <dflana...@mozilla.com> wrote:
> I think this is the same problem I reported here:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-October/033533.html
> See Hixie's response to that message.  I think this is a known problem,
> though I don't know if a bug has been filed on it.
>
>     David
