Hi,

Inspired by 

http://redgreenblu.com/svn/projects/assert_valid_markup/lib/assert_valid_markup.rb

I would like to have an easy way to validate the HTML of the page IE is
currently showing. Unfortunately, the HTML that
ie.document.body.parentelement.outerhtml returns is not the same as the
page source :-(

Take a look at the following example:

require 'test/unit'
require 'watir'
require 'net/http'
require 'cgi'
require 'xmlsimple'

class ValidationExample < Test::Unit::TestCase
  include Watir

  def test_w3c_validate
    ie = IE.new
    ie.goto 'validator.w3.org/'
    # This is IE's serialised DOM, not the source the server sent.
    html = ie.document.body.parentelement.outerhtml
    # Post the markup to the W3C validator and ask for XML output.
    response = Net::HTTP.start('validator.w3.org').post2(
      '/check', "fragment=#{CGI.escape(html)}&output=xml")
    markup_is_valid = response['x-w3c-validator-status'] == 'Valid'
    message = ''
    unless markup_is_valid
      msgs = XmlSimple.xml_in(response.body)['messages'][0]['msg']
      message = msgs.collect { |m|
        "Invalid markup: line #{m['line']}: #{CGI.unescapeHTML(m['content'])}"
      }.join("\n")
    end
    assert markup_is_valid, message
  ensure
    # Close the browser even when the assertion fails.
    ie.close if ie
  end

end

When I run the example I get stuff like:

Invalid markup: line 1: no document type declaration; implying "<!DOCTYPE HTML SYSTEM>"
Invalid markup: line 1: there is no attribute "XML:LANG"
Invalid markup: line 1: there is no attribute "XMLNS"

The html returned by ie.document.body.parentelement.outerhtml is

<HTML lang=en xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<HEAD>
  <TITLE>The W3C Markup Validation Service</TITLE>
  <LINK rev=made href="mailto:[EMAIL PROTECTED]">
  <LINK title="Home Page" rev=start href="./">

but if I view the source from IE itself it is something like

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The W3C Markup Validation Service</title>
    <link rev="made" href="mailto:[EMAIL PROTECTED]" />
    <link rev="start" href="./" title="Home Page" />
...

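The XML declaration and the DOCTYPE line are missing, the tag names are
uppercased, the attributes are reordered, several attribute quotes are
dropped, and the closing slashes of the empty elements are gone. Is there
any way to get the unmodified HTML for the current page?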

If people are doing automatic validation any other way I am open
to suggestions.
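
One workaround I could imagine, as an untested sketch only: fetch the page
again with Net::HTTP and validate that source instead of the DOM that IE
serialises. The helper names below (normalize_url, fetch_raw_html,
validator_post_body) are made up for illustration, and the sketch assumes
that the page can be fetched without IE's cookies or session and that the
server does not answer with a redirect, since Net::HTTP.get does not follow
one.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: add a scheme when the address was typed the way
# ie.goto accepts it, e.g. 'validator.w3.org/'.
def normalize_url(url)
  url =~ %r{\Ahttps?://}i ? url : "http://#{url}"
end

# Hypothetical helper: return the body exactly as the server sends it,
# bypassing IE's DOM serialisation. Does not follow redirects and does
# not share cookies or session state with the browser.
def fetch_raw_html(url)
  Net::HTTP.get(URI.parse(normalize_url(url)))
end

# Build the form body for the validator's /check endpoint, using the same
# two fields (fragment, output) as the test above.
def validator_post_body(html)
  URI.encode_www_form('fragment' => html, 'output' => 'xml')
end
```

In the test, html = fetch_raw_html(ie.url) would then replace the outerhtml
line, assuming ie.url gives the address of the page currently shown. A page
that depends on a login or on script-generated content would still need
another approach.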

Best regards,

Jørgen
