Re: [Haskell-cafe] HTML library with DOM?

2010-10-07 Thread Edward Z. Yang
Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
> I've got the month of October off, and one of the things I've been
> planning on working on is a compliant HTML5 parser for Haskell --
> something which is sorely needed! I will ping the list back if/when I
> get it finished.

I've heard that some of the existing HTML parsers in Haskell were
already HTML5 compliant (this topic came up when I was complaining
that there were some algorithms that you absolutely had to have
state for, because that was how they were specified.)  I never
verified this assertion though.

Edward
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] HTML library with DOM?

2010-10-07 Thread Gregory Collins
Edward Z. Yang ezy...@mit.edu writes:

> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>> I've got the month of October off, and one of the things I've been
>> planning on working on is a compliant HTML5 parser for Haskell --
>> something which is sorely needed! I will ping the list back if/when I
>> get it finished.

> I've heard that some of the existing HTML parsers in Haskell were
> already HTML5 compliant (this topic came up when I was complaining
> that there were some algorithms that you absolutely had to have
> state for, because that was how they were specified.)  I never
> verified this assertion though.

If there's already a library which *correctly* parses html5 documents
into DOM trees, could someone please let me know so I can use it instead
of wasting a bunch of time writing one?

Thanks,

G
-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] HTML library with DOM?

2010-10-07 Thread Gregory Collins
Michael Snoyman mich...@snoyman.com writes:

> As far as I know, Neil Mitchell's tagsoup[1] parses according to the
> HTML 5 parsing rules, but it just generates a list of Tags[2], so
> you'd have to build the DOM tree up from there. I personally have had
> great experience with tagsoup. It's even the core of HTML-scraping
> technology powering searchonce[3].

Yep, someone else wrote me privately to say this (that tagsoup respects
the html5 lexing rules). So I'll be using this as the basis of an html5
DOM parser. Stay tuned!

G
-- 
Gregory Collins g...@gregorycollins.net


Re: [Haskell-cafe] HTML library with DOM?

2010-10-07 Thread Michael Snoyman
2010/10/7 Gregory Collins g...@gregorycollins.net:
> Edward Z. Yang ezy...@mit.edu writes:

>> Excerpts from Gregory Collins's message of Wed Oct 06 19:44:44 -0400 2010:
>>> I've got the month of October off, and one of the things I've been
>>> planning on working on is a compliant HTML5 parser for Haskell --
>>> something which is sorely needed! I will ping the list back if/when I
>>> get it finished.

>> I've heard that some of the existing HTML parsers in Haskell were
>> already HTML5 compliant (this topic came up when I was complaining
>> that there were some algorithms that you absolutely had to have
>> state for, because that was how they were specified.)  I never
>> verified this assertion though.

> If there's already a library which *correctly* parses html5 documents
> into DOM trees, could someone please let me know so I can use it instead
> of wasting a bunch of time writing one?

As far as I know, Neil Mitchell's tagsoup[1] parses according to the
HTML 5 parsing rules, but it just generates a list of Tags[2], so
you'd have to build the DOM tree up from there. I personally have had
great experience with tagsoup. It's even the core of HTML-scraping
technology powering searchonce[3].

Michael

[1] http://hackage.haskell.org/package/tagsoup
[2] 
http://hackage.haskell.org/packages/archive/tagsoup/0.11.1/doc/html/Text-HTML-TagSoup.html#t:Tag
[3] http://www.search-once.com/
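
A minimal sketch of the step described above, building a tree from a
flat list of tags. The Tag and Node types and the buildForest function
are hypothetical stand-ins defined here for illustration; they are
simplified and are not tagsoup's actual API. The recovery rules (drop
stray close tags, implicitly close unclosed elements) are assumptions
and are far cruder than the HTML 5 tree construction algorithm.

```haskell
-- Simplified stand-in for a flat tag stream (NOT tagsoup's real Tag type;
-- attributes, comments and so on are left out).
data Tag
  = TagOpen String
  | TagClose String
  | TagText String
  deriving (Eq, Show)

-- Hypothetical tree type used only for this illustration.
data Node
  = Element String [Node]
  | Text String
  deriving (Eq, Show)

-- Turn a flat tag list into a forest of nodes.
buildForest :: [Tag] -> [Node]
buildForest = fst . siblings []

-- Parse sibling nodes. The first argument is the stack of currently open
-- element names; parsing stops at a close tag that matches any of them,
-- and that close tag is left in the remaining input for the caller.
siblings :: [String] -> [Tag] -> ([Node], [Tag])
siblings _ [] = ([], [])
siblings open (TagText s : ts) =
  let (ns, rest) = siblings open ts
  in  (Text s : ns, rest)
siblings open (TagOpen n : ts) =
  let (kids, afterKids) = siblings (n : open) ts
      -- Consume our own close tag if present; otherwise an ancestor is
      -- closing (or input ended), so this element is closed implicitly.
      rest = case afterKids of
               (TagClose m : r) | m == n -> r
               other                     -> other
      (ns, rest') = siblings open rest
  in  (Element n kids : ns, rest')
siblings open (TagClose n : ts)
  | n `elem` open = ([], TagClose n : ts) -- closes this or an enclosing element
  | otherwise     = siblings open ts      -- stray close tag: dropped
```

With these definitions, the input for <p><b>hi</p>, that is
[TagOpen "p", TagOpen "b", TagText "hi", TagClose "p"], yields
[Element "p" [Element "b" [Text "hi"]]]: the unclosed b element is
closed when its parent closes.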


Re: [Haskell-cafe] HTML library with DOM?

2010-10-07 Thread Neil Mitchell
Yes. I don't think I've officially announced it, but TagSoup has done
HTML 5 parsing as standard for the last few releases. The HTML 5 spec
is still changing, so it's entirely possible that some corner case is
handled incorrectly; if you find one, please let me know and I'll fix
it.

Thanks, Neil

2010/10/7 Gregory Collins g...@gregorycollins.net:
> Michael Snoyman mich...@snoyman.com writes:

>> As far as I know, Neil Mitchell's tagsoup[1] parses according to the
>> HTML 5 parsing rules, but it just generates a list of Tags[2], so
>> you'd have to build the DOM tree up from there. I personally have had
>> great experience with tagsoup. It's even the core of HTML-scraping
>> technology powering searchonce[3].

> Yep, someone else wrote me privately to say this (that tagsoup respects
> the html5 lexing rules). So I'll be using this as the basis of an html5
> DOM parser. Stay tuned!

> G
> --
> Gregory Collins g...@gregorycollins.net


[Haskell-cafe] HTML library with DOM?

2010-10-06 Thread Günther Schmidt

Hi all,

is there an HTML parsing library that creates a DOM from a page?

Günther



Re: [Haskell-cafe] HTML library with DOM?

2010-10-06 Thread Gregory Collins
Günther Schmidt gue.schm...@web.de writes:

> Hi all,

> is there an HTML parsing library that creates a DOM from a page?

I've got the month of October off, and one of the things I've been
planning on working on is a compliant HTML5 parser for Haskell --
something which is sorely needed! I will ping the list back if/when I
get it finished.

G
-- 
Gregory Collins g...@gregorycollins.net