Ralf Hautkappe wrote:
> 
> hi,
> 
> i want to extract links and their tags out of html with gforth... i have got
> one solution with search ( a1 n1 s| href="| search...) .. but i feel its
> to complex., because i parse large files with different levels of links....

What are different levels of links?

> is there an other way?

You could use a general string matcher like FoSM by Gordon Charlton
(later maintained by Chris Jakeman), or a general parser/parser
generator like BNFparse by Brad Rodriguez or Gray by me.

Or you could use a general SGML/HTML/XML parser with an appropriate
DTD, but I don't know one written in Forth, and real-world web
documents don't conform to DTDs anyway (I don't know how the usual
parsers deal with that).

For your problem, I would probably stick with SEARCH, maybe with a
little SCAN, SKIP, and their backwards equivalents.  I would not work
a line-at-a-time, but a file-at-a-time, because links can cross line
boundaries.
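
A minimal sketch of that approach, assuming Gforth and assuming every
link is written as a lowercase href="..." with double quotes (real pages
also use HREF, single quotes, or no quotes at all, which this does not
handle). PRINT-HREFS is a name made up for illustration; SEARCH and
/STRING are standard string words, SCAN and SLURP-FILE are Gforth words.

```forth
\ Print the value of every double-quoted href attribute in a buffer.
: print-hrefs ( c-addr u -- )
  begin
    s\" href=\"" search     \ ( c-addr1 u1 flag ) rest of buffer from the match
  while
    6 /string               \ step over href="  ( c-addr2 u2 )
    2dup [char] " scan      \ find the closing quote  ( c-addr2 u2 c-addr3 u3 )
    2swap 2 pick -          \ link length is u2-u3  ( c-addr3 u3 c-addr2 len )
    type cr                 \ print the link; continue from the closing quote
  repeat
  2drop ;
```

Reading the whole file into memory first lets a link that spans a line
break be found like any other:

    s" page.html" slurp-file print-hrefs
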

> maybe using forth´s interpreter?

I don't think the Forth interpreter can be used profitably without
major surgery.

- anton
