On Sat, 19 Jul 2008 17:23:48 -0700, Gary Kline <[EMAIL PROTECTED]> wrote:
> Guys,
> Is there an easy way of splitting up these tags into one-per-line?
>
> I'm not obcessive [[?, :)]], but for what I've got in mind, the tags
> and stuff would look better to my eyes? ....the outcome of this will
> go into a special database, not html .
>
> is there some clever perl one-liner ...
I don't know about 'easy', because this looks pretty much like 'free
form HTML'. Parsing liberally formatted HTML code from untrusted
sources is a lot like trying to reinvent Firefox's HTML parsing engine
or something similar. That sits much closer to the 'insanely difficult'
end of the scale than to the 'easy to hack with sed and a bit of awk or
some Perl' end.

If you have some sort of guarantee about the well-formedness of the
HTML source though (i.e. it passes some sort of validation suite), then
you can probably use tidy(1) to convert it to XML and then use xsltproc
to convert the XML source to pretty much anything imaginable.

Now, if you want to merely "hack something quick and dirty", a short
Perl script can probably do a regexp substitution similar to:

    #
    # WARNING: THIS HAS NOT BEEN TESTED :P
    #
    my $foo = do { local $/; <STDIN> };       # slurp all of stdin, not just the first line
    $foo =~ s:(<[^>]+>[^<]*</[^>]+>):$1\n:g;  # add a newline after each <tag>text</tag> run
    print $foo;

but you shouldn't trust the output of such a quick hack too much. The
pattern knows nothing about nesting, comments, or a '>' inside an
attribute value, so any of those will confuse it.
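
If the input is too messy for a regexp, a lenient HTML parser is the safer route. Here is a minimal sketch in Python rather than Perl, only because Python ships an HTML parser in its standard library (Perl's HTML::Parser comes from CPAN). The names TagSplitter and split_tags are made up for this example, and "one tag or text run per line" is my guess at the wanted output, since the original sample was not included.

```python
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Collect every tag and every non-blank text run as a separate item.

    Hypothetical helper for illustration. Comments and doctype
    declarations are silently dropped because their handlers are
    not overridden.
    """

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Keep the start tag exactly as written, attributes included.
        self.lines.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> stay on a single line.
        self.lines.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so </B> comes out as </b>.
        self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse internal whitespace and skip whitespace-only runs.
        text = " ".join(data.split())
        if text:
            self.lines.append(text)


def split_tags(html):
    """Return the tags and text runs of `html` as a list of strings."""
    parser = TagSplitter()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.lines


if __name__ == "__main__":
    import sys

    print("\n".join(split_tags(sys.stdin.read())))
```

Saved under any name you like, it reads HTML on stdin and writes one item per line on stdout. It tolerates unbalanced or sloppy markup without complaint, which also means it does not validate anything, so the earlier warning about trusting the output still applies.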