I wish it were that easy. A regex was my first thought, but it only
works for my simple example.

You are correct; I should have provided a more detailed example.

Basically, the method should remove all extraneous HTML from a string.
The resulting HTML string is to be displayed in a div, so empty p
blocks and leading and trailing line breaks, br tags, whitespace, etc.
need to be stripped. If the complete string is inside a p tag, then
that p tag should be removed. This extra markup is added by pretty much
every HTML edit control we can get our hands on, free or commercial.

So in a nutshell:

input:
"<p><br> <br /></p>" & vbTab & vbCrLf & " <p><p>this is a test</p>" &
vbCrLf & " <p>It really is</p> </p> <p><br><br /></p> " & vbTab &
vbCrLf

output:
"<p>this is a test</p>" & vbCrLf & " <p>It really is</p>"

logic:
The first p block should be removed because it contains no real
textual information.
Leaving vbTab & vbCrLf & " <p><p>this is a test</p>" & vbCrLf &
" <p>It really is</p> </p> <p><br><br /></p> " & vbTab & vbCrLf
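A regular expression cannot handle HTML in general, but it can handle this one narrow sub-step: an "empty" p block, as described here, contains no nested p, only whitespace and br tags. Below is a minimal sketch, assuming that whitespace, br variants and &nbsp; are the only "empty" markup (the &nbsp; part is an assumption); the module and function names are hypothetical. It removes every such block, not just the first and last ones, and repeats until nothing changes so that a p left empty by an earlier removal is dropped too.

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper: removes <p> blocks whose only content is
' whitespace, &nbsp; or <br>, <br/>, <br /> tags, e.g. "<p><br> <br /></p>".
' The pattern cannot span a nested "<p", so each pass removes only
' innermost empty paragraphs; the loop catches parents that become empty.
Public Module EmptyParagraphs

    Private ReadOnly EmptyP As New Regex(
        "<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p\s*>",
        RegexOptions.IgnoreCase)

    Public Function RemoveEmptyParagraphs(html As String) As String
        If html Is Nothing Then Return String.Empty
        Dim previous As String
        Do
            previous = html
            html = EmptyP.Replace(html, String.Empty)
        Loop While html <> previous
        Return html
    End Function

End Module
```

Real paragraphs such as "<p>this is a test</p>" never match, because the pattern allows nothing but whitespace, &nbsp; and br tags between the tags.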

The leading and trailing tabs and line breaks should be removed because
they have no real meaning in HTML.
Leaving " <p><p>this is a test</p>" & vbCrLf & " <p>It really is</p> </p> <p><br><br /></p> "

Again the whitespace should be trimmed:
Leaving "<p><p>this is a test</p>" & vbCrLf & " <p>It really is</p> </p> <p><br><br /></p>"
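The two trimming steps above can be done in a single pass. String.Trim() with no arguments already removes leading and trailing white-space characters, including tabs, carriage returns and line feeds. The sketch below also strips &nbsp; and bare br tags at either end, since stray br tags are part of the clutter described; treating &nbsp; the same way is an assumption, and the names are hypothetical. Interior line breaks, such as the vbCrLf between the two paragraphs, are left alone.

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper: strips whitespace (spaces, tabs, CR, LF), &nbsp;
' entities and bare <br>, <br/>, <br /> tags from both ends of a fragment.
' Content in the middle of the string is not touched.
Public Module HtmlEdgeTrim

    ' One "invisible" unit: a white-space char, &nbsp; or a <br> variant.
    Private Const Junk As String = "(?:\s|&nbsp;|<br\s*/?>)"

    ' \A and \z anchor to the absolute start and end of the string.
    Private ReadOnly EdgeJunk As New Regex(
        "\A" & Junk & "+|" & Junk & "+\z",
        RegexOptions.IgnoreCase)

    Public Function TrimHtmlEdges(html As String) As String
        If html Is Nothing Then Return String.Empty
        Return EdgeJunk.Replace(html, String.Empty)
    End Function

End Module
```

Note that a br inside a p block (as in the last block here) is not affected; that block is handled by the empty-paragraph step.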

The last p block should be removed because it contains no real textual
information.
Leaving "<p><p>this is a test</p>" & vbCrLf & " <p>It really is</p> </p> "

Again the whitespace should be trimmed:
Leaving "<p><p>this is a test</p>" & vbCrLf & " <p>It really is</p> </p>"

Having the whole HTML block inside p tags in a div is useless, so the
outer p tags should be removed:
Leaving "<p>this is a test</p>" & vbCrLf & " <p>It really is</p> "
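This is the step where a single regex tends to go wrong: a pattern like ^<p>(.*)</p>$ also matches two sibling paragraphs such as "<p>a</p><p>b</p>", which must not be unwrapped. One way around it (a sketch, not the only approach) is to walk the p tags with a depth counter and unwrap only if the very first tag is an opening p whose matching close is the last thing in the string. It assumes the input has already been trimmed; the names are hypothetical. Strictly speaking, a nested p is not valid HTML (a new p start tag implicitly closes an open p), so this counts tags as the editor wrote them rather than as a browser would parse them.

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper: if the whole (already trimmed) fragment is one
' <p>...</p> element, return its contents; otherwise return it unchanged.
Public Module OuterParagraph

    ' Group 1 is "/" for a closing tag and empty for an opening tag.
    Private ReadOnly PTag As New Regex("<(/?)p\b[^>]*>", RegexOptions.IgnoreCase)

    Public Function UnwrapOuterParagraph(html As String) As String
        If html Is Nothing Then Return String.Empty
        Dim tags As MatchCollection = PTag.Matches(html)
        ' The fragment must begin with an opening <p> tag.
        If tags.Count = 0 OrElse tags(0).Index <> 0 OrElse tags(0).Groups(1).Value = "/" Then
            Return html
        End If
        Dim depth As Integer = 0
        For Each tag As Match In tags
            depth += If(tag.Groups(1).Value = "/", -1, 1)
            If depth = 0 Then
                ' The first <p> closes here; unwrap only if this is the very end.
                If tag.Index + tag.Length <> html.Length Then Return html
                Dim start As Integer = tags(0).Length
                Return html.Substring(start, tag.Index - start)
            End If
        Next
        Return html ' unbalanced: the first <p> is never closed
    End Function

End Module
```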

Finally the whitespace should be trimmed:
Leaving "<p>this is a test</p>" & vbCrLf & " <p>It really is</p>"
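Putting the steps together in the order used above (remove empty p blocks, trim, unwrap the outer p, trim again) gives one function. This is a self-contained sketch under these assumptions: the names are hypothetical, and only p, br and &nbsp; are treated specially. If the editors produce messier markup, a real HTML parser (for example the HTML Agility Pack) may be a sturdier base than regular expressions.

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical end-to-end cleanup for HTML-editor output, following the
' steps in the walk-through: empty <p> removal, trim, unwrap, trim.
Public Module HtmlCleanup

    ' One "invisible" unit: a white-space char, &nbsp; or a <br> variant.
    Private Const Junk As String = "(?:\s|&nbsp;|<br\s*/?>)"
    Private ReadOnly EmptyP As New Regex("<p\b[^>]*>" & Junk & "*</p\s*>", RegexOptions.IgnoreCase)
    Private ReadOnly EdgeJunk As New Regex("\A" & Junk & "+|" & Junk & "+\z", RegexOptions.IgnoreCase)
    Private ReadOnly PTag As New Regex("<(/?)p\b[^>]*>", RegexOptions.IgnoreCase)

    Public Function CleanEditorHtml(html As String) As String
        If html Is Nothing Then Return String.Empty
        ' 1. Drop <p> blocks with no real text (repeat in case a parent becomes empty).
        Dim previous As String
        Do
            previous = html
            html = EmptyP.Replace(html, String.Empty)
        Loop While html <> previous
        ' 2. Trim whitespace, tabs, line breaks, &nbsp; and <br> at both ends.
        html = EdgeJunk.Replace(html, String.Empty)
        ' 3. Remove a <p> that wraps the entire fragment, then trim again.
        Return EdgeJunk.Replace(Unwrap(html), String.Empty)
    End Function

    ' Returns the contents of an outer <p>...</p> spanning the whole string,
    ' or the string unchanged if there is no such wrapper.
    Private Function Unwrap(html As String) As String
        Dim tags As MatchCollection = PTag.Matches(html)
        If tags.Count = 0 OrElse tags(0).Index <> 0 OrElse tags(0).Groups(1).Value = "/" Then Return html
        Dim depth As Integer = 0
        For Each tag As Match In tags
            depth += If(tag.Groups(1).Value = "/", -1, 1)
            If depth = 0 Then
                If tag.Index + tag.Length <> html.Length Then Return html
                Dim start As Integer = tags(0).Length
                Return html.Substring(start, tag.Index - start)
            End If
        Next
        Return html
    End Function

End Module
```

Applied to the input at the top, CleanEditorHtml returns "<p>this is a test</p>" & vbCrLf & " <p>It really is</p>", matching the expected output. The vbCrLf between the two paragraphs is kept, as in the example; collapsing interior whitespace would be a separate step.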

I know this must be possible, and there is surely an easy way to do it,
but nothing seems to work. It is a much more complex problem than it
first appears :o(
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"DotNetDevelopment, VB.NET, C# .NET, ADO.NET, ASP.NET, XML, XML Web 
Services,.NET Remoting" group.
For more options, visit this group at
http://groups.google.com/group/DotNetDevelopment

-~----------~----~----~----~------~----~------~--~---
