Re: restrict processing to just text (content) of html document

Ben Dougall Mon, 12 Jun 2006 16:38:31 -0700


On Monday, June 12, 2006, at 11:58  pm, Jacob Haller wrote:

>> Really? I tried it before I responded. Starting with:
>>
>> £3
>> I did Markup -> Utilities -> Translate text to HTML (HTML entities checked, use name, ignore < and >, encode Unicode characters, selection only). I selected £3 and the result is:
>> &pound;3
>>
>> What exactly isn't working about this?
> It does what you say, but it doesn't answer his question, which is more general. (I don't think there is a general answer to his question -- for instance I don't think you can selectively convert non-tags to upper case, at least not without using a regular expression -- but I may be wrong.)

yes that's right, thanks, it's the general case i was hoping for, not specifically conversion to html entities.

don't think it'd be that hard to carry out either (in code); if you've got the capability to reliably strip out html from text (which i don't think is that hard -- is that something bbedit can do already? must be) all you've got to do further to that is, instead of stripping it out, mask off the html, allow the text parts "showing through" to expand and shrink in length (and importantly remember their starts and ends -- so basically treat all the little parts of text as separate files), perform process(es) on the text parts, then put back the html parts -- what a great new feature of bbedit?! and not hard; i thought it was to start with but it's not. you just have to treat all the text (having removed the html temporarily) as lots of little bits of text -- never join it all together, which is essentially what the basic strip_tags function in php does. you could even have the option to include or not include things like the xhtml text that are, say, class values -- which are kind of in between xhtml and english (or whatever human language you're writing in)
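
a minimal sketch of the mask-and-restore idea described above, in python rather than anything bbedit actually provides. the names TAG_RE and process_text_only are made up for illustration, and the tag pattern is a deliberately naive assumption: it only holds for well-formed markup with no stray < or > in the text, and it ignores comments and script/style content.

```python
import re

# Naive tag matcher (an assumption, not a real HTML parser): anything from
# "<" up to the next ">". Comments, CDATA and <script> bodies are not handled.
TAG_RE = re.compile(r"(<[^>]*>)")


def process_text_only(html, transform):
    """Apply transform to each text run and leave the tags untouched."""
    # Because the pattern has one capturing group, re.split keeps the tags:
    # text runs land at even indexes, tags at odd indexes.
    parts = TAG_RE.split(html)
    for i in range(0, len(parts), 2):
        # Each run is processed on its own, so it may grow or shrink freely.
        parts[i] = transform(parts[i])
    # Putting the pieces back in order restores the markup around the text.
    return "".join(parts)


if __name__ == "__main__":
    sample = '<p class="note">hello <b>world</b></p>'
    print(process_text_only(sample, str.upper))
```

the text runs are never joined together, so there are no offsets to track: each run is transformed in place in the list and the tags between them stay exactly where they were. attribute values such as class names sit inside the tags here, so they are skipped; including them would need a real parser. a transform like upper-casing would also mangle any entities already in the text (&pound; would become &POUND;), so that case needs extra care.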

>> I did Markup -> Utilities -> Translate text to HTML (HTML entities checked, use name, ignore < and >, encode Unicode characters, selection only). I selected £3 and the result is:
>> &pound;3

will that work reliably for a whole html or xhtml document? if the document is correct html apart from its lack of correctly encoded text? i suppose it would, right? as xhtml is entirely within <>'s (isn't it?), so there'd be no danger of that process mangling perfectly ok xhtml? i suppose the one problem would be when you have <'s or >'s in the text that should be encoded, but it's probably a bit much to expect software to distinguish between <>'s that are in text and <>'s that are part of html.
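
to make the question concrete, here is a rough python sketch of encoding only the text between tags. it is not how bbedit's Translate command works internally (that isn't known here); encode_entities and encode_text_runs are hypothetical names, and the tag pattern is the same naive assumption that everything between < and > is markup.

```python
import re
from html.entities import codepoint2name

# Naive assumption: anything between "<" and the next ">" is a tag.
TAG_RE = re.compile(r"(<[^>]*>)")


def encode_entities(text):
    """Replace non-ASCII characters with named or numeric entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 127:
            # Prefer a named entity (e.g. pound); fall back to a numeric one.
            name = codepoint2name.get(cp)
            out.append(f"&{name};" if name else f"&#{cp};")
        else:
            # ASCII, including "<", ">" and "&", is left exactly as it is.
            out.append(ch)
    return "".join(out)


def encode_text_runs(html):
    """Encode only the text between tags; tags pass through unchanged."""
    parts = TAG_RE.split(html)
    for i in range(0, len(parts), 2):  # even indexes hold the text runs
        parts[i] = encode_entities(parts[i])
    return "".join(parts)


if __name__ == "__main__":
    print(encode_text_runs("<p>£3</p>"))
```

unlike encoding a whole selection, this version leaves attribute values alone because they sit inside the tags. it also shows the ambiguity raised above: in something like `<p>1 < 2</p>` the pattern treats `< 2</p>` as a tag, so a stray < in the text cannot be told apart from markup without a proper parser.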



