Hi Kurt --
If you want a case-sensitive check, all you have to do is switch the
position of the property and value, i.e. use a list formatted like
["&aacute;":"á", "&Aacute;":"Á"]. In property lists, if the property is a
string, it is case-sensitive, so html_elements.getaProp("&Aacute;") will
return "Á" as expected.
(Apologies if characters get messed up in this email; "á" should be
lowercase a-acute and "Á" should be uppercase A-acute).
As Jeff mentioned, a JavaScript RegExp search would probably be the
simplest way to find and replace all the entities. If you're more
comfortable with Lingo, don't worry -- you can have one script set to
JavaScript, including just the replace_entities function below, and
call that function from Lingo:
// ---------------------
// -- put this script in a script castmember with language set to JAVASCRIPT
//
// entity_list is a sorted prop list like ["&aacute;":"á", "&Aacute;":"Á"]
function replace_entities(rawString, entity_list) {
  var processed_text = "";
  // split text into regular strings & possible entities; the capturing
  // parentheses keep the matched entities in the resulting array
  var arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));
  var num_parts = arr.length;
  for (var i = 0; i < num_parts; i++) {
    if (i % 2) { // odd index: is a possible entity
      // insert special char from list; if not found, just leave the
      // string as is
      var entity_replacement = entity_list.getaProp(arr[i]);
      processed_text += (entity_replacement != undefined) ? entity_replacement : arr[i];
    } else { // even index: not a possible entity, leave the string as is
      processed_text += arr[i];
    }
  }
  return processed_text;
}
// ---------------------
With the above function in a JavaScript castmember, you can then call
it from a Lingo script castmember (and even have your entity list set
up and sorted in Lingo):
-- ----------------------
-- put this script in a script castmember with language set to LINGO
--
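on fix_html rawString
  -- warning: VERY INCOMPLETE list of HTML entities!!!
  -- each character appears under both its named and its numeric entity
  entity_list = ["&Agrave;": "À", "&#192;": "À", "&Aacute;": "Á", "&#193;": "Á", \
    "&Acirc;": "Â", "&#194;": "Â", "&Atilde;": "Ã", "&#195;": "Ã", \
    "&Auml;": "Ä", "&#196;": "Ä", "&Aring;": "Å", "&#197;": "Å", \
    "&AElig;": "Æ", "&#198;": "Æ", "&Ccedil;": "Ç", "&#199;": "Ç", \
    "&Egrave;": "È", "&#200;": "È", "&Eacute;": "É", "&#201;": "É", \
    "&Ecirc;": "Ê", "&#202;": "Ê", "&Euml;": "Ë", "&#203;": "Ë", \
    "&agrave;": "à", "&#224;": "à", "&aacute;": "á", "&#225;": "á", \
    "&acirc;": "â", "&#226;": "â", "&atilde;": "ã", "&#227;": "ã", \
    "&auml;": "ä", "&#228;": "ä", "&aring;": "å", "&#229;": "å", \
    "&aelig;": "æ", "&#230;": "æ", "&ccedil;": "ç", "&#231;": "ç", \
    "&egrave;": "è", "&#232;": "è", "&eacute;": "é", "&#233;": "é", \
    "&ecirc;": "ê", "&#234;": "ê", "&euml;": "ë", "&#235;": "ë", \
    "&amp;": "&", "&#38;": "&", "&gt;": ">", "&#62;": ">", \
    "&lt;": "<", "&#60;": "<", "&quot;": QUOTE, "&#34;": QUOTE, \
    "&bdquo;": "„", "&#8222;": "„", "&laquo;": "«", "&#171;": "«", \
    "&ldquo;": "“", "&#8220;": "“", "&lsaquo;": "‹", "&#8249;": "‹", \
    "&lsquo;": "‘", "&#8216;": "‘", "&raquo;": "»", "&#187;": "»", \
    "&rdquo;": "”", "&#8221;": "”", "&rsaquo;": "›", "&#8250;": "›", \
    "&rsquo;": "’", "&#8217;": "’"]
  entity_list.sort() -- sort list of entities for faster access
  -- call the function that's in the JavaScript script castmember
  return replace_entities(rawString, entity_list)
end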
-- end script
-- ----------------------
Type put fix_html( rawString ) to see the results of replacing entities
in HTML text (you'll need to expand the list of entities above in order
to catch all of them). This function should be quite fast, and more
importantly, will have very little slowdown even if your list of HTML
entities is expanded to include the 1000+ entities out there.
Most of the work of the JavaScript function is done by one line:
arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));
That line divides a string into regular text and possible entities. For
instance, "bl&aacute;h BL&Aacute;H" becomes the list ["bl",
"&aacute;", "h BL", "&Aacute;", "H"], with every 2nd element being a
possible HTML entity. I think the RegExp pattern "\&\#?[a-zA-Z0-9]+\;" should
match any possible entity in the standard form of &amp; or &#39; or
&Ograve; etc.
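If you want to try the split step on its own outside Director, here is a
minimal sketch in plain JavaScript (it should run in any current JavaScript
engine; I have not run this exact snippet inside Director). The name
splitEntities is just made up for this illustration. It relies on the
standard behaviour that split() keeps the text matched by capturing
parentheses in the result array.

```javascript
// Split a string into plain text and possible entities.
// Because the pattern is wrapped in capturing parentheses, split() keeps
// the matched tokens, so they land at the odd indices (1, 3, 5, ...).
function splitEntities(rawString) {
  return rawString.split(/(&#?[a-zA-Z0-9]+;)/);
}

const parts = splitEntities("bl&aacute;h BL&Aacute;H");
console.log(parts);
// -> [ 'bl', '&aacute;', 'h BL', '&Aacute;', 'H' ]

// A string that starts with an entity gets an empty first element,
// so the odd/even pattern still holds.
console.log(splitEntities("&amp;x"));
// -> [ '', '&amp;', 'x' ]
```

The empty leading element is why the loop can test i % 2 without any
special case for text that begins or ends with an entity.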
It then just matches the possible entity to your master list using
getaProp(), which is case-sensitive AND very fast.
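To see the whole split-then-look-up idea in one place without Director,
here is a rough sketch that uses a plain JavaScript Map as a stand-in for
the sorted Lingo property list. The Map, the name replaceEntitiesWithMap,
and the three sample entries are assumptions for the example only; inside
Director you would keep using getaProp() on the prop list as shown earlier.

```javascript
// Hypothetical stand-in for the Lingo prop list. Like string properties in
// a prop list, Map keys that are strings are compared case-sensitively.
const entityMap = new Map([
  ["&aacute;", "á"],
  ["&Aacute;", "Á"],
  ["&amp;", "&"],
]);

function replaceEntitiesWithMap(rawString, entities) {
  // Odd indices hold possible entities, even indices hold plain text.
  const parts = rawString.split(/(&#?[a-zA-Z0-9]+;)/);
  let out = "";
  for (let i = 0; i < parts.length; i++) {
    if (i % 2 === 1 && entities.has(parts[i])) {
      out += entities.get(parts[i]); // known entity: insert the character
    } else {
      out += parts[i]; // plain text or unknown entity: leave as is
    }
  }
  return out;
}

console.log(replaceEntitiesWithMap("bl&aacute;h BL&Aacute;H &unknown;", entityMap));
// -> bláh BLÁH &unknown;
```

Each possible entity costs one lookup, so the work depends on the length
of the text rather than on how many entities are in the list.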
cheers,
jamie
[To remove yourself from this list, or to change to digest mode, go to
http://www.penworks.com/lingo-l.cgi To post messages to the list, email
[email protected] (Problems, email [EMAIL PROTECTED]). Lingo-L is for
learning and helping with programming Lingo. Thanks!]