Hi Kurt --

If you want a case-sensitive check, all you have to do is switch the positions of the property and value, i.e. use a list formatted like ["&aacute;":"á", "&Aacute;":"Á"]. In property lists, if the property is a string, it is case-sensitive, so html_elements.getaProp("&Aacute;") will return "Á" as expected.

(Apologies if characters get messed up in this email; "á" should be lowercase a-acute and "Á" should be uppercase A-acute).

As Jeff mentioned, a JavaScript RegExp search would probably be the simplest way to find and replace all the entities. If you're more comfortable with Lingo, don't worry -- you can set one script to JavaScript, containing just the replace_entities function below, and call that function from Lingo:

// ---------------------
// -- put this script in a script castmember with language set to JAVASCRIPT
//
function replace_entities(rawString, entity_list) { // where entity_list is a sorted prop list like ["&aacute;":"á", "&Aacute;":"Á"]
  var processed_text = "";

  // split text into regular strings & possible entities; the capturing
  // parentheses keep the matched entities in the array, at the odd indices
  var arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));

  var num_parts = arr.length;
  for (var i = 0; i < num_parts; i++) {
    if (i % 2) {  // is a possible entity
      // insert special char from list; if not found, just leave the string as is
      var entity_replacement = entity_list.getaProp(arr[i]);
      if (entity_replacement != undefined) {
        processed_text += entity_replacement;
      } else {
        processed_text += arr[i];
      }
    } else {  // is not a possible entity, leave the string as is
      processed_text += arr[i];
    }
  }
  return processed_text;
}
// ---------------------


With the above function in a JavaScript castmember, you can then call it from a Lingo script castmember (and even have your entity list set up and sorted in Lingo):

-- ----------------------
-- put this script in a script castmember with language set to LINGO
--
on fix_html rawString
  -- warning: VERY INCOMPLETE list of HTML entities!!!
  entity_list = ["&#192;": "À", "&Agrave;": "À", "&#193;": "Á", "&Aacute;": "Á", "&#194;": "Â", "&Acirc;": "Â", "&#195;": "Ã", "&Atilde;": "Ã", "&#196;": "Ä", "&Auml;": "Ä", "&#197;": "Å", "&Aring;": "Å", "&#198;": "Æ", "&AElig;": "Æ", "&#199;": "Ç", "&Ccedil;": "Ç", "&#200;": "È", "&Egrave;": "È", "&#201;": "É", "&Eacute;": "É", "&#202;": "Ê", "&Ecirc;": "Ê", "&#203;": "Ë", "&Euml;": "Ë", "&#224;": "à", "&agrave;": "à", "&#225;": "á", "&aacute;": "á", "&#226;": "â", "&acirc;": "â", "&#227;": "ã", "&atilde;": "ã", "&#228;": "ä", "&auml;": "ä", "&#229;": "å", "&aring;": "å", "&#230;": "æ", "&aelig;": "æ", "&#231;": "ç", "&ccedil;": "ç", "&#232;": "è", "&egrave;": "è", "&#233;": "é", "&eacute;": "é", "&#234;": "ê", "&ecirc;": "ê", "&#235;": "ë", "&euml;": "ë", "&amp;": "&", "&#38;": "&", "&gt;": ">", "&#62;": ">", "&lt;": "<", "&#60;": "<", "&quot;": QUOTE, "&#34;": QUOTE, "&bdquo;": "„", "&#8222;": "„", "&laquo;": "«", "&#171;": "«", "&ldquo;": "“", "&#8220;": "“", "&lsaquo;": "‹", "&#8249;": "‹", "&lsquo;": "‘", "&#8216;": "‘", "&raquo;": "»", "&#187;": "»", "&rdquo;": "”", "&#8221;": "”", "&rsaquo;": "›", "&#8250;": "›", "&rsquo;": "’", "&#8217;": "’"]

  entity_list.sort()  -- sort list of entities for faster access

  return replace_entities(rawString, entity_list) -- call the function that's in the JavaScript script castmember
end

-- end script
-- ----------------------

Type put fix_html( rawString ) to see the results of replacing entities in HTML text (you'll need to expand the list of entities above in order to catch all of them). This function should be quite fast, and more importantly, will have very little slowdown even if your list of HTML entities is expanded to include the 1000+ entities out there.

Most of the work of the JavaScript function is done by one line:
  arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));

That line divides a string into regular text and possible entities. For instance, "bl&aacute;h BL&Aacute;H" becomes the list ["bl", "&aacute;", "h BL", "&Aacute;", "H"], with every 2nd element being a possible HTML entity. I think the RegExp pattern "\&\#?[a-zA-Z0-9]+\;" should match any possible entity in the standard form of &amp; or &#39; or &Ograve; etc.
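
If you want to see the split on its own, here is a minimal sketch in plain JavaScript (outside Director, so no prop lists are involved). The names ENTITY_PATTERN and splitEntities are just made up for the illustration, and I'm assuming a JavaScript engine whose split() keeps captured groups in the result, which is the behaviour the function above relies on:

```javascript
// Illustration only: ENTITY_PATTERN and splitEntities are hypothetical names,
// not part of the replace_entities script above.
var ENTITY_PATTERN = /(&#?[a-zA-Z0-9]+;)/;

function splitEntities(rawString) {
  // The pattern is wrapped in a capturing group, so split() keeps each
  // matched entity in the returned array instead of discarding it.
  // Plain text lands at the even indices, possible entities at the odd ones.
  return rawString.split(ENTITY_PATTERN);
}

var parts = splitEntities("bl&aacute;h BL&Aacute;H");
// parts: ["bl", "&aacute;", "h BL", "&Aacute;", "H"]

var leading = splitEntities("&amp;x");
// leading: ["", "&amp;", "x"]  (an empty string keeps the entity at an odd index)
```

The second call shows why the "every 2nd element" rule still holds when the text starts with an entity: split() puts an empty string in front of it.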

It then just matches the possible entity to your master list using getaProp(), which is case-sensitive AND very fast.
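
To try the whole idea without Director, the lookup step can be sketched with a plain JavaScript object standing in for the sorted Lingo prop list. This is only an approximation under that assumption: entityTable and replaceEntitiesSketch are hypothetical names, and an object key lookup is not the same mechanism as getaProp(), although both are case-sensitive for string keys:

```javascript
// Hypothetical stand-in for the Lingo property list (deliberately tiny).
var entityTable = { "&aacute;": "á", "&Aacute;": "Á", "&amp;": "&" };

function replaceEntitiesSketch(rawString, table) {
  // Same split as in replace_entities: odd indices hold possible entities.
  var parts = rawString.split(/(&#?[a-zA-Z0-9]+;)/);
  var out = "";
  for (var i = 0; i < parts.length; i++) {
    var piece = parts[i];
    // Replace only known entities; unknown ones are left as they are.
    if (i % 2 === 1 && Object.prototype.hasOwnProperty.call(table, piece)) {
      out += table[piece];
    } else {
      out += piece;
    }
  }
  return out;
}

var fixed = replaceEntitiesSketch("bl&aacute;h BL&Aacute;H &bogus; &AACUTE;", entityTable);
// fixed: "bláh BLÁH &bogus; &AACUTE;"
```

Note that "&AACUTE;" is left alone because the lookup is case-sensitive, and "&bogus;" is left alone because it is not in the table.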

cheers,
jamie


