Re: Case sensitive check

Jamie Ciocco Sun, 19 Feb 2006 13:08:18 -0800

Hi Kurt --

If you want a case-sensitive check, all you have to do is switch the position of the property and value, i.e. use a list formatted like ["&aacute;":"á", "&Aacute;":"Á"]. In property lists, if the property is a string, it is case-sensitive, so html_elements.getaProp("&Aacute;") will return "Á" as expected.

(Apologies if characters get messed up in this email; "á" should be lowercase a-acute and "Á" should be uppercase A-acute.)

As Jeff mentioned, a JavaScript RegExp search would probably be the simplest way to find and replace all the entities. If you're more comfortable with Lingo, don't worry -- you can have one script set to JavaScript, including just the replace_entities function below, and call that function from Lingo:


// ---------------------
// -- put this script in a script castmember with language set to JAVASCRIPT
//
// entity_list is a sorted prop list like ["&aacute;":"á", "&Aacute;":"Á"]
function replace_entities(rawString, entity_list) {
  var processed_text = "";
  // split text into regular strings & possible entities
  var arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));
  if (arr != null) {
    var num_parts = arr.length;
    for (var i = 0; i < num_parts; i++) {
      if (i % 2) {  // odd index: a possible entity
        // insert special char from list; if not found, just leave the string as is
        var entity_replacement = entity_list.getaProp(arr[i]);
        processed_text += (entity_replacement != undefined) ? entity_replacement : arr[i];
      } else {  // even index: not a possible entity, leave the string as is
        processed_text += arr[i];
      }
    }
  }
  return processed_text;
}
// ---------------------

With the above function in a JavaScript castmember, you can then call it from a Lingo script castmember (and even have your entity list set up and sorted in Lingo):


-- ----------------------
-- put this script in a script castmember with language set to LINGO
--
on fix_html rawString
  -- warning: VERY INCOMPLETE list of HTML entities!!!
  entity_list = [ \
    "&Agrave;": "À", "&#192;": "À", "&Aacute;": "Á", "&#193;": "Á", "&Acirc;": "Â", "&#194;": "Â", \
    "&Atilde;": "Ã", "&#195;": "Ã", "&Auml;": "Ä", "&#196;": "Ä", "&Aring;": "Å", "&#197;": "Å", \
    "&AElig;": "Æ", "&#198;": "Æ", "&Ccedil;": "Ç", "&#199;": "Ç", "&Egrave;": "È", "&#200;": "È", \
    "&Eacute;": "É", "&#201;": "É", "&Ecirc;": "Ê", "&#202;": "Ê", "&Euml;": "Ë", "&#203;": "Ë", \
    "&agrave;": "à", "&#224;": "à", "&aacute;": "á", "&#225;": "á", "&acirc;": "â", "&#226;": "â", \
    "&atilde;": "ã", "&#227;": "ã", "&auml;": "ä", "&#228;": "ä", "&aring;": "å", "&#229;": "å", \
    "&aelig;": "æ", "&#230;": "æ", "&ccedil;": "ç", "&#231;": "ç", "&egrave;": "è", "&#232;": "è", \
    "&eacute;": "é", "&#233;": "é", "&ecirc;": "ê", "&#234;": "ê", "&euml;": "ë", "&#235;": "ë", \
    "&amp;": "&", "&#38;": "&", "&gt;": ">", "&#62;": ">", "&lt;": "<", "&#60;": "<", \
    "&quot;": QUOTE, "&#34;": QUOTE, "&bdquo;": "„", "&#8222;": "„", "&laquo;": "«", "&#171;": "«", \
    "&ldquo;": "“", "&#8220;": "“", "&lsaquo;": "‹", "&#8249;": "‹", "&lsquo;": "‘", "&#8216;": "‘", \
    "&raquo;": "»", "&#187;": "»", "&rdquo;": "”", "&#8221;": "”", "&rsaquo;": "›", "&#8250;": "›", \
    "&rsquo;": "’", "&#8217;": "’"]
  entity_list.sort()  -- sort list of entities for faster access
  -- call the function that's in the JavaScript script castmember
  return replace_entities(rawString, entity_list)
end

-- end script
-- ----------------------

Type put fix_html( rawString ) to see the results of replacing entities in HTML text (you'll need to expand the list of entities above in order to catch all of them). This function should be quite fast and, more importantly, should slow down very little even if your list of HTML entities is expanded to include the 1000+ entities out there.


Most of the work of the JavaScript function is done by one line:
  arr = rawString.split(RegExp("(\&\#?[a-zA-Z0-9]+\;)", ""));

That line divides a string into regular text and possible entities. For instance, "bl&aacute;h BL&Aacute;H" becomes the list ["bl", "&aacute;", "h BL", "&Aacute;", "H"], with every 2nd element being a possible HTML entity. I think the RegExp pattern "\&\#?[a-zA-Z0-9]+\;" should match any possible entity in the standard form of &amp; or &#39; or &#210; etc.
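
If you want to try the split on its own outside Director, here is a minimal sketch in plain JavaScript. It assumes a current engine such as Node, and the name split_entities is made up for illustration; it is not part of the script above. Because the pattern has capturing parentheses, split() keeps the matched separators, so the possible entities land on the odd indexes.

```javascript
// Hypothetical helper for illustration only (not in the original script).
function split_entities(rawString) {
  // The capturing group makes split() keep each matched entity in the result.
  return rawString.split(/(&#?[a-zA-Z0-9]+;)/);
}

var parts = split_entities("bl&aacute;h BL&Aacute;H");
// Even indexes (0, 2, 4) hold regular text; odd indexes (1, 3) hold possible entities.
for (var i = 0; i < parts.length; i++) {
  console.log(i, (i % 2) ? "entity?" : "text", JSON.stringify(parts[i]));
}
```

A string with no entities comes back as a one-element array, so the loop in replace_entities simply copies it through.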

It then just matches the possible entity to your master list using getaProp(), which is case-sensitive AND very fast.
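
For anyone testing the idea outside Director, the lookup step can be sketched with a standard JavaScript Map standing in for the Lingo property list. That substitution is an assumption for illustration (the Director script uses getaProp() on the Lingo list), and the names entity_map and replace_entities_sketch are made up. Map keys are compared exactly, so "&aacute;" and "&Aacute;" stay distinct, and anything not in the map is left as is.

```javascript
// Stand-in for the Lingo property list; deliberately tiny and incomplete.
var entity_map = new Map([
  ["&aacute;", "á"],
  ["&Aacute;", "Á"],
  ["&amp;", "&"]
]);

// Hypothetical plain-JavaScript counterpart of replace_entities.
function replace_entities_sketch(rawString, entities) {
  var parts = rawString.split(/(&#?[a-zA-Z0-9]+;)/);
  var out = "";
  for (var i = 0; i < parts.length; i++) {
    if (i % 2 === 1 && entities.has(parts[i])) {
      out += entities.get(parts[i]);  // known entity: substitute the character
    } else {
      out += parts[i];                // regular text or unknown entity: keep as is
    }
  }
  return out;
}

console.log(replace_entities_sketch("bl&aacute;h BL&Aacute;H &foo;", entity_map));
```

Unknown entities such as "&foo;" pass through unchanged, which matches the "if not found, just leave the string as is" rule in the Director version.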


cheers,
jamie



