Hi All,

I was wondering if I could get some help here. I am looking for an
existing function/method/module that will properly convert all special
characters (like those from Microsoft Word: smart quotes, em dashes,
ellipses, bullet points, etc.) to either a matching simpler character or
an HTML entity.

HTML::Entities comes close, but it does not handle everything
correctly.

I need to clean this data up for use in a Google product feed (XML).

Here is an example of some text I am having trouble with:
(the +'s are actually bullet points)
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE COMPLETE VERSION
:DOORS OF THE MIND: INNER MYSTERIES + Dark atmosphere+ Spooky
gameplay+ Explore a world of nightmares!
======= end =======

And here is the output from using HTML::Entities:
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE
COMPLETE VERSION :DOORS OF THE MIND: INNER
MYSTERIES + Dark atmosphere+ Spooky
gameplay+ Explore a world of nightmares!
======= end =======

Notice the extra characters all over the place.

Any help you can provide would be greatly appreciated.

Thanks.
--Alex

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
