Yo~
Plain ASCII is, of course, just the first 128 characters (codes 0-127)
of the character table, so it doesn't surprise me that your accented
characters
got knocked down to unaccented characters most closely resembling the
original -- È and Ë become E, Ü becomes U, etc. -- when you did a
"Zap Gremlins" on your data. You can get the same effect if you
perform the "Convert to ASCII" function from BBEdit's "Text" menu.
(Actually, Convert to ASCII has one advantage over Zap Gremlins
in that *some* of the special characters will be converted to literal
equivalents -- π will become pi, © will become (c), ∑ becomes Sum, ¥
will become Yen, etc.)
As for the ‰ ("per thousand") symbol, the closest ASCII equivalent
would be what you got: 0/00, which most people would interpret as per
thousand, thus retaining the meaning (if not the look) of the
character you zapped. Many other special characters in the extended
(non-ASCII) part of the table also get a "literal" translation when
they are converted (knocked down, really) from their special
character status to plain
ol' ASCII. For example, the 8Ω pair of characters in your sample data
becomes 8Ohm when reduced to plain ASCII by BBEdit's "Convert to
ASCII" method.
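To make the idea concrete, here is a rough Python sketch of that kind
of "knock down" conversion. It is not BBEdit's actual code: the
function name to_plain_ascii and the SYMBOL_WORDS table are made up
for illustration, and the table only covers the handful of symbols
mentioned above.

```python
import unicodedata

# Hypothetical lookup table (an assumption, not BBEdit's real list).
# Symbols are written as escapes so the script itself stays plain ASCII.
SYMBOL_WORDS = {
    "\u2030": "0/00",  # per thousand sign
    "\u03a9": "Ohm",   # Greek capital omega
    "\u2126": "Ohm",   # dedicated ohm sign
    "\u03c0": "pi",    # Greek small pi
    "\u00a9": "(c)",   # copyright sign
    "\u2211": "Sum",   # summation sign
    "\u00a5": "Yen",   # yen sign
}


def to_plain_ascii(text):
    """Reduce text to 7-bit ASCII, spelling out a few known symbols."""
    # Step 1: replace symbols that have a "literal" equivalent.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    # Step 2: split accented letters into base letter + combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 3: keep only code points below 128, which drops the accents
    # (and, silently, anything else that has no ASCII form).
    return "".join(ch for ch in decomposed if ord(ch) < 128)
```

Note that anything missing from the table and lacking an unaccented
base letter simply disappears in step 3, so the table would need to
grow to match your real data.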
Perhaps my explanation doesn't help you prep your file to make it
easier to handle your data with grep, but at least you can figure out
most of the plain ASCII equivalents you will get by looking up the
special characters at positions 128 and above in the character table.
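One more thought, offered as a guess from your sample data rather
than anything checked against the real file: the file command says
ISO-8859 while BBEdit says Mac OS Roman, and the odd characters are
exactly what you get when ISO-8859-1 bytes are displayed as Mac OS
Roman (byte 0xE9 is an e-acute in ISO-8859-1 but an E-grave in Mac OS
Roman). A small Python sketch of undoing that misreading, where the
function name reread_as_latin1 is made up for illustration:

```python
def reread_as_latin1(text_shown_as_mac_roman):
    """Undo a Mac OS Roman misreading of ISO-8859-1 data.

    Assumes the text was originally ISO-8859-1 bytes that got decoded
    as Mac OS Roman; that is an inference from the sample, not a fact.
    """
    # Recover the original bytes by encoding with the wrong codec...
    raw_bytes = text_shown_as_mac_roman.encode("mac_roman")
    # ...then decode those same bytes with the codec `file` reported.
    return raw_bytes.decode("iso-8859-1")


if __name__ == "__main__":
    # Escapes keep this script plain ASCII: \u00c8 is E-grave and
    # \u2030 is the per thousand sign, as they appear in the sample.
    for shown in ("L\u00c8on", "Smultronst\u2030llet", "Rash\u00d9mon"):
        print(shown, "->", reread_as_latin1(shown))
```

If that guess holds, reading the file as ISO-8859-1 instead of Mac OS
Roman before converting to ASCII should give you the accented letters
you expected to see.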
HTH!
~Semper Fi, Mac!
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on a mailing list?
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
On Mar 10, 2008, at 5:27 PM, [EMAIL PROTECTED] wrote:
I have a file encoded as ISO-8859 (according to the file command at
the command line). It is the ratings.file from IMDb's database,
and BBEdit says it's "Western (Mac OS Roman)".
I need the file to be plain ASCII so that I can do grep searches
against it via a php script. Here is some sample data:
0000000123 119567 8.6 LÈon (1994)
0000000124 120390 8.6 Fabuleux destin d'AmÈlie Poulain, Le (2001)
0000000123 24627 8.5 RashÙmon (1950)
0000000124 69931 8.4 Vita Ë bella, La (1997)
0000000123 12564 8.3 Smultronst‰llet (1957)
0000000114 17411 8.2 8Ω (1963)
I can Zap Gremlins to replace them with their codes:
0000000123 119567 8.6 L\0xC8on (1994)
0000000124 120390 8.6 Fabuleux destin d'Am\0xC8lie Poulain, Le (2001)
But that doesn't help me in doing a grep search through the file.
I also don't understand why "Smultronstället" shows up as
"Smultronst‰llet" or why 'LÈon' appears instead of 'Léon', etc.
What I want is 'Leon', 'Fabuleux destin d'Amelie Poulain, Le',
'Rashomon', 'Vita e bella, La', and 'Smultronstallet' and '8 1/2'.
And it needs to be fairly quick and easy to fix because I need to
update this file every month or two.
And if anyone knows what I am doing: yes, I did try to compile the
moviedb-3.24 package under Leopard and failed badly.
--
We will fight for Bovine Freedom and hold our large heads high
We will run free with the Buffalo or die
--
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to: <[EMAIL PROTECTED]>