Yo~

Plain ASCII is, of course, just the 128 characters with codes 0 through 127, so it doesn't surprise me that your accented characters got knocked down to the unaccented characters most closely resembling the originals -- È and Ë become E, Ü becomes U, etc. -- when you did a "Zap Gremlins" on your data. You can get the same effect with the "Convert to ASCII" command in BBEdit's "Text" menu. (Actually, Convert to ASCII has one advantage over Zap Gremlins: *some* of the special characters are converted to literal equivalents -- π becomes pi, © becomes (c), ∑ becomes Sum, ¥ becomes Yen, etc.)

As for the ‰ ("per thousand") symbol, the closest ASCII equivalent is what you got: 0/00, which most people would read as per thousand, so the meaning (if not the look) of the character you zapped is retained. Many other special characters outside the ASCII range also get a "literal" translation when they are converted (knocked down, really) to plain ol' ASCII. For example, the 8Ω pair of characters in your sample data becomes 8Ohm when reduced to plain ASCII by BBEdit's "Convert to ASCII" command.
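
For what it's worth, the odd characters in your sample look like an encoding mix-up rather than bad data: the byte values that ISO-8859-1 uses for é, è, ô, ä and ½ are the same byte values that Mac OS Roman uses for È, Ë, Ù, ‰ and Ω. Here is a small Python sketch of that round trip. It assumes the file really is ISO-8859-1, as the file command reported; Python, its "latin-1" and "mac_roman" codecs, and the two function names are my choices for illustration, not anything BBEdit uses.

```python
# Sketch: show how ISO-8859-1 bytes look when they are misread as Mac OS Roman.
# Assumption: the titles were originally written in ISO-8859-1 (Latin-1).

def misread_as_mac_roman(text):
    """Encode text as ISO-8859-1, then decode the same bytes as Mac OS Roman."""
    return text.encode("latin-1").decode("mac_roman")


def repair_mac_roman_misread(garbled):
    """Undo the misreading: recover the raw bytes, then decode as ISO-8859-1."""
    return garbled.encode("mac_roman").decode("latin-1")


if __name__ == "__main__":
    titles = ["Léon", "Rashômon", "Vita è bella, La", "Smultronstället", "8½"]
    for title in titles:
        garbled = misread_as_mac_roman(title)
        print(title, "->", garbled, "->", repair_mac_roman_misread(garbled))
```

If that assumption holds, reopening the file in BBEdit with the encoding set to Western (ISO Latin 1) instead of Mac OS Roman should show the titles as intended.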

Perhaps my explanation doesn't help you prep your file so that it is easier to grep, but at least you can work out most of the plain ASCII equivalents you will get by looking up the special characters (those with codes above 127) in an extended character table such as Mac OS Roman.
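
If the goal is just a grep-friendly copy that you can regenerate every month or two, one possible way to prep the file outside BBEdit is a few lines of Python: decode the bytes as ISO-8859-1, strip the accents via Unicode decomposition, and write out plain ASCII. This is only a sketch under assumptions: the to_plain_ascii and convert_file helpers, the file names in the usage note, and the " 1/2" replacement for ½ are all made up for illustration, and any character without an ASCII decomposition is simply dropped.

```python
import unicodedata

# Hand-picked replacements (an assumption) for characters that Unicode
# decomposition alone would not turn into the desired ASCII text.
SPECIAL_CASES = {"\u00bd": " 1/2"}  # vulgar fraction one half


def to_plain_ascii(text):
    """Return text with accents stripped and non-ASCII leftovers dropped."""
    for char, replacement in SPECIAL_CASES.items():
        text = text.replace(char, replacement)
    # NFKD splits e.g. e-acute into 'e' plus a combining accent; the accent
    # is not ASCII, so encoding with errors="ignore" discards it.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(src_path, dst_path, src_encoding="latin-1"):
    """Read src_path in src_encoding and write a plain-ASCII copy to dst_path."""
    with open(src_path, encoding=src_encoding) as src, \
            open(dst_path, "w", encoding="ascii") as dst:
        for line in src:
            dst.write(to_plain_ascii(line))
```

Something like convert_file("ratings.in", "ratings.ascii") (both names hypothetical) would then give you a file in which Léon is stored as Leon and 8½ as 8 1/2. Add more entries to SPECIAL_CASES if you find other characters you want spelled out rather than dropped.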

HTH!

~Semper Fi, Mac!

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on a mailing list?

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

On Mar 10, 2008, at 5:27 PM, [EMAIL PROTECTED] wrote:

I have a file encoded as ISO-8859 (according to the file command at the command line). It is the ratings.file from IMDb's database, and BBEdit says it's "Western (Mac OS Roman)".

I need the file to be plain ASCII so that I can do grep searches against it via a php script. Here is some sample data:

      0000000123  119567   8.6  LÈon (1994)
      0000000124  120390   8.6  Fabuleux destin d'AmÈlie Poulain, Le (2001)
      0000000123   24627   8.5  RashÙmon (1950)
      0000000124   69931   8.4  Vita Ë bella, La (1997)
      0000000123   12564   8.3  Smultronst‰llet (1957)
      0000000114   17411   8.2  8Ω (1963)

I can use Zap Gremlins to replace the characters with their codes:

      0000000123  119567   8.6  L\0xC8on (1994)
      0000000124  120390   8.6  Fabuleux destin d'Am\0xC8lie Poulain, Le (2001)

But that doesn't help me in doing a grep search through the file.

I also don't understand why "Smultronstället" shows up as "Smultronst‰llet" or why 'LÈon' appears instead of 'Léon', etc.

What I want is 'Leon', 'Fabuleux destin d'Amelie Poulain, Le', 'Rashomon', 'Vita e bella, La', 'Smultronstallet', and '8 1/2'.

And it needs to be fairly quick and easy to fix because I need to update this file every month or two.

And if anyone knows what I am doing: yes, I did try to compile the moviedb-3.24 package under Leopard and failed badly.

--
We will fight for Bovine Freedom and hold our large heads high
We will run free with the Buffalo or die



--
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to:  <[EMAIL PROTECTED]>


