Thanks for the responses. The URL is behind a login screen, so there is no way for me to share it directly. I am pretty sure that the problem is with the page encoding, however, as you've both suggested. Firefox's View > Character Encoding menu just gives other kinds of garbled text. But I think I have figured out why. When I view source, the garbled Cyrillic is really all encoded entities like this (mixed with Latin accented characters): ńęîă in a page that is rendering as charset=iso-8859-1.
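For what it's worth, here is a minimal Python sketch of how entities like those might be turned back into Cyrillic. It rests on two guesses that the page itself does not confirm: that the entities are numeric references (&#324; and so on), and that the text was originally windows-1251 Cyrillic that got read as windows-1250 (Central European) before being written out as entities. The function name recover_cyrillic and both charset defaults are hypothetical choices for illustration.

```python
import html


def recover_cyrillic(entity_text, wrong="cp1250", right="cp1251"):
    """Undo a wrong-charset decode (both charsets are assumptions, not known facts)."""
    chars = html.unescape(entity_text)  # "&#324;" becomes "ń"
    raw = chars.encode(wrong)           # back to the bytes the page presumably started with
    return raw.decode(right)            # reinterpret those bytes as Cyrillic


# Assumed numeric-entity form of the four sample characters quoted above.
sample = "&#324;&#281;&#238;&#259;"
print(html.unescape(sample))     # ńęîă
print(recover_cyrillic(sample))  # ског
```

If the result is not readable on the real page, the same function can be tried with other pairs, for example wrong="iso-8859-2" or right="koi8-r".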
So maybe that is a starting point for me, but I'm guessing this isn't really a BBEdit topic anymore, so I'll just proceed from here unless there are any further suggestions. Thanks.

--
Lloyd Dunn
http://nula.cc/
http://blog.nula.cc/

On Mar 10, 9:16 pm, "Robert A. Rosenberg" <[email protected]> wrote:
> At 10:00 AM +0100 on 03/10/2011, Lloyd Dunn wrote about transliterate
> into cyrillic:
>
> >Below are a few examples of garbled Cyrillic from a web page (this
> >happens to be a CD track list).
>
> >Is there a simple direct way to transliterate or re-encode these into
> >proper Cyrillic characters using BBEdit? I've tried all the charsets
> >in the 'Reopen using encoding' submenu, but to no avail.
>
> >I've done this (usually imperfectly) in the past using online
> >converters and hacky freeware, but I'd really like to accomplish this
> >task within BBEdit.
>
> >Any insights welcome.
>
> >001. �橢���펑 �뎩���� (������) -
> >����� � �����t� ������
> >002. � ��矴�a�� Ď����� �-��� -
> >���� �t����� ��ᩢ�ގ�� � A�����
> >003. � ��矴�a�� Ď����� �-��� -
> >���� a��a��� ��� ���玴����
> >��������
> >004. � ��矴�a�� Ď����� �-��� -
> >ˎ玩��� ����� ������
> >005. � ��矴�a�� Ď����� �-��� -
> >���� ��� ������� ������
>
> >--
> >Lloyd Dunn
> >http://nula.cc/
> >http://blog.nula.cc/
>
> >--
> >You received this message because you are subscribed to the
> >"BBEdit Talk" discussion group on Google Groups.
> >To post to this group, send email to [email protected]
> >To unsubscribe from this group, send email to
> >[email protected]
> >For more options, visit this group at
> ><http://groups.google.com/group/bbedit?hl=en>
> >If you have a feature request or would like to report a problem,
> >please email "[email protected]" rather than posting to the group.
> >Follow @bbedit on Twitter: <http://www.twitter.com/bbedit>
>
> This looks like the page is declared as ISO-8859-1 in the meta tag
> instead of utf-8. Use the source/page view option to check.
> Try telling your browser to display it as Character Set UTF-8. What is
> the URL? I can look at it for you if you want. When you see the
> characters in groups of 3 (and they are all accented) that is a tip
> off for utf-8. If you look up the ISO-8859-1 codepoint of the
> characters in (for example) � you can see how it converts to the
> UTF-8 coding and see if it is the Cyrillic Unicode range. The only
> problem with this is that Cyrillic is 2 byte not 3 byte UTF-8
> encoding.
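
The check suggested in the quoted reply (take the ISO-8859-1 code points, read them as UTF-8, and see whether the result lands in the Cyrillic range) can be sketched roughly as below. This is only an illustration: the helper name as_utf8_if_cyrillic and the sample word are made up, and the range test covers just the basic Cyrillic block U+0400 to U+04FF.

```python
def as_utf8_if_cyrillic(garbled):
    """Return the UTF-8 reading of Latin-1 mojibake if it is all Cyrillic, else None."""
    try:
        # Each character's ISO-8859-1 code point becomes one byte, then
        # the byte string is re-read as UTF-8.
        text = garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    letters = [c for c in text if not c.isspace()]
    if letters and all("\u0400" <= c <= "\u04ff" for c in letters):
        return text
    return None


word = "мир"                       # arbitrary sample word, not from the page
raw = word.encode("utf-8")
print(len(raw))                    # 6: two bytes per Cyrillic letter
mojibake = raw.decode("latin-1")   # what a browser forced to ISO-8859-1 would show
print(as_utf8_if_cyrillic(mojibake))  # мир
print(as_utf8_if_cyrillic("ńęîă"))    # None
```

The last call returns None because ń, ę and ă do not exist in ISO-8859-1 at all, which fits the caveat in the reply: the sample from the page source does not look like UTF-8 read as ISO-8859-1, so some other single-byte charset mix-up seems more likely.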
