RE: Miscellaneous web issues
Roozbeh, it has been a long time, and I don't remember your answer to this email. What happened to the new DLL? AFAIK, it still hasn't been put on SourceForge. If you're interested, I can mail it to you off-list.

- Ehsan Akhgari Farda Technology (http://www.farda-tech.com/) List Owner: [EMAIL PROTECTED] [ Email: [EMAIL PROTECTED] ] [ WWW: http://www.beginthread.com/Ehsan ]

___ PersianComputing mailing list [EMAIL PROTECTED] http://lists.sharif.edu/mailman/listinfo/persiancomputing
RE: Miscellaneous web issues
On Thu, 2004-06-03 at 21:08, Ehsan Akhgari wrote:

I did this, and installed the new DLL on my system, and it works beautifully. It's the same keyboard layout, only Shift+Space inserts a ZWNJ instead of a space. I thought I would submit it to SourceForge so that everyone can use the new tool. Roozbeh, let me know if it would be okay for me to send the files to you to get them onto SourceForge, or if I should do something else.

I would appreciate it if you could send me the exact process you used and the DLL, so we can publish it on the FarsiWeb website on SourceForge.

roozbeh
RE: Miscellaneous web issues
What is notepad? A text editor? Text editors should not insert a UTF-8 BOM either. The problem is that Microsoft sometimes invents non-standard things and then pushes it so hard that Unicode adds it to parts of the standard (or an FAQ). Microsoft conventions for .txt files in the Unicode FAQ looks sarcastic to me.

Well, maybe you're right, but I don't see how a text editor is supposed to know the encoding of a file without some kind of mark. See, HTTP transfers the character set using the Content-Type response header. In HTML, it's specified with a <meta http-equiv=Content-Type ...> tag. In XML, the default encoding is UTF-8, and if a document is encoded in another encoding, it must be specified in the <?xml ?> PI. Plain text files have no means of identifying the character encoding, so a single text file can be interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if there's nothing to declare the exact character encoding used.

The point here is that, protocols which do not allow BOM are those who provide other means of specifying the character encoding. A certain byte stream can have multiple interpretations depending on what content encoding you use to interpret it, and there must be some way to cut off this confusion.

YMMV,
- Ehsan Akhgari
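To make the "multiple interpretations" point concrete, here is a minimal JavaScript sketch that is not part of the original message. It decodes the same two bytes three ways. The helper names and the sample bytes are illustrative assumptions; TextDecoder is the standard decoding API in browsers and Node.js, and Latin-1 is decoded by hand because each Latin-1 byte maps to the code point with the same number.

```javascript
// Hypothetical helper: decode a byte array with the standard TextDecoder API.
function decodeAs(bytes, encoding) {
  return new TextDecoder(encoding).decode(new Uint8Array(bytes));
}

// Hypothetical helper: Latin-1 (ISO 8859-1) maps every byte to the code
// point with the same value, so it can be decoded without a table.
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

// D8 B3 is the UTF-8 encoding of U+0633 (ARABIC LETTER SEEN).
const sample = [0xd8, 0xb3];

const asUtf8 = decodeAs(sample, "utf-8");      // one Persian/Arabic letter
const asLatin1 = decodeLatin1(sample);         // two Western European characters
const asUtf16 = decodeAs(sample, "utf-16le");  // a single unrelated code unit, U+B3D8
```

Nothing in the bytes themselves says which of the three readings was intended, which is the confusion the message describes.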
RE: Miscellaneous web issues
Thanks for the links. Seems like a very handy keyboard. BTW, why the Shift-Space combination does not work?

Bug in Microsoft keyboard layout creation tool. Use Shift-B temporarily.

Thanks. I've not done any work in this arena, so what I propose here might make no sense. Sorry if that's so. But the M$ page on the keyboard layout creation tool says the tool simplifies the process of creating a keyboard layout. Would there be any way to assign ZWNJ to Shift+Space by coding the keyboard layout manually? If you can send me the C/C++ source file off-list, I'll try to investigate it further. If not, I guess Shift+B is not that bad either. The keyboard layout rocks, even without having Shift+Space in place. :-)

- Ehsan Akhgari
RE: Miscellaneous web issues
On Tue, 2004-05-25 at 17:43, Ehsan Akhgari wrote:

Well, maybe you're right, but I don't see how a text editor is supposed to know the encoding of a file without some kind of mark.

Did Latin-1 (an old encoding of text files for Western Europe, also called ISO 8859-1) have a mark to distinguish it from, say, CP1256 (an old MS encoding for the Arabic language)? Did ASCII have a mark? No. Text files are text files. They are not supposed to have marks to distinguish their character set. The character set of a text file should be in the metadata (file name, file system, environment variable, HTTP header, MIME header, ...), or it should be auto-detected (UTF-8 is really easy to detect, since it has a very regular mathematical pattern; UTF-16 is also easy to detect, since it's recommended that it has a BOM), or it should be specified by the user when opening the file.

Plain text files have no means of identifying the character encoding,

That is somewhat true. Plain text files *sometimes* have no means of identifying the character encoding *by themselves*.

so a single text file can be interpreted as UTF-7, UTF-8, UTF-16, UTF-32, etc. if there's nothing to declare the exact character encoding used.

UTF-7 is deprecated. UTF-16 and UTF-32 *do* have BOMs in the standards defining them, so it's OK if they use a BOM. UTF-8 doesn't have that. Nor does ASCII, CP1256, Latin-1, etc.

The point here is that, protocols which do not allow BOM are those who provide other means of specifying the character encoding.

The point is that Notepad doesn't add a mark to Latin-1 or CP1256, so why should it add one to UTF-8?!

A certain byte stream can have multiple interpretations depending on what content encoding you use to interpret it, and there must be some way to cut off this confusion.

Yes, by either metadata, auto-detection, or specific selection.

roozbeh
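The auto-detection route mentioned above can be sketched roughly as follows. This is an illustrative JavaScript sketch, not code from the thread: the function names are hypothetical, and the UTF-8 check is deliberately simplified (it verifies only the lead-byte/continuation-byte pattern and does not reject overlong forms or encoded surrogates).

```javascript
// Hypothetical helper: report the encoding named by a leading BOM, if any.
function detectBom(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  // UTF-32LE is tested before UTF-16LE because FF FE 00 00 begins with FF FE.
  if (startsWith([0xff, 0xfe, 0x00, 0x00])) return "utf-32le";
  if (startsWith([0x00, 0x00, 0xfe, 0xff])) return "utf-32be";
  if (startsWith([0xef, 0xbb, 0xbf])) return "utf-8";
  if (startsWith([0xff, 0xfe])) return "utf-16le";
  if (startsWith([0xfe, 0xff])) return "utf-16be";
  return null;
}

// Simplified structural check: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx). Pure ASCII also passes,
// because ASCII is a subset of UTF-8.
function looksLikeUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let extra;
    if (b < 0x80) extra = 0;
    else if (b >= 0xc2 && b <= 0xdf) extra = 1;
    else if (b >= 0xe0 && b <= 0xef) extra = 2;
    else if (b >= 0xf0 && b <= 0xf4) extra = 3;
    else return false; // stray continuation byte or invalid lead byte
    for (let k = 1; k <= extra; k++) {
      const c = bytes[i + k];
      if (c === undefined || (c & 0xc0) !== 0x80) return false;
    }
    i += extra + 1;
  }
  return true;
}

// Hypothetical entry point: BOM first, then the UTF-8 pattern, else give up
// and leave the choice to metadata or to the user.
function guessEncoding(bytes) {
  return detectBom(bytes) || (looksLikeUtf8(bytes) ? "utf-8" : "unknown");
}
```

Text in a legacy single-byte encoding such as CP1256 almost never forms valid UTF-8 sequences by accident, which is why the pattern check works in practice. When the result is "unknown", the choice still has to come from metadata or from the user.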
RE: Miscellaneous web issues
On Tue, 2004-05-18 at 23:13, Ehsan Akhgari wrote:

and Notepad is not an HTML editor

What is Notepad? A text editor? Text editors should not insert a UTF-8 BOM either. The problem is that Microsoft sometimes invents non-standard things and then pushes them so hard that Unicode adds them to parts of the standard (or an FAQ). The mention of Microsoft conventions for .txt files in the Unicode FAQ looks sarcastic to me.

roozbeh
RE: Miscellaneous web issues
On Thu, 2004-05-20 at 01:48, C Bobroff wrote:

Roozbeh, is it not time to remove the experimental from its name?

No. This has not become a national standard yet. When it becomes a national standard (possibly changing a little by then), we'll remove "experimental" from the name.

roozbeh
RE: Miscellaneous web issues
On Thu, 2004-05-20 at 16:07, Ehsan Akhgari wrote:

You can re-live its creation here in the archives: http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html [snip]

Thanks for the links. Seems like a very handy keyboard. BTW, why the Shift-Space combination does not work?

Bug in Microsoft keyboard layout creation tool. Use Shift-B temporarily.

roozbeh
RE: Miscellaneous web issues
You can re-live its creation here in the archives: http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html [snip]

Thanks for the links. Seems like a very handy keyboard. BTW, why the Shift-Space combination does not work?

Done! Beautiful! I hope the Mozilla users appreciate all this trouble. Thanks again for all your help!

You're welcome! :-)

- Ehsan Akhgari
RE: Miscellaneous web issues
On Thu, 20 May 2004, Ehsan Akhgari wrote:

BTW, why the Shift-Space combination does not work?

Because the Microsoft Keyboard Layout Creator (http://www.microsoft.com/globaldev/tools/msklc.mspx) thought the space bar is reserved for only spacing characters. Roozbeh said he sent MS a list of such bugs. Until they fix that, Shift-B is not bad for ZWNJ.

-Connie
RE: Miscellaneous web issues
On Wed, 19 May 2004, Ehsan Akhgari wrote:

Interesting. Sorry for my ignorance, but is that keyboard available publicly?

You can re-live its creation here in the archives: http://lists.sharif.edu/pipermail/persiancomputing/2003-June/000538.html

And you can download it here: http://prdownloads.sourceforge.net/farsitools/persiankeyboard.zip?download

A PDF file with the layout is here: http://lists.sharif.edu/pipermail/persiancomputing/attachments/20030612/2e85a1ad/PersianKL_preview.pdf

I've also repeated the above here if you don't like ZIP files or have some other problem: http://students.washington.edu/irina/persianword/kb.htm

Roozbeh, is it not time to remove the experimental from its name?

Why not? The \u syntax allows you to represent Unicode characters in JavaScript.

Now I know.

Well, on Mozilla1.2.1 that I tested it on, if you replace ZWNJ in the description of the Tajik array indices with &#8204; then it seems to work happily. Try giving it a test.

Done! Beautiful! I hope the Mozilla users appreciate all this trouble. Thanks again for all your help!

-Connie
RE: Miscellaneous web issues
An important note: what Notepad does here is only acceptable. It's not even recommended. HTML 4 clearly doesn't allow a UTF-8 BOM to appear before the HTML tag. Notepad is supposed to be a text editor. A text editor shouldn't insert markup by itself. BTW, ISIRI 6219 strongly discourages the use of a BOM in UTF-8 files.

The problem here is that web protocols (HTML for example) don't allow the BOM, and Notepad is not an HTML editor, so there's nothing to prevent it from adding the BOM. Check out: http://www.unicode.org/faq/utf_bom.html#28

- Ehsan Akhgari

'I generally take life as it comes my way', said Death.
RE: Miscellaneous web issues
First of all, thank you very much for all the patient and lengthy explanations. Very nice of you to share so many tips! (Thanks to the others too who answered on and off list!)

Happy to help! [snip]

Now that 2 people have said to change ZWNJ to \u200c, I tried that but it didn't work. I don't think I have the right tool. I couldn't do it in Notepad because as I said, it's WYSIWYG in Persian script so if I do a global replacement and stick \u200c in the middle of Persian script, that's obviously not going to work (and I also tried it for good measure and it didn't work but there may be many reasons it didn't work out using Notepad.)

I don't know what you mean here. Why doesn't it work in Notepad? Note that on Windows XP, you can't type ZWNJ inside the Find/Replace dialog box; you need to copy/paste it from inside the Notepad text editor window. Another reason not to use Notepad.

Then, since you recommended Frontpage, I tried that. Earlier, it had not even occurred to me to attempt to open a .js file in Frontpage (version 2000.) This time I fooled it by changing the extension from .js to .html and so was able to open it in html view where all the unicode was in numeric style. I changed all the &#8204; to \u200c but now I see that also has not worked.

Well, I don't know what the problem is here... BTW, FrontPage 2003 can open the .js file (using File | Open, or drag and drop) and render the UTF-8 characters just fine, without converting them to numeric entities. Don't try putting them in an HTML file. I don't know about FrontPage 2000, though.

I think I'm not going to use Notepad for making bidirectional arrays from now on! That is insane to go to such great lengths!

Yeah, it's definitely so.

Not sure what you have in mind here, but at this point, I'll be glad just to make it work with ZWNJ.

In the JS code, try to replace the trailing ZWNJ-raa and ZWNJ-o with nothing using a regex.

HTH,
- Ehsan Akhgari
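A rough JavaScript sketch of the two replacements discussed in this message, offered as an illustration rather than as the code actually used on the page. It assumes that "ZWNJ-raa" means ZWNJ followed by the letters REH and ALEF, and that "ZWNJ-o" means ZWNJ followed by WAW; the function names are hypothetical.

```javascript
// Assumed code points: ZWNJ U+200C, REH U+0631, ALEF U+0627, WAW U+0648.
// Matches a ZWNJ plus "raa" or "o" only at the very end of the string.
const TRAILING_SUFFIX = /\u200C(?:\u0631\u0627|\u0648)$/;

// Hypothetical helper: drop a trailing ZWNJ-raa or ZWNJ-o from one word.
function stripTrailingSuffix(word) {
  return word.replace(TRAILING_SUFFIX, "");
}

// Hypothetical helper: turn the HTML numeric reference for ZWNJ into the
// JavaScript escape sequence. 8204 decimal is 200C hexadecimal.
function entityToEscape(source) {
  return source.replace(/&#8204;/g, "\\u200c");
}
```

Note that the second helper produces the six characters \u200c as text, which only becomes a ZWNJ when it sits inside a JavaScript string literal in a .js file; pasted into HTML markup it stays literal text.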
RE: Miscellaneous web issues
On Mon, 2004-05-17 at 17:44, Ehsan Akhgari wrote:

Those are the BOM marks for UTF-8. Notepad injects them under your nose, and that's one of the reasons I avoid Notepad. Frontpage text editor does not have that problem.

A small note: what Notepad does here is *correct*, because it can instruct other editors about the content encoding of the file. It just doesn't work with web documents, and that's expected, because Notepad has not been designed for creating web documents.

An important note: what Notepad does here is only acceptable. It's not even recommended. HTML 4 clearly doesn't allow a UTF-8 BOM to appear before the HTML tag. Notepad is supposed to be a text editor. A text editor shouldn't insert markup by itself. BTW, ISIRI 6219 strongly discourages the use of a BOM in UTF-8 files.

roozbeh
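For files that Notepad has already saved, the three BOM bytes can be removed before the file goes on the web. A minimal JavaScript sketch, assuming the file content is already available as an array of byte values; the function name is hypothetical and reading or writing the file is left out.

```javascript
// Hypothetical helper: return the bytes without a leading UTF-8 BOM
// (EF BB BF). Input without a BOM is returned unchanged.
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;
  return hasBom ? bytes.slice(3) : bytes;
}
```

The check looks only at the first three bytes, so the same EF BB BF sequence appearing later in the file is left alone.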
Re: Miscellaneous web issues
On 16-May-04, at 9:16 PM, C Bobroff wrote:

6. I embedded the fonts again. Looks beautiful on WinXP/IE6 and limited others. I presume it looks terrible on the rest. Still thinking about what to do about that. Behnam, how's the Tajik looking on your Mac? -Connie

Hi Connie,

I almost missed your direct inquiry to me. I just noticed it in Ehsan Akhgari's reply. Since I wasn't sure what I was supposed to do when opening that page, I take it it's not working as it should. The mouse-over thing doesn't work. I have to select the word (double click) to see its equivalent in Tajik (or vice versa), but when I select the word, everything seems to work okay. The exception is the last word on the Persian side: it can't find the word. The last word on the Tajik side has no problem. I guess the major problem is that the mouse-over trick doesn't work, and selecting words one by one is rather inconvenient. I was using Safari (browser) with Panther (OS 10.3.3) on an iMac.

I must add it's wonderful what you are doing there. Keep up the good work.

Behnam
Re: Miscellaneous web issues
On Sun, 16 May 2004, C Bobroff wrote:

2. When viewed on WinXP/Mozilla1.7a, the ZWNJ's completely throw off my mouseover javascript program. It can not find words with ZWNJ. And look what happens if you mouseover the Tajik equivalent: it displays the Persian word ok but no ZWNJ. This problem not seen with IE. I left out all harakat just so it would work in Mozilla (and Macs) so I'm sorry to see this new problem.

I've observed a very similar bug that should be the same as what you describe: a ZWNJ that JavaScript puts into the page as a raw UTF-8 character is completely thrown away. As a solution, if you replace all ZWNJs with \u200C in your JavaScript source, it works.

[BTW, your Herat#1 and Herat#2 MP3 files seem silent to my player.]

--behdad behdad.org
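The replacement suggested here could be done with a small script instead of an editor's Find/Replace box. This is only a sketch with a hypothetical function name, and it assumes every ZWNJ in the source sits inside a string literal, where the \u200C escape is interpreted.

```javascript
// Hypothetical helper: replace each literal ZWNJ character (U+200C) in a
// piece of JavaScript source text with the six-character escape \u200C.
function escapeZwnj(source) {
  return source.replace(/\u200C/g, "\\u200C");
}

// A literal ZWNJ between two letters, as it might appear in a saved .js file.
const raw = "var w = 'a" + String.fromCharCode(0x200c) + "b';";
const escaped = escapeZwnj(raw);
```

After the replacement the source file is pure ASCII at those positions, so the ZWNJ no longer depends on how the browser decodes the script file.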
Re: Miscellaneous web issues
On Sun, 16 May 2004, C Bobroff wrote:

1. When viewed on WinXP/IE6, look what happens when you mouseover the Persian words at the end (i.e. left margin) of each line. You also pick up the space to the right of the first word in that line. Similarly, if you attempt to mouseover the first word in the line and are just a little off the word to the right, you unfortunately will pick up the last word in the line. Is this a bug or just my usual crazy coding style? This problem not seen with Mozilla. Also not with left to right languages.

Remove all leading and trailing spaces in your spans and it should work. BTW, RTL paragraphs are a must.

--behdad behdad.org
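One way to apply this advice to existing markup is to move the whitespace from just inside each span to just outside it, so the words stay separated but the mouseover target covers only the word. The sketch below is an assumption about how the page is built (simple, non-nested span elements, one per word); the function name is hypothetical, and a regex like this is not a general HTML parser.

```javascript
// Hypothetical helper: for each non-nested <span>...</span>, move leading and
// trailing whitespace from inside the span to outside it.
function tightenSpans(html) {
  return html.replace(
    /<span\b([^>]*)>(\s*)([\s\S]*?)(\s*)<\/span>/g,
    (match, attrs, lead, body, trail) =>
      lead + "<span" + attrs + ">" + body + "</span>" + trail
  );
}
```

For example, `<span id="w1"> foo </span>` becomes ` <span id="w1">foo</span> `: the spaces survive, but they are no longer part of the element that receives the mouseover.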