Re: Hebrew filenames from a Windows(XP) zip file

2005-10-31 Thread Yaad Blum


Use the rep-heb-zip script to convert file names to
proper Hebrew. The script recursively renames all
files under a given directory.

Also, the zip2gz script, which uses rep-heb-zip,
converts all zip archives found under a given
directory to tar.gz.

YB

[Attachment: rep-heb-zip]

#! /bin/csh -f
#############################################################
##### THIS SCRIPT CORRECTS THE PROBLEM WHEN USING UNZIP ON
##### HEBREW FILE NAMES ARCHIVED WITH WINZIP
##### Note: every ! character in a file name will be changed to _
#############################################################

## Check arguments
if ($#argv != 1) then
	echo "Usage: $0 directory"
else
	## Save command line args in variables
	set dir = "$1"

	#Remove trailing / if it exists
	if ("$dir" =~ */) then
		set dir = `echo "$dir" | awk '{print substr($0,1,length($0)-1)}'`
	endif

	#Check directory is not empty
	set x = `ls "$dir" | wc -l`
	if ($x =~ 0) then
		exit
	endif

	#get inode of directory
	set inoden = `ls -id "$dir" | awk '{print $1}'`

	##convert file names to Hebrew
	ls --format=single-column "$dir" | iconv -f cp862 -t utf8 | cat -b > "tempheblist"$inoden

	#go over all files and rename them to Hebrew
	@ count = 0
	foreach file ("$dir"/*)
		@ count = $count + 1

		#get the Hebrew name of file number $count
		awk '$1=='$count' {for(i=2;i<=NF;i++) print $i}' "tempheblist"$inoden > "temp1"
		cat temp1 | tr "\n" " " > temp2
		set hebrew_file_name = `cat temp2`
		rm -f temp1
		rm -f temp2

		#change ! signs to _ to escape this character
		set hebrew_file_name = `echo $hebrew_file_name | tr ! _`

		#replace file with Hebrew file name (via a t_ copy, so it also works for English names)
		cp -R "$file" "$dir/t_$hebrew_file_name"
		rm -Rf "$file"
		cp -R "$dir/t_$hebrew_file_name" "$dir/$hebrew_file_name"
		rm -Rf "$dir/t_$hebrew_file_name"

		#if file is a directory, call rep-heb-zip on it recursively
		if(-d "$dir/$hebrew_file_name") then
			./rep-heb-zip "$dir/$hebrew_file_name"
		endif

	end

	#remove temp file for list of Hebrew file names
	rm -f "tempheblist"$inoden

endif

[Attachment: zip2gz]

#! /bin/csh -f
#############################################################
##### THIS SCRIPT CHANGES ALL ZIP FILES TO TAR.GZ FILES
##### Note: rep-heb-zip must be in the calling directory
#############################################################

## Check arguments
if ($#argv != 1) then
	echo "Usage: $0 directory"
else
	## Save command line args in variables
	set dir = "$1"

	#Remove trailing / if it exists
	if ("$dir" =~ */) then
		set dir = `echo "$dir" | awk '{print substr($0,1,length($0)-1)}'`
	endif

	#Check directory is not empty
	set x = `ls "$dir" | wc -l`
	if ($x =~ 0) then
		exit
	endif

	#go over all files and try to convert them
	foreach file ("$dir"/*)

		#This is a zip file
		if (-f "$file" && "$file" =~ *.[zZ][Ii][Pp]) then

			#File name without .zip
			set file_nozip = `echo "$file" | awk '{print substr($0,1,length($0)-4)}'`

			#Extract and delete zip
			unzip "$file" -d "$file_nozip"
			rm -f "$file"

			#Fix Hebrew file names
			./rep-heb-zip "$file_nozip"

			#Call recursively
			./zip2gz "$file_nozip"

			#save current directory
			set local_dir = `pwd`

			#Save absolute file name
			set abs_file = `echo "$file_nozip" | awk -F

Hebrew filenames from a Windows(XP) zip file.

2004-08-25 Thread Amir Hardon
I'm trying to extract a zip file with Hebrew file names that was created with 
WinZip on a Windows XP machine.
It looks like there is an encoding problem, but a weird one.

Just to test the encoding, I listed the file names into a text file
('unzip -l  file.txt') and tried to convert it to different encodings using iconv.
But iconv always failed (no matter which encoding I tried)
with the following message:
iconv: illegal input sequence at position 112
The first byte that is supposed to be Hebrew is at position 112;
its value is 0xEA, which is final kaf in ISO-8859-8.

Anyway I just opened the text file with Mozilla and tried to view it using 
every Hebrew or Unicode encoding it supports, but none of them worked.

My last resort was to calculate the difference between the value of each
letter I get and the letter it should be. The first two letters have the same
difference (subtract two to get the original letter), but the third letter has
a different one (add five to get the original letter).
That is strange!
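Assuming the diagnosis Didi gives further down in this thread (WinZip stores the names in CP862 and unzip runs them through its CP850-to-ISO table `oem2iso[]`), the uneven differences come straight from that table. The sketch below copies the 27 table entries for the CP862 Hebrew letters (bytes 0x80 to 0x9A, alef to tav) and prints, for each letter, the byte unzip writes, the ISO-8859-8 byte it should be, and what must be added to get from one to the other. `hebrew_offsets` is a made-up helper name.

```shell
#!/bin/sh
# oem2iso[0x80..0x9A] from unzip's source: the byte unzip writes for each
# CP862 Hebrew letter, alef (0x80) through tav (0x9A).
UNZIP_HEBREW="C7 FC E9 E2 E4 E0 E5 E7 EA EB E8 EF EE EC C4 C5 C9 E6 C6 F4 F6 F2 FB F9 FF D6 DC"

# Hypothetical helper: one line per letter with the byte unzip produces,
# the correct ISO-8859-8 byte (alef = 0xE0 ... tav = 0xFA), and the
# number to add to the first to get the second.
hebrew_offsets() {
    k=0
    for b in $UNZIP_HEBREW; do
        got=$((0x$b))
        want=$((0xE0 + k))
        printf 'letter %2d: unzip 0x%02X, ISO-8859-8 0x%02X, add %d\n' \
            "$k" "$got" "$want" $((want - got))
        k=$((k + 1))
    done
}

hebrew_offsets
```

Under this assumption tet (index 8) comes out as 0xEA, the byte found at position 112, and yod as 0xEB; both need -2, while vav comes out as 0xE0 and needs +5. That is the same kind of pattern described above, although the thread does not say which letters were actually in the name.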

(List's Hebrew haters, please forgive the next paragraph)
Just for the record here is the string I get:
  
Which should be:
  
(Both strings are in logical order)

So I have two questions:
1. (The simple one) What's the problem with iconv?
2. What can I do with the Hebrew filenames?

Thanks!
 -Amir.

=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]



Re: Hebrew filenames from a Windows(XP) zip file.

2004-08-25 Thread Omer Zak
Can you run an experiment as follows:
1. Create a few files with known Hebrew names on your Windows XP machine.
2. Zip them on your Windows XP machine.
3. Run unzip -l on the archive on your Linux machine, and compare the strings.
My guess is that WinZip encodes Hebrew filenames differently from
the way Linux unzip expects them. So when Linux unzip extracts the
filenames, they look different.

If this hypothesis is confirmed, try playing with the locale environment
variables to affect the encoding assumed by Linux unzip.

 --- Omer


Re: Hebrew filenames from a Windows(XP) zip file.

2004-08-25 Thread Herouth Maoz
Quoting Amir Hardon [EMAIL PROTECTED]:

 Just for testing the encoding I listed the file names into a text file
 ('unzip
 -l  file.txt'), and tried it to convert to different encodings using iconv.
 But iconv always failed(No matter which encoding I'm trying to use),
 with the following message:
 iconv: illegal input sequence at position 112
 The first byte that supposed to be Hebrew is at position 112,
 it's value is 0xEA which is Kaf sofit in iso-8859-8.

Running an appropriate od on the resulting file may shed more light on the
problem (e.g. od -t x1).

My first guess would be that the names themselves are in UCS-2, but since the
output from unzip mixes them with ASCII, you get an encoding error, because
UCS-2, unlike UTF-8, cannot be mixed with 1-byte characters.
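Herouth's suggestion can be turned into a quick check. The sketch below (the helper names `hexbytes` and `has_nul` are made up for illustration) prints a file's bytes with `od -A n -t x1 -v` and tests whether the file contains any NUL bytes. In UCS-2 every ASCII character occupies two bytes, one of them 0x00, so a listing that is mostly ASCII yet contains no NULs is unlikely to hold UCS-2 names; a lone 0xEA between ASCII bytes points to a single-byte code page instead.

```shell
#!/bin/sh
# Print every byte of file $1 as two hex digits separated by single spaces.
# -v stops od from collapsing repeated lines into '*'.
hexbytes() {
    od -A n -t x1 -v "$1" | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Succeed if file $1 contains at least one NUL (0x00) byte.
has_nul() {
    total=$(( $(wc -c < "$1") ))
    kept=$(( $(tr -d '\000' < "$1" | wc -c) ))
    [ "$total" -ne "$kept" ]
}

# Example: 'a' plus the single byte 0xEA, versus 'a' plus alef (U+05D0)
# written as UCS-2 little-endian.
tmp=$(mktemp -d)
printf 'a\352' > "$tmp/single"
printf 'a\000\320\005' > "$tmp/ucs2"
for f in "$tmp/single" "$tmp/ucs2"; do
    printf '%s: %s\n' "${f##*/}" "$(hexbytes "$f")"
    if has_nul "$f"; then echo "  contains NUL bytes"; fi
done
rm -rf "$tmp"
```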

Herouth




Re: Hebrew filenames from a Windows(XP) zip file.

2004-08-25 Thread Yedidyah Bar-David
Hi,

On Wed, Aug 25, 2004 at 09:57:43AM +0300, Amir Hardon wrote:
 I'm trying to extract a zip file with Hebrew file names that was created with 
 winzip on a Windows XP machine.
 It looks like there is an encoding problem, but a weird one.

This also troubled me for some time. Incidentally, just yesterday I
downloaded unzip's sources, and your email was the last push to read
them.

 
 Just for testing the encoding I listed the file names into a text file ('unzip 
 -l  file.txt'), and tried it to convert to different encodings using iconv.
 But iconv always failed(No matter which encoding I'm trying to use),
 with the following message:
 iconv: illegal input sequence at position 112
 The first byte that supposed to be Hebrew is at position 112,
 it's value is 0xEA which is Kaf sofit in iso-8859-8.
 
 Anyway I just opened the text file with Mozilla and tried to view it using 
 every Hebrew or Unicode encoding it supports, but none of them worked.
 
 My last resort was to calculate the difference between the values of the 
 letter I get and the letter it should be, the first two letters have the same 
 difference (reduce two to get the original letter) but the third letter have 
 a different one (add five to get the original letter).
 That is strange!

unzip has the encoding hard-coded in the source:
/*---

  The following conversion tables translate between IBM PC CP 850
  (OEM codepage) and the Western Europe & America Windows codepage 1252.
  The Windows codepage 1252 contains the ISO 8859-1 Latin 1 codepage,
  with some additional printable characters in the range (0x80 - 0x9F),
  that is reserved to control codes in the ISO 8859-1 character table.

  The ISO <--> OEM conversion tables were constructed with the help
  of the WIN32 (Win16?) API's OemToAnsi() and AnsiToOem() conversion
  functions and have been checked against the CP850 and LATIN1 tables
  provided in the MS-Kermit 3.14 distribution.

  ---*/
[snip]
ZCONST uch Far oem2iso[] = {
0xC7, 0xFC, 0xE9, 0xE2, 0xE4, 0xE0, 0xE5, 0xE7,  /* 80 - 87 */
0xEA, 0xEB, 0xE8, 0xEF, 0xEE, 0xEC, 0xC4, 0xC5,  /* 88 - 8F */
0xC9, 0xE6, 0xC6, 0xF4, 0xF6, 0xF2, 0xFB, 0xF9,  /* 90 - 97 */
0xFF, 0xD6, 0xDC, 0xF8, 0xA3, 0xD8, 0xD7, 0x83,  /* 98 - 9F */
0xE1, 0xED, 0xF3, 0xFA, 0xF1, 0xD1, 0xAA, 0xBA,  /* A0 - A7 */
0xBF, 0xAE, 0xAC, 0xBD, 0xBC, 0xA1, 0xAB, 0xBB,  /* A8 - AF */
0xA6, 0xA6, 0xA6, 0xA6, 0xA6, 0xC1, 0xC2, 0xC0,  /* B0 - B7 */
0xA9, 0xA6, 0xA6, 0x2B, 0x2B, 0xA2, 0xA5, 0x2B,  /* B8 - BF */
0x2B, 0x2D, 0x2D, 0x2B, 0x2D, 0x2B, 0xE3, 0xC3,  /* C0 - C7 */
0x2B, 0x2B, 0x2D, 0x2D, 0xA6, 0x2D, 0x2B, 0xA4,  /* C8 - CF */
0xF0, 0xD0, 0xCA, 0xCB, 0xC8, 0x69, 0xCD, 0xCE,  /* D0 - D7 */
0xCF, 0x2B, 0x2B, 0xA6, 0x5F, 0xA6, 0xCC, 0xAF,  /* D8 - DF */
0xD3, 0xDF, 0xD4, 0xD2, 0xF5, 0xD5, 0xB5, 0xFE,  /* E0 - E7 */
0xDE, 0xDA, 0xDB, 0xD9, 0xFD, 0xDD, 0xAF, 0xB4,  /* E8 - EF */
0xAD, 0xB1, 0x3D, 0xBE, 0xB6, 0xA7, 0xF7, 0xB8,  /* F0 - F7 */
0xB0, 0xA8, 0xB7, 0xB9, 0xB3, 0xB2, 0xA6, 0xA0   /* F8 - FF */
};
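Whether the damage can be undone depends on this table: the names can only be recovered if no other CP850 byte lands on the same output value as a Hebrew letter. The sketch below (an illustration, not part of unzip; the helper names are made up) copies all 128 `oem2iso[]` entries and counts how often each of the 27 values used for the Hebrew range 0x80 to 0x9A occurs in the whole table. The table as a whole is not one-to-one (0xA6, 0x2B, 0x2D and 0xAF all repeat), but each of those 27 values occurs exactly once, so the Hebrew letters can be recovered unambiguously.

```shell
#!/bin/sh
# unzip's oem2iso[] table, entries for input bytes 0x80 through 0xFF.
OEM2ISO="C7 FC E9 E2 E4 E0 E5 E7 EA EB E8 EF EE EC C4 C5
C9 E6 C6 F4 F6 F2 FB F9 FF D6 DC F8 A3 D8 D7 83
E1 ED F3 FA F1 D1 AA BA BF AE AC BD BC A1 AB BB
A6 A6 A6 A6 A6 C1 C2 C0 A9 A6 A6 2B 2B A2 A5 2B
2B 2D 2D 2B 2D 2B E3 C3 2B 2B 2D 2D A6 2D 2B A4
F0 D0 CA CB C8 69 CD CE CF 2B 2B A6 5F A6 CC AF
D3 DF D4 D2 F5 D5 B5 FE DE DA DB D9 FD DD AF B4
AD B1 3D BE B6 A7 F7 B8 B0 A8 B7 B9 B3 B2 A6 A0"

# Print how many times value $1 (two hex digits) occurs in the table.
count_in_table() {
    printf '%s\n' $OEM2ISO | grep -c "^$1\$"
}

# Succeed if every output of the first 27 entries (CP862 alef..tav)
# occurs exactly once in the whole table.
hebrew_outputs_unique() {
    i=0
    for v in $OEM2ISO; do
        [ "$i" -ge 27 ] && break
        [ "$(count_in_table "$v")" -eq 1 ] || return 1
        i=$((i + 1))
    done
}

if hebrew_outputs_unique; then
    echo "Hebrew range of oem2iso[] is one-to-one"
fi
echo "0xA6 occurs $(count_in_table A6) times"
```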

Reading the comment, and looking a bit with od at the zip and the
output, I understand that the zip itself has DOS Hebrew (cp862)
filenames, which unzip treats as cp850 and converts to iso8859-1.
To check this, I created a file whose name contained all the Hebrew
letters, zipped it with WinZip, unzipped it in Linux, then ran
ls -l | iconv -f iso8859-1 -t cp850 | iconv -f cp862 -t iso8859-8
and it worked.
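For the Hebrew letters alone, the two-step iconv pipeline above amounts to a fixed byte substitution: each of the 27 bytes unzip produces is replaced by the ISO-8859-8 byte of the same letter. In the sketch below the first `tr` set is oem2iso[0x80] to oem2iso[0x9A] and the second is 0xE0 to 0xFA, both written in octal as `tr` expects. The function name `unzip_to_iso8859_8` is hypothetical, and it assumes every such byte really is a damaged Hebrew letter; a genuine Latin-1 é in the same name would be turned into a Hebrew letter too.

```shell
#!/bin/sh
# Read unzip's output on stdin and write ISO-8859-8 on stdout.
# SET1: bytes unzip's oem2iso[] produces for CP862 alef..tav (0x80..0x9A).
# SET2: ISO-8859-8 alef..tav (0xE0..0xFA), in the same order.
# tr translates in a single pass, so no byte is translated twice, and all
# other bytes (ASCII letters, '.', '/', ...) pass through unchanged.
unzip_to_iso8859_8() {
    LC_ALL=C tr \
        '\307\374\351\342\344\340\345\347\352\353\350\357\356\354\304\305\311\346\306\364\366\362\373\371\377\326\334' \
        '\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372'
}

# Example: 'a', then the bytes unzip writes for tet, yod and vav,
# which should come out as 0xE8 0xE9 0xE5.
printf 'a\352\353\340.txt\n' | unzip_to_iso8859_8 | od -A n -t x1
```

On a UTF-8 system the result can be piped on through `iconv -f ISO-8859-8 -t UTF-8`.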

 
 (List's Hebrew haters, please forgive the next paragraph)
 Just for the record here is the string I get:
  ?? ???
 Which should be:
  ??? ???
 (Both strings are in logical order)
 
 So I have two questions:
 1. (The simple one) What's the problem with iconv?

That it does not only translate char ranges, it also checks validity.
Running it twice allowed me to trick it. Doing e.g. 'iconv -f iso8859-1
-t iso8859-8' might theoretically work (I am not sure, I have to think
about it), but iconv knows which chars exist in each charset and refuses
to work with illegal ones.
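The single-step `iconv -f iso8859-1 -t iso8859-8` cannot work on this data, and the reason is the validity check described above: iconv converts characters, not byte values, and a Latin-1 character such as ê (0xEA) has no counterpart in ISO-8859-8. The sketch below mimics that check by hand. It hard-codes which Latin-1 characters also exist in ISO-8859-8 (ASCII, the C1 controls, a handful of symbols such as ©, ° and ½, and the × and ÷ signs) and reports the offset of the first byte that has none. The function names are hypothetical, and the real iconv uses its own tables and error messages.

```shell
#!/bin/sh
# Succeed if the Latin-1 character with code $1 (0-255) also exists in
# ISO-8859-8 (byte values per ISO/IEC 8859-1 and 8859-8).
latin1_in_iso8859_8() {
    [ "$1" -le 159 ] && return 0   # ASCII and C1 control codes
    case $1 in
        160|16[2-9]|17[1-9]|18[0-5]|18[7-9]|190) return 0 ;;  # shared symbols
        215|247) return 0 ;;       # multiplication and division signs
    esac
    return 1
}

# Read bytes on stdin; print the 0-based offset of the first byte whose
# Latin-1 character has no ISO-8859-8 equivalent, or fail if there is none.
first_unmappable() {
    pos=0
    for hex in $(od -A n -t x1 -v); do
        if ! latin1_in_iso8859_8 $((0x$hex)); then
            echo "$pos"
            return 0
        fi
        pos=$((pos + 1))
    done
    return 1
}

# 'ab ' followed by unzip's bytes for tet, yod and vav.
printf 'ab \352\353\340' | first_unmappable
```

All 27 bytes unzip produces for Hebrew letters fail this check, so a one-step conversion is rejected at the first Hebrew letter. The two-step route works because it passes through CP850, which has every one of those 27 characters, and CP862, where the same bytes are Hebrew letters.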

 2. What can I do with the Hebrew filenames?

Use the above with some script or, if you have a lot of time, make
unzip use iconv/gconv and allow the user to set the charset :-)
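A minimal sketch of "the above with some script", along the lines of the rep-heb-zip script posted to this thread but without temporary files. `rename_tree` (a hypothetical name) lists everything under a directory with `find -depth`, so entries come before the directory that contains them, and renames each entry by passing the last component of its path through a filter command. `didi_filter` wraps the two iconv calls quoted above. The sketch assumes no file name contains a newline, skips an entry if the filter fails, and will overwrite an existing file of the new name.

```shell
#!/bin/sh
# Rename every entry below directory $1, deepest entries first, by piping
# the last component of each path through the filter command $2.
rename_tree() {
    root=$1
    filter=$2
    # Collect the whole list before renaming anything, so renames cannot
    # confuse the directory walk.
    list=$(find "$root" -depth -print) || return 1
    printf '%s\n' "$list" | while IFS= read -r path; do
        [ "$path" = "$root" ] && continue
        dir=${path%/*}
        old=${path##*/}
        new=$(printf '%s' "$old" | "$filter") || continue
        if [ -n "$new" ] && [ "$new" != "$old" ]; then
            mv -- "$path" "$dir/$new"
        fi
    done
}

# Filter built from the pipeline above: unzip's ISO-8859-1 output ->
# CP850 bytes -> reread as CP862 -> ISO-8859-8.
didi_filter() {
    iconv -f iso8859-1 -t cp850 | iconv -f cp862 -t iso8859-8
}
```

For example, after `unzip archive.zip -d extracted`, running `rename_tree extracted didi_filter` would convert every name under `extracted` (both names are placeholders). On a UTF-8 system, appending `| iconv -f iso8859-8 -t utf-8` to the filter gives UTF-8 names instead.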
-- 
Didi

