Edit report at https://bugs.php.net/bug.php?id=65815&edit=1
ID:                 65815
Comment by:         matti dot jarvinen at nitroid dot fi
Reported by:        matti dot jarvinen at nitroid dot fi
Summary:            ZipArchive reads filenames with UTF-8 characters wrong
Status:             Open
Type:               Bug
Package:            Zip Related
Operating System:   Fedora 3.8.6-203.fc18.x86_64
PHP Version:        5.4.20
Block user comment: N
Private report:     N

New Comment:

If the zip file contains the following files:

test3/12-päivä.pdf
test3/ää¸å人æ°å ±åå½.PDF
test3/РоÑÑийÑÐºÐ°Ñ Ð¤ÐµÐ´ÐµÑаÑиÑ.PDF
test3/ä¸å人æ°å ±åå½.PDF

ZipArchive will read them as:

test3/12-p�iv�.pdf
test3/ää¸å人æ°å ±åå½.PDF
test3/РоÑÑийÑÐºÐ°Ñ Ð¤ÐµÐ´ÐµÑаÑиÑ.PDF
test3/ä¸å人æ°å ±åå½.PDF

Broken file names can be converted to correct UTF-8 with:

<?php
// a correct UTF-8 string should survive this round trip unchanged
if ($filename === mb_convert_encoding(mb_convert_encoding($filename, "UTF-32", "UTF-8"), "UTF-8", "UTF-32")) {
    $fixedFilename = $filename;
} else {
    // otherwise treat the name as CP850 and convert it
    $fixedFilename = mb_convert_encoding($filename, 'UTF-8', 'CP850');
}
?>

.ZIP File Format Specification Version 6.3.3, APPENDIX D - Language Encoding (EFS), might hold the answers about reading the file name encoding correctly from the zip file.

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

If I understood the spec correctly, the code page should be CP437 when the name is not UTF-8, although that encoding is not supported in PHP. I got good results with CP850, but I cannot verify this workaround for every character in CP850 and CP437.


Previous Comments:
------------------------------------------------------------------------
[2013-10-02 15:51:05] matti dot jarvinen at nitroid dot fi

Description:
------------
I have a valid Zip file, created on Windows 8 with iZarc, containing filenames like 12-päivä.pdf and 13-päivä.pdf.

ZipArchive reads the filenames wrong. At least getNameIndex and extractTo are affected.

Test script:
---------------
<?php
mb_internal_encoding('UTF-8');
ini_set('default_charset', 'UTF-8');

$Zip = new ZipArchive();
$open = $Zip->open('test.zip');
$length = $Zip->numFiles;

for ($i = 0; $i < $length; $i++) {
    // the name as returned by ZipArchive (broken for non-ASCII characters)
    $brokenImportName = $Zip->getNameIndex($i);
    print $brokenImportName;
    die();

    // this is a specific workaround. Some characters are apparently stuck in ASCII
    //$fixedImportName = str_replace(chr(132),'ä',$brokenImportName);
    //print $fixedImportName;
}
?>

Expected result:
----------------
12-päivä.pdf

Actual result:
--------------
12-p�iv�.pdf

------------------------------------------------------------------------
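
The workaround from the comment above could be wrapped in a small helper, roughly as sketched below. This is only an illustration: normalizeZipName() is a hypothetical name, not part of ZipArchive. It assumes the mbstring extension is loaded and uses mb_check_encoding() in place of the UTF-32 round trip. CP850 is the default fallback only because that is the code page the comment reports good results with.

```php
<?php
// Hypothetical helper (not a ZipArchive API): return a UTF-8 file name
// for a name read from a zip archive. Assumes ext/mbstring is available.
function normalizeZipName($rawName, $legacyCharset = 'CP850')
{
    // Names that are already valid UTF-8 (plain ASCII included) pass through.
    if (mb_check_encoding($rawName, 'UTF-8')) {
        return $rawName;
    }
    // Otherwise assume a legacy DOS code page and convert it to UTF-8.
    return mb_convert_encoding($rawName, 'UTF-8', $legacyCharset);
}

// Byte 0x84 is chr(132), the byte replaced by 'ä' in the test script.
echo normalizeZipName("test3/12-p\x84iv\x84.pdf"), "\n";
```

This remains a heuristic. A legacy-encoded name can happen to be valid UTF-8 and would then be left unconverted, and the CP850 default is a guess. It does not read the language encoding information that Appendix D of the specification describes.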