Edit report at https://bugs.php.net/bug.php?id=65815&edit=1

 ID:                 65815
 Comment by:         matti dot jarvinen at nitroid dot fi
 Reported by:        matti dot jarvinen at nitroid dot fi
 Summary:            ZipArchive reads filenames with UTF-8 characters
                     wrong
 Status:             Open
 Type:               Bug
 Package:            Zip Related
 Operating System:   Fedora 3.8.6-203.fc18.x86_64
 PHP Version:        5.4.20
 Block user comment: N
 Private report:     N

 New Comment:

If the zip file contains the following files:

test3/12-päivä.pdf
test3/ä中华人民共和国.PDF
test3/Российская Федерация.PDF
test3/中华人民共和国.PDF


ZipArchive will read them as:

test3/12-p�iv�.pdf
test3/ä中华人民共和国.PDF
test3/Российская Федерация.PDF
test3/中华人民共和国.PDF

Broken file names can be converted to correct UTF-8 with:

<?php

// a valid UTF-8 string survives a round trip through UTF-32 unchanged
if ($filename === mb_convert_encoding(
      mb_convert_encoding($filename, "UTF-32", "UTF-8"), "UTF-8", "UTF-32"))
{
  $fixedFilename = $filename;
}
else
{
  // otherwise assume the name is CP850-encoded and convert it to UTF-8
  $fixedFilename = mb_convert_encoding($filename, 'UTF-8', 'CP850');
}

?>
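
The same check can be wrapped in a small function. This is only a sketch: 
fixZipEntryName() is a hypothetical name, not part of ZipArchive, and it 
assumes that mbstring is loaded and that every name which is not valid UTF-8 
is CP850. It uses mb_check_encoding() instead of the UTF-32 round trip.

```php
<?php

// Hypothetical helper, not part of the ZipArchive API.
// Assumption: names that are not well-formed UTF-8 are CP850-encoded.
function fixZipEntryName($name, $fallbackEncoding = 'CP850')
{
  // mb_check_encoding() returns true for well-formed UTF-8
  if (mb_check_encoding($name, 'UTF-8'))
  {
    return $name;
  }

  // anything else is treated as the fallback code page
  return mb_convert_encoding($name, 'UTF-8', $fallbackEncoding);
}
```

It could be called as fixZipEntryName($Zip->getNameIndex($i)). A CP850 name 
whose bytes happen to form valid UTF-8 would be returned unchanged, so this 
is a heuristic and not a real fix.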

Appendix D, "Language Encoding (EFS)", of the .ZIP File Format Specification 
(version 6.3.3) might explain how to read the file name encoding correctly 
from the zip file:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
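
As far as I read the specification, the language encoding flag is bit 11 
(0x0800) of the general purpose bit flag, and when it is set the file name 
must be UTF-8. The sketch below reads that bit from the first local file 
header. readFirstEntryName() is a hypothetical helper, and it assumes that 
the archive starts with a local file header at offset 0, which is not true 
for empty or self-extracting archives. It does not look at the central 
directory.

```php
<?php

// Hypothetical helper: parses the 30-byte fixed part of the first
// local file header and reports whether the EFS bit (bit 11) is set.
function readFirstEntryName($zipBytes)
{
  if (strlen($zipBytes) < 30)
  {
    return null;
  }

  // all fields are little-endian: V = 32 bit, v = 16 bit
  $h = unpack(
    'Vsignature/vversion/vflags/vmethod/vtime/vdate/' .
    'Vcrc/Vcsize/Vusize/vnameLength/vextraLength',
    substr($zipBytes, 0, 30)
  );

  // local file header signature "PK\x03\x04"
  if ($h['signature'] !== 0x04034b50)
  {
    return null;
  }

  return array(
    'name'   => substr($zipBytes, 30, $h['nameLength']),
    'isUtf8' => ($h['flags'] & 0x0800) !== 0,
  );
}
```

For a file on disk it could be called as 
readFirstEntryName(file_get_contents('test.zip')).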

If I understood the specification correctly, the code page should be CP437 
when the name is not UTF-8, but as far as I can tell mbstring does not 
support that encoding. I got good results with CP850, but I cannot verify 
this workaround for every character, because CP850 and CP437 do not map all 
bytes to the same characters.
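
One possible alternative is iconv(), which may know CP437 even though 
mbstring does not. This is an assumption about the platform: whether 'CP437' 
is accepted depends on the iconv implementation PHP was built against. 
cp437ToUtf8() is a hypothetical helper name.

```php
<?php

// Hypothetical helper. Assumption: the underlying iconv library
// recognises the name 'CP437'; this is platform dependent.
function cp437ToUtf8($name)
{
  $converted = iconv('CP437', 'UTF-8', $name);

  // iconv() returns false on failure; fall back to the raw name
  return $converted === false ? $name : $converted;
}

// The two code pages agree on byte 0x84 but differ on byte 0x9B
$sameInBoth = cp437ToUtf8("p\x84iv\x84");
$asCp437    = cp437ToUtf8("\x9B");
$asCp850    = mb_convert_encoding("\x9B", 'UTF-8', 'CP850');
```

Byte 0x84 is "ä" in both code pages, which would explain the good results 
with CP850 for my file names. Byte 0x9B is the cent sign in CP437 and "ø" in 
CP850, so names containing such bytes would come out differently.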


Previous Comments:
------------------------------------------------------------------------
[2013-10-02 15:51:05] matti dot jarvinen at nitroid dot fi

Description:
------------
I have a valid Zip file, created on Windows 8 with iZarc, containing 
filenames like 12-päivä.pdf and 13-päivä.pdf.

ZipArchive reads these filenames incorrectly.

At least getNameIndex and extractTo are affected.

Test script:
---------------
<?php 
mb_internal_encoding('UTF-8');
ini_set('default_charset', 'UTF-8');

$Zip = new ZipArchive();

$open = $Zip->open('test.zip');

$length = $Zip->numFiles;

for($i = 0; $i < $length; $i++)
{
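  // read the name of entry $i as ZipArchive reports it
  $brokenImportName = $Zip->getNameIndex($i);

  print $brokenImportName;

  // stop after the first entry; one name is enough to show the problem
  die();

  // this is a specific workaround. Some characters are apparently stuck
  // in ASCII
  //$fixedImportName = str_replace(chr(132),'ä',$brokenImportName);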

  //print $fixedImportName;
}

?>

Expected result:
----------------
12-päivä.pdf

Actual result:
--------------
12-p�iv�.pdf


------------------------------------------------------------------------


