[ https://issues.apache.org/jira/browse/COMPRESS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183103#comment-13183103 ]
Trejkaz commented on COMPRESS-170:
----------------------------------
Argh, my code missed out a while(true) loop. :( What I get for having no
working clipboard between my development computer and my posting computer.
> Improve robustness when the wrong encoding was passed into
> ZipArchiveInputStream / ZipFile
> ------------------------------------------------------------------------------------------
>
> Key: COMPRESS-170
> URL: https://issues.apache.org/jira/browse/COMPRESS-170
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.3
> Reporter: Trejkaz
>
> If a zip file is in one encoding and I try to read it using a different
> encoding, I expect the filenames to come out garbled but the data to
> extract correctly otherwise (which is what I see when using native tools to
> extract a zip file this way).
> However, Commons Compress can instead try to decode a name, fail, and
> ultimately give us no zip entries to work with.
> Here's a test to show what I mean:
>     @Test
>     public void testWrongEncodingPassedIn() throws Exception {
>         // Making the test zip file:
>         File inputFile = new File(scratch, "test.dat");
>         byte[] inputData = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
>         FileUtils.writeByteArrayToFile(inputFile, inputData);
>         File file = new File(scratch, "test.zip");
>         try (ZipArchiveOutputStream out = new ZipArchiveOutputStream(file)) {
>             out.setEncoding("windows-31j");
>             ZipArchiveEntry entry = new ZipArchiveEntry(inputFile,
>                     "\u767A\u8D77\u4EBA\u6C7A\u5B9A\u66F8");
>             out.putArchiveEntry(entry);
>             out.write(inputData);
>             out.closeArchiveEntry();
>         }
>         // Trying to iterate over it:
>         int entryCount = 0;
>         try (ZipArchiveInputStream in = new ZipArchiveInputStream(
>                 new FileInputStream(file), "windows-1252", false)) {
>             // Read entries until the stream is exhausted.
>             while (true) {
>                 ZipArchiveEntry entry = in.getNextZipEntry();
>                 if (entry == null) {
>                     break;
>                 }
>                 entryCount++;
>             }
>         }
>         assertEquals("Wrong number of entries", 1, entryCount);
>     }
> In this situation it's definitely the caller's "fault", but unfortunately the
> end user is often the one supplying the encoding, and they would rather see
> garbled file names with the actual data intact than no data at all.
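
The behaviour asked for above (garbled names rather than a failure) comes down
to how undecodable bytes are handled. As a minimal sketch, assuming nothing
about how Commons Compress decodes names internally, the plain-JDK example
below contrasts a decoder that reports malformed input with one that replaces
it. The class and method names (LenientNameDecoding, decodeStrict,
decodeLenient) are hypothetical, and ISO-8859-1 / UTF-8 stand in for
windows-31j / windows-1252 only because every JVM is guaranteed to provide them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Commons Compress.
public class LenientNameDecoding {

    // Strict decoding: any byte sequence that is invalid in the given
    // charset aborts the whole decode with a CharacterCodingException.
    static String decodeStrict(byte[] rawName, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(rawName))
                .toString();
    }

    // Lenient decoding: invalid byte sequences are replaced with the
    // decoder's replacement string (U+FFFD by default), so the name comes
    // back garbled instead of the decode failing.
    static String decodeLenient(byte[] rawName, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(ByteBuffer.wrap(rawName))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A name written in one encoding (ISO-8859-1 here) ...
        byte[] rawName = "r\u00E9sum\u00E9.txt".getBytes(StandardCharsets.ISO_8859_1);

        // ... and read back with a different one (UTF-8 here).
        try {
            System.out.println("strict:  " + decodeStrict(rawName, StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("strict:  failed with " + e);
        }
        System.out.println("lenient: " + decodeLenient(rawName, StandardCharsets.UTF_8));
    }
}
```

With the replacing decoder the name comes back with placeholder characters
rather than an exception, so an entry with an undecodable name could still be
handed to the caller along with its data.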