[ 
https://issues.apache.org/jira/browse/COMPRESS-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543645#comment-13543645
 ] 

Stefan Bodewig commented on COMPRESS-212:
-----------------------------------------

OK, so your tar is using GNU long name entries, I see.  I'll check whether 
GNU's spec says anything about the encoding used for them (it might turn out 
that it always uses UTF-8).  Hmm, when we write GNU long name records we use 
the encoding that has been specified, so we certainly are not consistent.

Thank you for the analysis.
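
To make the inconsistency concrete, here is a minimal JDK-only sketch. It is 
not Commons Compress code: the class and method names are hypothetical, and 
ISO-8859-1 merely stands in for a platform default (such as CP949) that 
differs from the encoding the caller specified. It shows that a name written 
with one encoding only survives if it is read back with the same one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Commons Compress.
public class LongNameEncodingSketch {

    // Writing side: encode the entry name with the encoding the caller specified.
    public static byte[] encodeName(String name, Charset specified) {
        return name.getBytes(specified);
    }

    // Reading side: decode the raw name bytes with whatever charset the reader uses.
    public static String decodeName(byte[] raw, Charset used) {
        return new String(raw, used);
    }

    public static void main(String[] args) {
        // A Korean file name, written with Unicode escapes to avoid
        // depending on the source file's own encoding.
        String original = "\uD55C\uAE00.txt";
        byte[] raw = encodeName(original, StandardCharsets.UTF_8);

        // Same encoding on both sides: the name round-trips.
        String consistent = decodeName(raw, StandardCharsets.UTF_8);
        // Assumed stand-in for a differing platform default: the name is garbled.
        String inconsistent = decodeName(raw, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(consistent));
        System.out.println(original.equals(inconsistent));
    }
}
```

Under these assumptions the first comparison holds and the second does not, 
which matches the symptom described in the report below.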
                
> TarArchiveEntry getName() returns wrongly encoded name even when you set 
> encoding to TarArchiveInputStream
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-212
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-212
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.4.1
>         Environment: Red Hat Enterprise Linux, MS Windows 7
>            Reporter: Woo Ju Shin
>            Priority: Minor
>
> I have two file systems. One is Red Hat Linux, the other is MS Windows.
> I created a *.tgz file in Red Hat Linux and tried to decompress it in MS 
> Windows using Commons Compress.
> The default system encodings are different: UTF-8 on Red Hat Linux and CP949 
> on MS Windows.
> It seems that the file name is decoded using the default encoding even when 
> I use the following to untar it:
> // "*.tgz" is a placeholder for the archive path
> FileInputStream fis = new FileInputStream(new File("*.tgz"));
> TarArchiveInputStream zis = new TarArchiveInputStream(
>         new BufferedInputStream(fis), encodingOfRedHatLinux);
> TarArchiveEntry entry;
> while ((entry = (TarArchiveEntry) zis.getNextEntry()) != null)
> {
>     // the file name is not UTF-8; it is encoded in CP949,
>     // so the file name isn't consistent
>     entry.getName();
> }
> By referring to this
>     /**
>      * Constructor for TarInputStream.
>      * @param is the input stream to use
>      * @param encoding name of the encoding to use for file names
>      * @since Commons Compress 1.4
>      */
>     public TarArchiveInputStream(InputStream is, String encoding) {
>         this(is, TarBuffer.DEFAULT_BLKSIZE, TarBuffer.DEFAULT_RCDSIZE,
>              encoding);
>     }
> encoding should be used for file names.
> But actually this doesn't seem to work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
