[jira] [Commented] (SUREFIRE-1220) Surefire never outputs UTF-8 under Windows

Michael Osipov (JIRA) Fri, 29 Jan 2016 15:59:51 -0800

    [ 
https://issues.apache.org/jira/browse/SUREFIRE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124469#comment-15124469
 ]


Michael Osipov commented on SUREFIRE-1220:
------------------------------------------

So, what is happening here? You have two VMs running: your Maven VM and the 
forked Surefire VM. The Maven VM runs with {{Cp1252}} and the Surefire one with 
{{UTF-8}}. As you know, you cannot narrow down {{UTF-8}} to {{Cp1252}}. Let's 
assume for now that they are mappable. Surefire has two output channels, 
{{DirectConsoleOutput}} and {{ConsoleOutputFileReporter}}. The first one 
does {{CharBuffer decode = Charset.defaultCharset().newDecoder().decode( 
ByteBuffer.wrap( buf, off, len ) ); stream.append( decode );}} and the second 
one {{fileOutputStream = new FileOutputStream( file ); fileOutputStream.write( 
buf, off, len );}}. Both use the default VM file encoding. Now let's get to the 
consumption of the standard output of the forked VM. {{ForkStarter}} passes 
{{FORK_STREAM_CHARSET_NAME}} ({{ISO-8859-1}}) to 
{{executeCommandLineAsCallable}}. A {{StreamPumper}} is created but that 
{{Charset}} is never passed so the {{InputStreamReader}} is again created with 
default encoding. The {{ThreadedStreamConsumer}} consumes the specially encoded 
output from the Surefire Booter. The booter in turn maps all bytes written to 
{{stderr}} or {{stdout}} to a 7 bit alignment ({{ASCII}}) and properly decodes 
them in {{ForkClient}} but this one does {{ByteBuffer defaultEncoded = 
DEFAULT_CHARSET.encode( decodedFromSourceCharset );}} and your output is broken 
of course.
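
To illustrate that last step, here is a minimal sketch of what happens when text decoded from the forked VM's {{UTF-8}} bytes is re-encoded into the Maven VM's {{Cp1252}}. The class and method names ({{EncodingRoundTrip}}, {{reencode}}) are made up for this example and are not Surefire code; only {{Charset.encode}} mirrors the call quoted above. It assumes the {{windows-1252}} charset is available in the running JRE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Surefire.
class EncodingRoundTrip {

    // Decode bytes as UTF-8 (the forked VM's encoding), then encode them
    // with the target charset (the Maven VM's default in this scenario).
    static byte[] reencode(byte[] utf8Bytes, Charset target) {
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        // Charset.encode() replaces unmappable characters with the
        // charset's replacement byte instead of throwing.
        ByteBuffer encoded = target.encode(decoded);
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+2025 TWO DOT LEADER, as in the Guava Range output from the report.
        byte[] fromFork = "10\u202520".getBytes(StandardCharsets.UTF_8);
        byte[] forConsole = reencode(fromFork, cp1252);
        System.out.println(new String(forConsole, cp1252)); // prints "10?20"
    }
}
```

The character has no mapping in {{Cp1252}}, so the encoder substitutes a question mark, which matches what the reporter sees on the console.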

To make a long story short, the encoding and decoding of our {{System.out}}s 
are done marvelously, but in the end, trying to map chars to an encoding which 
does not support them simply won't work between two VMs. The result is that 
there is apparently no bug, but {{chcp}}, Maven's encoding 
({{MAVEN_OPTS=-Dfile.encoding=...}}) and the forked one 
({{<argLine>-Dfile.encoding=...</argLine>}}) have to match; everything else is 
undefined. Why did {{exec}} work? For one simple reason: the output of {{java}} 
was passed as-is, and {{chcp}} and the forked encoding did match.
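
If you want to check a given setup, a small diagnostic along these lines might help. It is only a sketch: {{ConsoleCharsetCheck}} and {{canRender}} are hypothetical names, and it assumes the {{windows-1252}} charset is available in the JRE. Run it once in the Maven VM and once in the forked VM and compare what each reports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic, not part of Maven or Surefire.
class ConsoleCharsetCheck {

    // True if every character of the text has a mapping in the given charset.
    static boolean canRender(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String sample = "10\u202520"; // contains U+2025 TWO DOT LEADER

        // What this particular VM considers its default encoding.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        System.out.println("default can render sample: "
                + canRender(sample, Charset.defaultCharset()));
        System.out.println("windows-1252 can render sample: "
                + canRender(sample, Charset.forName("windows-1252")));
        System.out.println("UTF-8 can render sample: "
                + canRender(sample, StandardCharsets.UTF_8));
    }
}
```

If the two VMs report different default charsets, or the Maven VM's charset cannot render the sample, you are in the undefined territory described above.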

> Surefire never outputs UTF-8 under Windows
> ------------------------------------------
>
>                 Key: SUREFIRE-1220
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1220
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.19.1
>         Environment: Windows 10, 64-bit
> DejaVu Sans font
>            Reporter: Gili
>         Attachments: 2016-01-29_113906.png, exec_exec.png, output.exec.txt, 
> output.test.txt, surefire-1220.zip, test.png
>
>
> I'm having problems getting Surefire to output UTF-8 fonts under Windows.
> When I run a unit test that outputs a Guava Range ("10‥20") the TWO DOT 
> LEADER unicode character always gets rendered as a question mark.
> If I run the exact same code outside of Surefire (using a main() entry point) 
> the UTF-8 character renders just fine. The repro steps are quite simple:
> # Create a Maven project.
> # Run {code}System.out.println(Range.closed(10, 30));{code} in a Java class 
> with a main() entry point, and from a JUnit test.
> # The main() entry point will output UTF-8 just fine. The JUnit test will 
> output a question mark in place of the unicode.
> Here is my pom.xml file:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <project xmlns="http://maven.apache.org/POM/4.0.0" 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
> http://maven.apache.org/xsd/maven-4.0.0.xsd">
>     <modelVersion>4.0.0</modelVersion>
>     <groupId>com.mycompany</groupId>
>     <artifactId>mavenproject1</artifactId>
>     <version>1.0-SNAPSHOT</version>
>     <packaging>jar</packaging>
>     <build>
>         <plugins>
>             <plugin>
>                 <groupId>org.apache.maven.plugins</groupId>
>                 <artifactId>maven-surefire-plugin</artifactId>
>                 <version>2.19.1</version>
>                 <configuration>
>                     <argLine>-Dfile.encoding=UTF-8</argLine>
>                 </configuration>
>             </plugin>
>             <plugin>
>                 <groupId>org.codehaus.mojo</groupId>
>                 <artifactId>exec-maven-plugin</artifactId>
>                 <version>1.4.0</version>
>                 <executions>
>                     <execution>
>                         <goals>
>                             <goal>java</goal>
>                         </goals>
>                     </execution>
>                 </executions>
>                 <configuration>
>                     <mainClass>foo.Main</mainClass>
>                 </configuration>
>             </plugin>
>         </plugins>
>     </build>
>     <dependencies>
>         <dependency>
>             <groupId>com.google.guava</groupId>
>             <artifactId>guava</artifactId>
>             <version>19.0</version>
>         </dependency>
>         <dependency>
>             <groupId>junit</groupId>
>             <artifactId>junit</artifactId>
>             <version>4.12</version>
>             <scope>test</scope>
>         </dependency>
>     </dependencies>
>     <properties>
>         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>         <maven.compiler.source>1.8</maven.compiler.source>
>         <maven.compiler.target>1.8</maven.compiler.target>
>     </properties>
> </project>
> {code}
> I tried the same thing using TestNG tests and noticed that although output to 
> console was still wrong, the outputted testng-results.xml file contained the 
> correct character.
> Can you reproduce this on your end?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
