[ 
https://issues.apache.org/jira/browse/HADOOP-18500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated HADOOP-18500:
---------------------------------------
    Description: 
Maven-shade-plugin rewrites classes when moving them into {{hadoop-client}} 
JARs. It does so even when it has no need to modify the byte code of the 
classes, for example to relocate (shade) them.

We use a tool that checks the classpath for duplicate classes whose byte code 
differs. This tool flags classes brought in via Hadoop. Each flagged class 
came on one side from a JAR containing relocated classes 
({{hadoop-client-api}} or {{hadoop-client-runtime}}) and on the other from the 
original, non-relocated JAR ({{hadoop-common}} or {{hadoop-shaded-guava}}). We 
checked, and the byte code for the same class does indeed differ between the 
relocated and non-relocated JARs.
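
For illustration, a check of this kind could look roughly like the sketch below. It is not the tool referred to above, which is not named here; the class name {{ClassDuplicateCheck}} and the method {{differingClasses}} are hypothetical. It uses only the JDK and reports {{.class}} entries that exist under the same name in two JARs but hash differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hadoop or maven-shade-plugin.
final class ClassDuplicateCheck {

    /** Returns names of .class entries present in both JARs whose bytes differ. */
    static List<String> differingClasses(Path jarA, Path jarB)
            throws IOException, NoSuchAlgorithmException {
        List<String> differing = new ArrayList<>();
        try (JarFile a = new JarFile(jarA.toFile());
             JarFile b = new JarFile(jarB.toFile())) {
            Enumeration<JarEntry> entries = a.entries();
            while (entries.hasMoreElements()) {
                JarEntry entryA = entries.nextElement();
                if (entryA.isDirectory() || !entryA.getName().endsWith(".class")) {
                    continue; // only class files are of interest
                }
                JarEntry entryB = b.getJarEntry(entryA.getName());
                if (entryB == null) {
                    continue; // present in one JAR only, so not a duplicate
                }
                if (!Arrays.equals(sha256(a, entryA), sha256(b, entryB))) {
                    differing.add(entryA.getName());
                }
            }
        }
        return differing;
    }

    /** Hashes the uncompressed bytes of one JAR entry with SHA-256. */
    private static byte[] sha256(JarFile jar, JarEntry entry)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = jar.getInputStream(entry)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }
}
```

Calling {{differingClasses}} with the paths of, say, a {{hadoop-client-api}} JAR and a {{hadoop-common}} JAR would list the classes that were copied without an effective relocation yet ended up with different bytes. Classes that were actually moved to a new package have different entry names and are skipped by this name-based comparison.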

This is because maven-shade-plugin, before 3.3.0, was rewriting class files 
even when the relocation was a "no-op". See MSHADE-391 and 
[apache/maven-shade-plugin#95|https://github.com/apache/maven-shade-plugin/pull/95].
{quote}Maven Shade internally uses [ASM's 
{{ClassRemapper}}|https://asm.ow2.io/javadoc/org/objectweb/asm/commons/ClassRemapper.html]
 and defines a custom {{Remapper}} subclass, which takes care of relocation, 
partially doing the work by itself and partially delegating to the ASM parent 
class. An ASM {{ClassReader}} reads each class file from the original JAR and 
*unconditionally* writes it into a {{ClassWriter}}, plugging in the 
transformer.

This transformation, even if not a single relocation (package name mapping) 
takes place, often leads to binary differences between original class and 
transformed class, because constant pool or stack map frames have been 
adjusted, not changing the functionality of the class, but making it look like 
something changed when comparing class files before and after the relocation 
process.
{quote}
Upgrading to maven-shade-plugin 3.3.0 fixes the unnecessary rewrite of classes.


> Upgrade maven-shade-plugin to 3.3.0
> -----------------------------------
>
>                 Key: HADOOP-18500
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18500
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: Willi Raschkowski
>            Assignee: Ashutosh Gupta
>            Priority: Minor
>              Labels: pull-request-available


