[
https://issues.apache.org/jira/browse/HADOOP-18500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Willi Raschkowski updated HADOOP-18500:
---------------------------------------
Description:
Maven-shade-plugin rewrites classes when moving them into {{hadoop-client}}
JARs. It does so even for classes whose byte code it has no reason to modify,
that is, classes that no shading relocation applies to.
We use a tool that checks for classpath duplicates that don't have equal byte
code. This tool flags classes brought in via Hadoop, where one JAR containing
them is {{hadoop-client-api}} or {{hadoop-client-runtime}}, and the other
JAR is {{hadoop-common}} or {{hadoop-shaded-guava}}. The byte code for the
same class is indeed different between the relocated and non-relocated JARs.
This is because maven-shade-plugin, before 3.3.0, was rewriting class files
even when the relocation was a "no-op". See MSHADE-391 and
[apache/maven-shade-plugin#95|https://github.com/apache/maven-shade-plugin/pull/95].
{quote}Maven Shade internally uses [ASM's
{{ClassRemapper}}|https://asm.ow2.io/javadoc/org/objectweb/asm/commons/ClassRemapper.html]
and defines a custom {{Remapper}} subclass, which takes care of relocation,
partially doing the work by itself and partially delegating to the ASM parent
class. An ASM {{ClassReader}} reads each class file from the original JAR and
*unconditionally* writes it into a {{ClassWriter}}, plugging in the
transformer.
This transformation, even if not a single relocation (package name mapping)
takes place, often leads to binary differences between original class and
transformed class, because constant pool or stack map frames have been
adjusted, not changing the functionality of the class, but making it look like
something changed when comparing class files before and after the relocation
process.
{quote}
Upgrading to maven-shade-plugin 3.3.0 fixes the unnecessary rewrite of classes.
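The duplicate check described above could look roughly like the following. This is only a minimal sketch using the JDK's {{java.util.jar}} API, not the tool we actually run; the class name {{DuplicateClassChecker}} and the method {{findMismatches}} are hypothetical, and it assumes Java 9+ for {{InputStream.readAllBytes}}.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper for illustration; not the reporter's actual tool. */
final class DuplicateClassChecker {

    /**
     * Returns the names of .class entries that exist in both JARs
     * but whose raw bytes are not identical.
     */
    static List<String> findMismatches(String jarPathA, String jarPathB) throws IOException {
        List<String> mismatches = new ArrayList<>();
        try (JarFile a = new JarFile(jarPathA); JarFile b = new JarFile(jarPathB)) {
            Enumeration<JarEntry> entries = a.entries();
            while (entries.hasMoreElements()) {
                JarEntry entryA = entries.nextElement();
                // Only class files are interesting for a classpath duplicate check.
                if (entryA.isDirectory() || !entryA.getName().endsWith(".class")) {
                    continue;
                }
                JarEntry entryB = b.getJarEntry(entryA.getName());
                if (entryB == null) {
                    continue; // not a duplicate: the class exists in one JAR only
                }
                if (!Arrays.equals(readAll(a, entryA), readAll(b, entryB))) {
                    mismatches.add(entryA.getName());
                }
            }
        }
        return mismatches;
    }

    /** Reads the full, uncompressed content of one JAR entry. */
    private static byte[] readAll(JarFile jar, JarEntry entry) throws IOException {
        try (InputStream in = jar.getInputStream(entry)) {
            return in.readAllBytes();
        }
    }
}
```

Called with the paths of, say, a {{hadoop-client-api}} JAR and a {{hadoop-common}} JAR, such a check compares bytes only, so it reports classes that maven-shade-plugin rewrote even when the rewrite did not change their behavior.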
was:
Maven-shade-plugin rewrites classes when moving them into {{hadoop-client}}
JARs. That's true even when it doesn't actually need to modify the byte code of
the classes, say for shading.
We use a tool that checks for classpath duplicates that don't have equal byte
codes. We noticed that it flags classes brought in via Hadoop, where one JAR
containing them is {{hadoop-client-api}} and {{hadoop-client-runtime}}, and
the other JAR is {{hadoop-common}} or {{hadoop-shaded-guava}}. The byte
code for the same class is indeed different between the relocated and
non-relocated JARs.
This is because maven-shade-plugin, before 3.3.0, was rewriting class files
even when the relocation was a "no-op". See MSHADE-391 and
[apache/maven-shade-plugin#95|https://github.com/apache/maven-shade-plugin/pull/95].
{quote}
Maven Shade internally uses [ASM's
{{ClassRemapper}}|https://asm.ow2.io/javadoc/org/objectweb/asm/commons/ClassRemapper.html]
and defines a custom {{Remapper}} subclass, which takes care of relocation,
partially doing the work by itself and partially delegating to the ASM parent
class. An ASM {{ClassReader}} reads each class file from the original JAR and
*unconditionally* writes it into a {{ClassWriter}}, plugging in the
transformer.
This transformation, even if not a single relocation (package name mapping)
takes place, often leads to binary differences between original class and
transformed class, because constant pool or stack map frames have been
adjusted, not changing the functionality of the class, but making it look like
something changed when comparing class files before and after the relocation
process.
{quote}
Upgrading to maven-shade-plugin 3.3.0 fixes the unnecessary rewrite of classes.
> Upgrade maven-shade-plugin to 3.3.0
> -----------------------------------
>
> Key: HADOOP-18500
> URL: https://issues.apache.org/jira/browse/HADOOP-18500
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Reporter: Willi Raschkowski
> Assignee: Ashutosh Gupta
> Priority: Minor
> Labels: pull-request-available