[
https://issues.apache.org/jira/browse/HADOOP-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621268#comment-14621268
]
Sangjin Lee commented on HADOOP-12168:
--------------------------------------
Sorry [~gliptak], somehow I missed the email about this and it fell through the
cracks.
I took a look at the patch and have some high-level questions about it.
It appears to me that the patch mostly covers projects in hadoop-common-project
and hadoop-tools, along with the top-level projects. Is that the intended scope of
this patch? Are you going to follow up with subsequent patches to cover the
other projects (yarn, hdfs, mapreduce)?
Also, assuming the scope is hadoop-common-project, hadoop-tools, and the top
level, the following projects don't seem to have been covered: hadoop-nfs,
hadoop-kms, hadoop-streaming, hadoop-distcp, hadoop-archives, hadoop-rumen,
hadoop-gridmix, hadoop-datajoin, hadoop-ant, hadoop-extras, hadoop-client,
hadoop-sls, hadoop-tools-dist, and hadoop-dist. Could you describe the nature
of this patch and how we will eventually cover the whole code base?
While we're at it, what is the scope of this subtask? How is it different from
the main JIRA? From the title, this subtask and the main JIRA seem almost
identical. So I'm somewhat unsure what this subtask tries to address
specifically. Some clarifications on the JIRA and the patch would be greatly
appreciated. Thanks!
Also, some general comments on the changes. Fixing *undeclared used* dependencies
is very deterministic, and we can simply use the Maven dependency analysis to add
them. I don't think there is much complication in fixing them.
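For illustration, fixing an undeclared used dependency usually amounts to declaring
the artifact explicitly in the module's pom.xml. This is only a hypothetical sketch
(commons-codec is a stand-in for whatever the analysis flags, not a reference to a
specific module in the patch):
{code:xml}
<!-- Hypothetical sketch: declare a dependency the bytecode analysis found to be
     used directly by this module but previously pulled in only transitively. -->
<dependency>
  <groupId>commons-codec</groupId>
  <artifactId>commons-codec</artifactId>
  <!-- the version would normally come from dependencyManagement in the parent pom -->
</dependency>
{code}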
On the other hand, fixing *declared unused* dependencies takes a much deeper
look and greater care, or things could break very easily. As you undoubtedly
saw, detecting what is truly unused is difficult. The Maven dependency analysis
simply follows the bytecode analysis and flags anything that's not referenced in
the code. But that's only half of the story.
If a certain dependency is flagged as unused by the Maven dependency analysis,
the only thing we can say at that point is that, at the very least, that
particular dependency should not be a compile-scope dependency. Whether it can be
removed completely or should stay as a runtime (or test) dependency really depends
on how it is used. One example is slf4j-log4j12. SLF4J binds to an implementation
based on which binding library is dropped on the classpath at runtime, so normally
the only compile-time dependency is slf4j-api. However, without a real
implementation library (slf4j-log4j12, slf4j-jdk14, etc.) present on the
classpath, SLF4J simply does not work. This is all based on dynamic runtime
classloading, and it cannot be detected by any static code analysis. So removing
an SLF4J binding from the runtime scope can break things rather easily. There can
be other examples where an unused dependency is not as straightforward as it seems
due to dynamic classloading.
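Concretely, the safe end state for something like slf4j-log4j12 is usually to keep
it but demote it to the runtime scope rather than remove it outright. A minimal
sketch of what I mean (the scopes shown are my assumption of the desired outcome,
not what the current patch does):
{code:xml}
<!-- Compile against the SLF4J API only... -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
</dependency>
<!-- ...but keep the binding on the runtime classpath; without any binding,
     SLF4J falls back to its no-op logger and logging silently disappears. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <scope>runtime</scope>
</dependency>
{code}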
In summary, we need to be 100% confident that a certain runtime dependency is
truly unused and show the reason before we remove it. Hope it helps...
> Clean undeclared used dependencies and declared unused dependencies
> -------------------------------------------------------------------
>
> Key: HADOOP-12168
> URL: https://issues.apache.org/jira/browse/HADOOP-12168
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: build
> Affects Versions: 3.0.0
> Reporter: Gabor Liptak
> Assignee: Gabor Liptak
> Attachments: HADOOP-12168.1.patch, HADOOP-12168.2.patch,
> HADOOP-12168.3.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)