Kaifeng Huang created FLINK-16355:
-------------------------------------
Summary: Inconsistent library versions notice.
Key: FLINK-16355
URL: https://issues.apache.org/jira/browse/FLINK-16355
Project: Flink
Issue Type: Improvement
Reporter: Kaifeng Huang
Attachments: apache flink.pdf
Hi. I have implemented a tool to detect library version inconsistencies. Your
project has 9 inconsistent libraries and 9 falsely consistent libraries.
Take org.apache.hadoop:hadoop-common for example: this library is declared as
version 2.4.1 in flink-yarn-tests, 3.1.0 in flink-filesystems/flink-s3-fs-base,
2.7.5 in flink-table/flink-sql-client, and so on. Such version inconsistencies
may cause unnecessary maintenance effort in the long run. For example, if two
modules become inter-dependent, a library version conflict may arise. This is a
common issue that hinders development progress, so a version harmonization is
worthwhile.
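The inconsistency check described above can be sketched roughly as follows. This is an illustrative simplification, not the actual tool: it parses each module's pom.xml, groups the declared versions per groupId:artifactId, and reports libraries declared with more than one distinct version. The module names in the usage example are taken from the report; the function names are hypothetical.

```python
# Illustrative sketch: detect inconsistent dependency versions across
# Maven modules by parsing each module's pom.xml. Function names are
# hypothetical, not part of the actual detection tool.
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{http://maven.apache.org/POM/4.0.0}"  # Maven POM XML namespace

def declared_versions(poms):
    """Map 'groupId:artifactId' -> {version: [modules declaring it]}."""
    versions = defaultdict(lambda: defaultdict(list))
    for module, pom_xml in poms.items():
        root = ET.fromstring(pom_xml)
        for dep in root.iter(NS + "dependency"):
            gid = dep.findtext(NS + "groupId")
            aid = dep.findtext(NS + "artifactId")
            ver = dep.findtext(NS + "version")
            if gid and aid and ver:
                versions[gid + ":" + aid][ver].append(module)
    return versions

def inconsistent(versions):
    """Keep only libraries declared with more than one distinct version."""
    return {lib: dict(vers) for lib, vers in versions.items() if len(vers) > 1}
```

Running `inconsistent(declared_versions(...))` over the Flink modules would, per the report, flag hadoop-common with versions 2.4.1, 3.1.0, and 2.7.5 in different modules.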
Assuming a version harmonization is applied, I calculated the cost of
harmonizing to each of the higher versions, including the up-to-date one. The
cost refers to POM configuration changes and API invocation changes. Take
org.apache.hadoop:hadoop-common for example: if we harmonize all library
versions to 3.1.3, the concern is how much the project code must adapt to the
newer library version. We provide an effort table to quantify the
harmonization cost.
The effort table is listed below. It shows the overall harmonization effort by
module. The columns give the number of library APIs and API calls (NA, NAC),
deleted APIs and API calls (NDA, NDAC), and modified APIs and API calls
(NMA, NMAC). Modified APIs are those whose call graph differs from the
previous version. Take the first row for example: if the library is upgraded
to version 3.1.3, 103 APIs are used in module
flink-filesystems/flink-fs-hadoop-shaded, 0 of them are deleted in the
recommended version (a deleted API would throw a NoSuchMethodError at runtime
unless the project is re-compiled), and 55 of them are regarded as modified,
which could break the former API contract.
||Index||Module||NA(NAC)||NDA(NDAC)||NMA(NMAC)||
|1|flink-filesystems/flink-fs-hadoop-shaded|103(223)|0(0)|55(115)|
|2|flink-filesystems/flink-s3-fs-base|2(4)|0(0)|1(1)|
|3|flink-yarn-tests|0(0)|0(0)|0(0)|
|4|..|..|..|..|
We also provide another table showing the files that may be affected by
library API changes, which can help to spot the affected API usages and rerun
the relevant test cases. The table is listed below.
||Module||File||Type||API||
|flink-filesystems/flink-s3-fs-base|flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableMultipartUploadFactory.java|modify|org.apache.hadoop.fs.Path.isAbsolute()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getDate()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getBuildVersion()|
|..|..|..|..|
As for false consistency, take log4j:log4j for example. The library is
declared as version 1.2.17 in all modules, but the declarations themselves
differ. Since the components are developed in parallel, updating the version
in one declaration but not the others would reintroduce the inconsistency
issues described above.
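A minimal sketch of the false-consistency idea, assuming the declarations differ in mechanism (a literal version string in one POM, a `${...}` property reference in another, both resolving to 1.2.17 today). The function and module names are hypothetical; the point is that a later edit to one declaration can silently diverge from the others.

```python
# Illustrative sketch: the same resolved version can be declared in
# different ways. Classifying the raw <version> strings exposes this.
def declaration_styles(raw_versions):
    """Classify each raw <version> string as 'literal' or 'property'."""
    return {
        module: "property" if raw.startswith("${") else "literal"
        for module, raw in raw_versions.items()
    }

styles = declaration_styles({
    "module-a": "1.2.17",            # hardcoded literal
    "module-b": "${log4j.version}",  # property resolving to 1.2.17 today
})
# styles == {"module-a": "literal", "module-b": "property"}
```

If the `log4j.version` property is later bumped, module-b moves while module-a stays on 1.2.17, turning the false consistency into a real inconsistency.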
If you are interested, you can find a more complete and detailed report in the
attached PDF file.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)