[
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986339#comment-16986339
]
Tomo Suzuki commented on BEAM-8822:
-----------------------------------
There are two modules that use Hadoop client dependencies: hadoop-format and
hadoop-file-system.
h1. sdks/java/io/hadoop-format
As per [Hadoop Input/Output Format
IO|https://beam.apache.org/documentation/io/built-in/hadoop/], HadoopFormatIO
(in the beam-sdks-java-io-hadoop-format artifact) does more than read files from
Hadoop: it also serves as the foundation for reading from other data sources
such as Cassandra, HBase, and even Elasticsearch.
Its integration test, HadoopFormatIOIT, uses PostgreSQL. After setting up a
PostgreSQL instance on my local MacBook, running HadoopFormatIOIT from IntelliJ
worked with the following arguments:
{noformat}
--tests
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOIT
-DintegrationTestPipelineOptions='[
"--postgresServerName=localhost",
"--postgresUsername=suztomo",
"--postgresDatabaseName=suztomo",
"--postgresPassword=",
"--postgresSsl=false",
"--numberOfRecords=1000"
]'{noformat}
h1. sdks/java/io/hadoop-file-system
HadoopFileSystem is in the sdks/java/io/hadoop-file-system module. Its test,
HadoopFileSystemTest, creates a MiniDFSCluster (from the hadoop-hdfs artifact)
and verifies interaction with it by creating and reading files. Beam's
HadoopFileSystem class provides functions such as {{match}}, {{create}},
{{open}}, and {{copy}}.
My initial thought on testing compatibility of the Hadoop dependency is to check
this kind of communication between a new HDFS and an old HDFS client.
But where is HadoopFileSystem used?
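Separately from the HDFS round trip, a cheap classpath probe may help with the Guava concern raised in the issue description (Hadoop client 2.7 relying on Objects.toStringHelper). The following is only a minimal sketch: {{ClasspathMethodProbe}} and {{hasPublicStaticMethod}} are hypothetical names, not part of Beam or Hadoop, and the only assumption about Guava is the class name {{com.google.common.base.Objects}} mentioned in the issue. It reports whether a public static method with the given name is visible on the current classpath.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical helper for illustration; not part of Beam, Hadoop, or Guava. */
final class ClasspathMethodProbe {

  /**
   * Returns true if the class can be loaded and exposes a public static
   * method with the given name. Returns false if the class is missing,
   * cannot be linked, or has no such method.
   */
  static boolean hasPublicStaticMethod(String className, String methodName) {
    try {
      // Load without initializing, so no static initializers run.
      Class<?> clazz =
          Class.forName(className, false, ClasspathMethodProbe.class.getClassLoader());
      for (Method method : clazz.getMethods()) {
        if (method.getName().equals(methodName)
            && Modifier.isStatic(method.getModifiers())) {
          return true;
        }
      }
      return false;
    } catch (ClassNotFoundException | LinkageError e) {
      // Class is absent from the classpath or failed to link.
      return false;
    }
  }

  public static void main(String[] args) {
    // Prints false both when Guava is absent and when the method is gone.
    boolean present =
        hasPublicStaticMethod("com.google.common.base.Objects", "toStringHelper");
    System.out.println("Objects.toStringHelper visible on classpath: " + present);
  }
}
```

Running this with the same classpath as the module under test shows which situation applies. A result of false does not distinguish "Guava is missing" from "the method was removed", so it is a first hint rather than a compatibility proof.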
> Hadoop Client version 2.8 from 2.7
> ----------------------------------
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Tomo Suzuki
> Assignee: Tomo Suzuki
> Priority: Major
> Attachments: OGuVu0A18jJ.png
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> [~iemejia] says:
> bq. probably a quicker way forward is to unblock the bigtable issue is to
> move our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we
> have a good reason to do so
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL:
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the
> method
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).
> 2.8.5 is the latest in 2.8.X.
> !OGuVu0A18jJ.png!