This is an automated email from the ASF dual-hosted git repository.
pingsutw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new 65e54eab SUBMARINE-1282. add the codes of experiment prehandler module
65e54eab is described below
commit 65e54eabff7d83d617344dc65174f44e8b2a789d
Author: FatalLin <[email protected]>
AuthorDate: Sun Jun 19 16:05:57 2022 +0800
SUBMARINE-1282. add the codes of experiment prehandler module
### What is this PR for?
<!-- A few sentences describing the overall goals of the pull request's
commits.
First time? Check out the contributing guide -
https://submarine.apache.org/contribution/contributions.html
-->
This PR is the first step of our data fetching feature, which allow
submarine could fetch the data from HDFS cluster to local fie system, and we
could persist the data into apache arrow as our next step.
### What type of PR is it?
Feature
### Todos
* [ ] - Task
### What is the Jira issue?
<!-- * Open an issue on Jira
https://issues.apache.org/jira/browse/SUBMARINE/
* Put link here, and add [SUBMARINE-*Jira number*] in PR title, eg.
`SUBMARINE-23. PR title`
-->
https://issues.apache.org/jira/browse/SUBMARINE-1282
### How should this be tested?
<!--
* First time? Setup Travis CI as described on
https://submarine.apache.org/contribution/contributions.html#continuous-integration
* Strongly recommended: add automated unit tests for any new or changed
behavior
* Outline any manual steps to test the PR here.
-->
The new test cases should be added once the following steps has been
developed.
### Screenshots (if appropriate)
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No
Author: FatalLin <[email protected]>
Signed-off-by: Kevin <[email protected]>
Closes #966 from FatalLin/SUBMARINE-1282 and squashes the following commits:
a1855cb5 [FatalLin] modify codes base on comments
f9c1e365 [FatalLin] add license annocement
8bf4e002 [FatalLin] add the cods of experiment prehandler module
---
bin/experiment-prehandler.sh | 29 ++++++++++++++++
.../docker-images/experiment-prehandler/Dockerfile | 33 ++++++++++++++++++
.../docker-images/experiment-prehandler/build.sh | 40 ++++++++++++++++++++++
.../fs_prehandler/__init__.py | 16 +++++++++
.../fs_prehandler/fs_prehandler.py | 25 ++++++++++++++
.../fs_prehandler/hdfs_prehandler.py | 40 ++++++++++++++++++++++
submarine-experiment-prehandler/prehandler_main.py | 26 ++++++++++++++
7 files changed, 209 insertions(+)
diff --git a/bin/experiment-prehandler.sh b/bin/experiment-prehandler.sh
new file mode 100644
index 00000000..6b53a2ad
--- /dev/null
+++ b/bin/experiment-prehandler.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+export SUBMARINE_HOME="$(cd "${FWDIR}/.."; pwd)"
+export SUBMARINE_LOG_DIR="${SUBMARINE_HOME}/logs"
+
+if [[ ! -d "${SUBMARINE_LOG_DIR}" ]]; then
+ echo "Log dir doesn't exist, create ${SUBMARINE_LOG_DIR}"
+ $(mkdir -p "${SUBMARINE_LOG_DIR}")
+fi
+
+export CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath --glob`
+
+python3 /opt/submarine-experiment-prehandler/prehandler_main.py
diff --git a/dev-support/docker-images/experiment-prehandler/Dockerfile
b/dev-support/docker-images/experiment-prehandler/Dockerfile
new file mode 100644
index 00000000..87307d07
--- /dev/null
+++ b/dev-support/docker-images/experiment-prehandler/Dockerfile
@@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM adoptopenjdk/openjdk8:jre
+MAINTAINER Apache Software Foundation <[email protected]>
+
+RUN apt-get update
+RUN apt-get -y install python3 python3-pip bash tini
+
+ADD ./tmp/hadoop-3.3.3.tar.gz /opt/
+ADD ./tmp/submarine-experiment-prehandler /opt/submarine-experiment-prehandler
+
+
+ENV HADOOP_HOME=/opt/hadoop-3.3.3
+ENV ARROW_LIBHDFS_DIR=/opt/hadoop-3.3.3/lib/native
+
+RUN python3 -m pip install --upgrade pip \
+ && pip3 install fsspec pyarrow --no-cache-dir
+
+ENTRYPOINT ["/bin/bash"]
+CMD ["/opt/submarine-experiment-prehandler/experiment-prehandler.sh"]
diff --git a/dev-support/docker-images/experiment-prehandler/build.sh
b/dev-support/docker-images/experiment-prehandler/build.sh
new file mode 100755
index 00000000..fcdedb05
--- /dev/null
+++ b/dev-support/docker-images/experiment-prehandler/build.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euxo pipefail
+
+SUBMARINE_VERSION=0.8.0-SNAPSHOT
+SUBMARINE_IMAGE_NAME="apache/submarine:experiment-prehandler-${SUBMARINE_VERSION}"
+
+export CURRENT_PATH=$(cd "${PWD}">/dev/null; pwd)
+export SUBMARINE_HOME=${CURRENT_PATH}/../../..
+
+mkdir -p "${CURRENT_PATH}/tmp"
+
+cp -r ${SUBMARINE_HOME}/submarine-experiment-prehandler ${CURRENT_PATH}/tmp/
+cp ${SUBMARINE_HOME}/bin/experiment-prehandler.sh
${CURRENT_PATH}/tmp/submarine-experiment-prehandler/
+
+HADOOP_TAR_URL="https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz"
+tmpfile=$(mktemp)
+trap "test -f $tmpfile && rm $tmpfile" RETURN
+curl -L -o $tmpfile ${HADOOP_TAR_URL}
+mv $tmpfile ${CURRENT_PATH}/tmp/hadoop-3.3.3.tar.gz
+
+echo "Start building the ${SUBMARINE_IMAGE_NAME} docker image ..."
+docker build -t ${SUBMARINE_IMAGE_NAME} .
+
+# clean temp file
+rm -rf "${CURRENT_PATH}/tmp"
diff --git a/submarine-experiment-prehandler/fs_prehandler/__init__.py
b/submarine-experiment-prehandler/fs_prehandler/__init__.py
new file mode 100644
index 00000000..f4540561
--- /dev/null
+++ b/submarine-experiment-prehandler/fs_prehandler/__init__.py
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from fs_prehandler.fs_prehandler import FsPreHandler
diff --git a/submarine-experiment-prehandler/fs_prehandler/fs_prehandler.py
b/submarine-experiment-prehandler/fs_prehandler/fs_prehandler.py
new file mode 100644
index 00000000..0a812181
--- /dev/null
+++ b/submarine-experiment-prehandler/fs_prehandler/fs_prehandler.py
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from abc import ABC, abstractmethod
+
+class FsPreHandler(ABC):
+ @abstractmethod
+ def __init__(self):
+ pass
+
+ @abstractmethod
+ def process(self):
+ pass
diff --git a/submarine-experiment-prehandler/fs_prehandler/hdfs_prehandler.py
b/submarine-experiment-prehandler/fs_prehandler/hdfs_prehandler.py
new file mode 100644
index 00000000..60265d3c
--- /dev/null
+++ b/submarine-experiment-prehandler/fs_prehandler/hdfs_prehandler.py
@@ -0,0 +1,40 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import os
+
+from fs_prehandler import FsPreHandler
+from fsspec.implementations.arrow import HadoopFileSystem
+
+class HDFSPreHandler(FsPreHandler):
+ def __init__(self):
+ self.hdfs_host=os.environ['HDFS_HOST']
+ self.hdfs_port=int(os.environ['HDFS_PORT'])
+ self.hdfs_source=os.environ['HDFS_SOURCE']
+ self.dest_path=os.environ['DEST_PATH']
+ self.enable_kerberos=os.environ['ENABLE_KERBEROS']
+
+ logging.info('HDFS_HOST:%s' % self.hdfs_host)
+ logging.info('HDFS_PORT:%d' % self.hdfs_port)
+ logging.info('HDFS_SOURCE:%s' % self.hdfs_source)
+ logging.info('DEST_PATH:%s' % self.dest_path)
+ logging.info('ENABLE_KERBEROS:%s' % self.enable_kerberos)
+
+ self.fs = HadoopFileSystem(host=self.hdfs_host, port=self.hdfs_port)
+
+ def process(self):
+ self.fs.get(self.hdfs_source, self.dest_path, recursive=True)
+ logging.info('fetch data from hdfs://%s:%d/%s to %s complete' %
(self.hdfs_host, self.hdfs_port, self.hdfs_source, self.dest_path))
diff --git a/submarine-experiment-prehandler/prehandler_main.py
b/submarine-experiment-prehandler/prehandler_main.py
new file mode 100644
index 00000000..7e14a926
--- /dev/null
+++ b/submarine-experiment-prehandler/prehandler_main.py
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+from fs_prehandler.hdfs_prehandler import HDFSPreHandler
+
+def get_file_system_prehandler():
+ if os.environ['FILE_SYSTEM_TYPE'] == 'HDFS':
+ return HDFSPreHandler()
+
+if __name__ == "__main__":
+ fsPrehandler = get_file_system_prehandler()
+ fsPrehandler.process()
+
+
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]