imbajin commented on code in PR #724:
URL: 
https://github.com/apache/hugegraph-toolchain/pull/724#discussion_r3071747441


##########
hugegraph-loader/assembly/travis/install-hadoop.sh:
##########
@@ -17,31 +17,55 @@
 #
 set -ev
 
-sudo wget 
https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
+HADOOP_VERSION="2.8.5"
+HADOOP_TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
+HADOOP_HOME="/usr/local/hadoop"
+HADOOP_TARBALL_PATH="${HOME}/${HADOOP_TARBALL}"
 
-sudo tar -zxf hadoop-2.8.5.tar.gz -C /usr/local
-cd /usr/local
-sudo mv hadoop-2.8.5 hadoop
-#sudo chown -R travis ./hadoop
-cd hadoop
+if [[ ! -d "${HADOOP_HOME}" ]]; then
+    if [[ ! -f "${HADOOP_TARBALL_PATH}" ]]; then
+        echo "Downloading Hadoop ${HADOOP_VERSION}..."
+        wget -O "${HADOOP_TARBALL_PATH}" 
"https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${HADOOP_TARBALL}";
+    else
+        echo "Using cached Hadoop tarball at ${HADOOP_TARBALL_PATH}"
+    fi
+    echo "Extracting Hadoop to ${HADOOP_HOME}..."
+    sudo tar -zxf "${HADOOP_TARBALL_PATH}" -C /usr/local
+    cd /usr/local
+    sudo mv "hadoop-${HADOOP_VERSION}" hadoop
+else
+    echo "Hadoop already installed at ${HADOOP_HOME}, skipping download and 
extraction."
+fi
+
+cd "${HADOOP_HOME}"
 pwd
 
-echo "export HADOOP_HOME=/usr/local/hadoop" >> ~/.bashrc
-echo "export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native" >> ~/.bashrc
-echo "export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin" >> ~/.bashrc
+# Export for GitHub Actions subsequent steps
+if [[ -n "${GITHUB_ENV:-}" ]]; then
+    echo "HADOOP_HOME=${HADOOP_HOME}" >> "${GITHUB_ENV}"
+    echo "HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native" >> 
"${GITHUB_ENV}"
+    echo "PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" >> 
"${GITHUB_ENV}"
+fi
+
+echo "export HADOOP_HOME=${HADOOP_HOME}" >> ~/.bashrc

Review Comment:
   ⚠️ The PR is making this script reusable across cached/repeated runs, but 
these unconditional `~/.bashrc` appends still make it non-idempotent: every 
invocation adds another Hadoop segment to `PATH`, so a reused runner or local 
repro environment keeps growing shell state. It would be safer to guard these 
writes (or rely on `GITHUB_ENV`/`GITHUB_PATH` in CI) so repeated runs stay 
stable.
   
   ```suggestion
   if ! grep -qxF "export HADOOP_HOME=${HADOOP_HOME}" ~/.bashrc; then
       echo "export HADOOP_HOME=${HADOOP_HOME}" >> ~/.bashrc
   fi
   if ! grep -qxF "export 
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native" ~/.bashrc; then
       echo "export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native" >> 
~/.bashrc
   fi
   if ! grep -qxF "export PATH=\$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" 
~/.bashrc; then
       echo "export PATH=\$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" >> 
~/.bashrc
   fi
   ```



##########
hugegraph-loader/assembly/travis/install-mysql.sh:
##########
@@ -23,12 +23,20 @@ TRAVIS_DIR=$(dirname $0)
 CONF=hugegraph-test/src/main/resources/hugegraph.properties
 MYSQL_USERNAME=root
 
-# Set MySQL configurations
+DB_NAME="${1:-mysql}"
+DB_PASS="${2:-root}"
+
+# Skip if MySQL is already running (e.g., provided by GitHub Actions service)
+if docker ps | grep -q "mysql:5.7"; then

Review Comment:
   ⚠️ This broad container check changes the script semantics in a risky way. 
If a developer already has any unrelated `mysql:5.7` container running, the 
script now exits without ensuring the requested test container 
name/password/port match what the loader tests expect. That can turn a clean 
setup failure into a much harder-to-diagnose auth/connection failure later. 
Could we scope the idempotency check to the requested container instead?
   
   ```suggestion
   if docker ps --format '{{.Names}}' | grep -qx "${DB_NAME}"; then\n    echo 
"MySQL container ${DB_NAME} is already running, skipping setup."\n    exit 
0\nfi\n```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to