For each file inside the directory $output, I cat the file and generate a SHA-256 hash. The script takes 9 minutes to read 105 files (556 MB in total) and produce the digests. Is there a way to make it faster? Maybe generate the digests in parallel?
count=0
for path in $output
do
    # stream the file out of HDFS and compute its SHA-256 digest
    digests[$count]=$( "$HADOOP_HOME"/bin/hdfs dfs -cat "$path" | sha256sum | awk '{ print $1 }' )
    (( count++ ))
done
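
One direction that might be worth trying (a rough, untested sketch; the -P 4 job count, the hash_one helper, and the digests.txt output file are my own assumptions, not part of the original script) is to fan the hdfs dfs -cat | sha256sum pipelines out with xargs -P:

# Rough sketch: hash several HDFS files at once with xargs -P.
# Assumes bash, GNU xargs, and that $output expands to the same list
# of HDFS paths used in the loop above.
hash_one() {
    # stream one HDFS file and print "<path> <sha256>"
    "$HADOOP_HOME"/bin/hdfs dfs -cat "$1" | sha256sum | awk -v p="$1" '{ print p, $1 }'
}
export -f hash_one
export HADOOP_HOME

printf '%s\n' $output | xargs -I {} -P 4 bash -c 'hash_one "$1"' _ {} > digests.txt

Because the parallel jobs finish in arbitrary order, printing the path next to each digest lets you rebuild the digests array from digests.txt afterwards. Whether this actually helps depends on where the 9 minutes go: if the bottleneck is HDFS/network I/O, a few concurrent readers should overlap the transfers; if it is sha256sum CPU time, the job count should not exceed the number of cores on the client machine.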
Thanks,
