I’ve written an additional parquet-tools helper script that runs parquet-tools 
through "hadoop jar", so it can access Parquet files in the Hadoop environment. 
It follows the pattern of the existing parquet-schema (and similar) scripts 
provided as part of the parquet-tools distribution, and is designed to live in 
the same directory as those scripts.

Hopefully others will also find it useful.

Cheers — Chris

—— cut ——

#!/usr/bin/env bash
#
# Author:
#     Chris Mathews
#     CTO and Co-Founder, SysMech
#     www.sysmech.co.uk
#

# Determine the path to this script's directory
APPPATH=$( cd "$(dirname "$0")" && pwd -P )

# NOTE: prerequisites
#  1.   HADOOP_HOME must be defined
#  2.   create a link to the current parquet-tools-<version>.jar library
#       e.g.: ln -s parquet-tools-1.8.1.jar parquet-tools.jar
#
PARQUET_TOOLS="${APPPATH}/lib/parquet-tools.jar"

if [ -z "${HADOOP_HOME}" ]
then
    echo "" >&2
    echo "error: HADOOP_HOME not defined!" >&2
    echo "" >&2
    exit 1

elif [ ! -f "${PARQUET_TOOLS}" ]
then
    echo "" >&2
    echo "error: file ${PARQUET_TOOLS} not found!" >&2
    echo "" >&2
    echo "info: create a link to the current parquet-tools library." >&2
    echo "info: e.g.: ln -s ${APPPATH}/lib/parquet-tools-1.8.1.jar ${APPPATH}/lib/parquet-tools.jar" >&2
    echo "" >&2
    exit 1

else
    # Run parquet-tools through Hadoop, passing all arguments through unchanged
    "${HADOOP_HOME}/bin/hadoop" jar "${PARQUET_TOOLS}" "$@"
fi

—— cut ——
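The link step from the prerequisites can also be scripted when a new parquet-tools jar is dropped into lib/. The function below is a hypothetical helper (link_parquet_tools is my own name, not part of parquet-tools), offered as a minimal sketch. It assumes the jars follow the parquet-tools-<version>.jar naming and that sort supports -V (version ordering, as in GNU coreutils). It points lib/parquet-tools.jar at the highest-versioned jar it finds:

```shell
# Hypothetical helper (not part of parquet-tools): point
# <libdir>/parquet-tools.jar at the newest parquet-tools-<version>.jar.
# Assumes a sort(1) that supports -V (version ordering), e.g. GNU coreutils.
link_parquet_tools() {
    local libdir="$1"
    local newest

    # List the versioned jars by file name; the -e test skips the literal
    # pattern that an unmatched glob leaves behind.
    newest=$(
        for jar in "${libdir}"/parquet-tools-*.jar; do
            [ -e "${jar}" ] && basename "${jar}"
        done | sort -V | tail -n 1
    )

    if [ -z "${newest}" ]; then
        echo "error: no parquet-tools-<version>.jar found in ${libdir}" >&2
        return 1
    fi

    # Relative link target, matching the prerequisite example:
    #   ln -s parquet-tools-1.8.1.jar parquet-tools.jar
    # -f replaces an existing link; -n avoids descending into a linked directory.
    ln -sfn "${newest}" "${libdir}/parquet-tools.jar"
    echo "${newest}"
}
```

For example, link_parquet_tools "${APPPATH}/lib" would relink lib/parquet-tools.jar. Version ordering matters here: a plain lexical sort would rank 1.8.1 above 1.10.0.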

