davidsheard opened a new issue #1759:
URL: https://github.com/apache/hudi/issues/1759
Hi,
We can't get our Hudi table to show up in Hive on Cloudera. We dropped the
Hudi jar into the Hive Auxiliary JARs Directory and restarted Hive, but the
table still does not appear. We are hoping to demo the merits of Hudi, but
can't until we resolve the Hive issue.
Spark Config:
SparkConf conf = new SparkConf();
conf.setAppName("Hudi Test");
conf.set("spark.debug.maxToStringFields", "100");
conf.set("spark.sql.shuffle.partitions", "2001");
conf.set("spark.sql.warehouse.dir", "/user/hive/warehouse");
conf.set("spark.sql.autoBroadcastJoinThreshold","31457280");
conf.set("spark.sql.hive.filesourcePartitionFileCacheSize","2000000000");
conf.set("spark.sql.sources.partitionOverwriteMode","dynamic");
conf.set("mapreduce.input.fileinputformat.input.dir.recursive","true");
conf.set("spark.storage.replication.proactive","true");
Hudi Config:
forms.write()
    .format("org.apache.hudi")
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "id_trans")
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "id_form_str")
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "update_dttm")
    .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), "true")
    .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(),
            DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL())
    .option(DataSourceWriteOptions.OPERATION_OPT_KEY(),
            DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())
    .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(),
            "jdbc:hive2://localhost:10000")
    .option(HoodieWriteConfig.TABLE_NAME, "david.davhudi2")
    .mode(SaveMode.Append)
    .save(savePath);
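One thing that may be relevant: the write options above set a Hive JDBC URL but do not explicitly turn on Hive sync. The following is only a minimal sketch, assuming the Hudi 0.5.x string keys under `hoodie.datasource.hive_sync.*`, and assuming that the table name splits into database `david` and table `davhudi2`. The class `HiveSyncOptions` and its `build` method are hypothetical helpers for illustration, not part of Hudi.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hudi): collects Hive-sync writer options
// as plain string keys so they can be handed to a Spark DataFrameWriter.
public class HiveSyncOptions {

    public static Map<String, String> build(String database,
                                            String table,
                                            String partitionField,
                                            String jdbcUrl) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Assumption: Hive sync only runs when it is explicitly enabled.
        opts.put("hoodie.datasource.hive_sync.enable", "true");
        opts.put("hoodie.datasource.hive_sync.database", database);
        opts.put("hoodie.datasource.hive_sync.table", table);
        opts.put("hoodie.datasource.hive_sync.jdbcurl", jdbcUrl);
        opts.put("hoodie.datasource.hive_sync.partition_fields", partitionField);
        // Assumption: with hive-style partitioning (key=value paths), the
        // multi-part extractor is a reasonable choice for a single field.
        opts.put("hoodie.datasource.hive_sync.partition_extractor_class",
                 "org.apache.hudi.hive.MultiPartKeysValueExtractor");
        return opts;
    }

    public static void main(String[] args) {
        // Values taken from the write config in this report.
        Map<String, String> opts = build("david", "davhudi2", "id_form_str",
                                         "jdbc:hive2://localhost:10000");
        opts.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting map could be passed to the writer with `.options(...)` alongside the options already shown. Whether this makes the table visible in Hive on this cluster is untested.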
Environment Description
Hudi version : 0.5.3
Spark version : 2.4.0
Cloudera version : 6.3.3
Hadoop version : 3.0.0
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : No
Spark-submit:
output_name=`date +%s`
log4j_setting='-Dlog4j.configuration=file:log4j.properties'
echo "Running Spark-submit"
echo `date`
SPARK_CMD="spark2-submit \
--files log4j.properties,/etc/hive/conf.cloudera.hive/hive-site.xml \
--conf "spark.driver.extraJavaOptions=${log4j_setting}" \
--conf "spark.executor.extraJavaOptions=${log4j_setting}" \
--master yarn \
--deploy-mode client \
--num-executors 30 \
--executor-memory 16g \
--driver-memory 10g \
--queue root.adhoc.dataScientists \
--conf spark.scheduler.mode=FAIR \
--conf yarn.nodemanager.vmem-check-enabled=false \
--conf spark.executor.memoryOverhead=1072 \
--conf spark.driver.memoryOverhead=2048 \
--conf spark.executor.cores=2 \
--conf spark.kryoserializer.buffer.max=2000m \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.hive.convertMetastoreParquet=false \
--conf spark.executor.heartbeatInterval=120s \
--conf spark.network.timeout=600s \
--conf spark.sql.catalogImplementation=hive \
--class 'hudi.DataLoader' \
'hudi-poc-0.0.1-SNAPSHOT.jar' $1"
eval nohup $SPARK_CMD > logs/run_hudi_forms_bulk.log 2>&1 &