Hi,

We have recently run into this issue:
https://issues.apache.org/jira/browse/SPARK-9042

My organization's application reads raw data from files, processes and
cleanses it, and pushes the results to Hive tables. To keep reads efficient,
we have partitioned our tables. In a Sentry-enabled cluster, our writes to
Hive tables fail because HiveContext tries to edit partitions in the
metastore directly, and Sentry disables direct edits to the Hive metastore.

After discussing our options with Cloudera Support, the current workaround
for us is to generate a set of files at the end of the Spark job and open a
separate connection to HiveServer2 to load those files. We can change our
tables to external tables to reduce data movement. Regardless, it is a
stopgap measure, since we still need a separate connection to HiveServer2
to manage the partitions.
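
For illustration, here is a minimal sketch of what that separate HiveServer2
step could look like for the external-table variant. It is not our production
code: the class name, table, partition column, and path are hypothetical, and
it assumes the Hive JDBC driver is on the classpath. The values are
concatenated without escaping, so it is only suitable for trusted input.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; all table, column, and path names are illustrative.
final class PartitionRegistrar {

    // Builds the HiveQL that registers one partition of an external table
    // at the directory where the Spark job wrote its output files.
    static String addPartitionSql(String table, String column,
                                  String value, String location) {
        return "ALTER TABLE " + table
             + " ADD IF NOT EXISTS PARTITION (" + column + "='" + value + "')"
             + " LOCATION '" + location + "'";
    }

    // Sends the statement through HiveServer2 rather than editing the
    // metastore directly, so the change goes through Sentry authorization.
    static void register(String jdbcUrl, String user, String password,
                         String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

After the Spark job finishes writing a partition directory, the driver would
call something like register("jdbc:hive2://<hs2-host>:10000/default", user,
password, addPartitionSql(...)), where the host and credentials are
placeholders for the cluster's own values.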

This also affects all Hive CTAS statements and DDL supported from within
HiveContext. We'd like to know where Hive support within Spark is headed
when security products like Sentry or Ranger are in place.

Thanks,
Charmee
