Hello Matt,

Thank you for your reply!

With "ExecuteProcess", I am able to execute a command (Hive/Beeline).
Our use case is actually relatively simple, so any other suggestions
you have would be helpful.


   1. We are writing to an HDFS directory that is the location of an *external
   Hive table*. Whenever we write a new *partition*, we need to execute a
   *repair in order to update the metadata*. Do you know of a better way to
   do this than executing the command separately through a processor? We
   are also unsure how often that command would need to run.
   2. Can't we use a generic JDBC driver to connect to Hive and execute
   commands, as we do for any other database (we did this for PostgreSQL)?
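
To make item 1 concrete, here is a minimal sketch of the statements
involved and of the beeline command line that an ExecuteProcess-style
processor could run. The table name `events`, the partition column `dt`,
and the host `hive-host` are hypothetical placeholders, and the helper
functions are illustrative only, not part of NiFi or Hive. The sketch only
builds strings; it does not contact a cluster.

```python
import shlex


def repair_statement(table):
    """HiveQL that rescans the table location and registers missing partitions."""
    return f"MSCK REPAIR TABLE {table}"


def add_partition_statement(table, partition, location=None):
    """HiveQL that registers a single, known partition (hypothetical helper)."""
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    statement = f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec})"
    if location:
        statement += f" LOCATION '{location}'"
    return statement


def beeline_argv(jdbc_url, statement, silent=True):
    """Argument list for beeline; --silent=true reduces console output."""
    argv = ["beeline", "-u", jdbc_url]
    if silent:
        argv.append("--silent=true")
    argv += ["-e", statement]
    return argv


if __name__ == "__main__":
    # Assumed JDBC URL; replace host, port, and database with real values.
    url = "jdbc:hive2://hive-host:10000/default"
    print(shlex.join(beeline_argv(url, repair_statement("events"))))
    print(shlex.join(beeline_argv(
        url, add_partition_statement("events", {"dt": "2017-03-22"}))))
```

If the partition value is already known when the data is written,
registering just that partition may be cheaper than a full repair, but
that depends on the setup and is only a suggestion.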


Thanking you in advance!

______________________

*Kind Regards,*
*Anshuman Ghosh*
*Contact - +49 179 9090964*


On Wed, Mar 22, 2017 at 6:27 PM, Matt Burgess <[email protected]> wrote:

> Anshuman,
>
> According to [1], it looks like CDH 5.10 also uses an Apache Hive
> 1.1.0 baseline, and looking through the changes [2] I didn't see
> anything related to the client_protocol field being added.  You are
> right that ExecuteProcess should also work with a beeline command; the
> major difference is that ExecuteProcess does not accept an incoming
> flow file and ExecuteStreamCommand does.  One thing I should mention:
> if your Hive query/statement is going to generate a lot of output (due
> to a long-running MapReduce job, for example), you may want to use the
> --silent command line option to suppress the output.  Otherwise the
> ExecuteProcess and/or ExecuteStreamCommand processors have been known
> to hang on large outputs.
>
> Regards,
> Matt
>
> [1] https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_510.html
> [2] http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.10.0.CHANGES.txt?_ga=1.60219309.1838615776.1489495012
>
>
> On Wed, Mar 22, 2017 at 12:42 PM, Anshuman Ghosh
> <[email protected]> wrote:
> > Hello Matt,
> >
> > Thank you very much for your reply!
> > I guess "ExecuteProcess" should also work with a beeline command?
> > However, do you know whether CDH 5.10 ships a higher Hive version?
> >
> > Thanking you in advance!
> >
> >
> > ______________________
> >
> > *Kind Regards,*
> > *Anshuman Ghosh*
> > *Contact - +49 179 9090964*
> >
> >
> > On Wed, Mar 22, 2017 at 4:43 PM, Matt Burgess <[email protected]> wrote:
> >
> >> Anshuman,
> >>
> >> The Hive processors use Apache Hive 1.2.0, which is not compatible
> >> with Hive 1.1.0 and is thus a known issue against clusters that use
> >> Hive 1.1.0 such as CDH 5.9.  Unfortunately there were API/code changes
> >> between Hive 1.1.0 and Hive 1.2.0, which means there is no simple
> >> workaround with respect to the Hive processors. The Hive NAR would
> >> have to be rebuilt (and its code changed) to use Hive 1.1.0.
> >>
> >> One possible workaround is to use ExecuteStreamCommand and the
> >> command-line hive client (hive, beeline, etc.) to execute HiveQL
> >> statements. This is not ideal but should work for getting the
> >> statements executed.
> >>
> >> Regards,
> >> Matt
> >>
> >>
> >> On Wed, Mar 22, 2017 at 11:34 AM, Anshuman Ghosh
> >> <[email protected]> wrote:
> >> > Hello everyone,
> >> >
> >> > I am trying to use this "PutHiveQL" processor.
> >> > However, I have had no luck with the connection string; it seems I am
> >> > missing something.
> >> >
> >> > I am getting an error like "Required field 'client_protocol' is unset!"
> >> > Please find attached the error message and the config property.
> >> >
> >> > BTW, I am using Hive 1.1.0, which is packaged with CDH 5.9. Can that be
> >> > the reason?
> >> > What would be the workaround?
> >> >
> >> >
> >> > Thanking you in advance!
> >> > ______________________
> >> >
> >> > Kind Regards,
> >> > Anshuman Ghosh
> >> > Contact - +49 179 9090964
> >> >
> >>
>
