[
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltan Haindrich resolved HIVE-24230.
-------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Merged into master. Thank you [~amagyar]!
> Integrate HPL/SQL into HiveServer2
> ----------------------------------
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
> Issue Type: Sub-task
> Components: HiveServer2, hpl/sql
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> HPL/SQL is a standalone command line program that can store and load scripts
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up
> some possibilities which are currently not feasible to implement. For example,
> one might want to use a third-party SQL tool to run selects on stored
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used
> with HPL/SQL since it has its own, separate CLI.
>
> To make it easier to implement, we keep things separate internally at
> first, by introducing a Hive session-level JDBC parameter.
> {code:java}
> jdbc:hive2://localhost:10000/default;hplsqlMode=true {code}
>
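A minimal sketch of how a client might open such a session from Java. The `HplSqlUrl` class and its method names are hypothetical; only the URL format and the `hplsqlMode=true` parameter come from the description above. `connect` assumes the Hive JDBC driver is on the classpath and a HiveServer2 instance is reachable.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper, not part of the patch. */
final class HplSqlUrl {
    private HplSqlUrl() {}

    /** Builds a HiveServer2 JDBC URL with the session-level hplsqlMode flag set. */
    static String build(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";hplsqlMode=true";
    }

    /** Opens a session in procedural SQL mode; requires a running HiveServer2. */
    static Connection connect(String host, int port, String database) throws SQLException {
        return DriverManager.getConnection(build(host, port, database));
    }
}
```

Since the flag travels in the connection URL, tools that accept a plain JDBC URL should be able to opt in without other changes.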
> The hplsqlMode parameter indicates that we are in procedural SQL mode, where
> the user can create and call stored procedures. HPLSQL allows you to write any
> kind of procedural statement at the top level. This patch doesn't limit this,
> but it might be better to eventually restrict which statements are allowed
> outside of stored procedures.
>
> Since HPLSQL and Hive are running in the same process, there is no need to use
> the JDBC driver between them. The patch adds an abstraction with two different
> implementations: one for executing queries over JDBC (to keep the existing
> behaviour) and another for directly calling Hive's compiler. In HPLSQL
> mode the latter is used.
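The two-implementation abstraction could look roughly like the sketch below. All type names here (`QueryExecutor`, `JdbcQueryExecutor`, `EmbeddedQueryExecutor`, `QueryExecutors`) are hypothetical and are not taken from the patch. The `compiler` function is only a stand-in for Hive's internal compile-and-execute entry point, whose real API is not shown here.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical abstraction: how the interpreter hands a SQL statement to Hive. */
interface QueryExecutor {
    List<List<Object>> execute(String sql);
}

/** Keeps the standalone behaviour: statements travel over JDBC. */
final class JdbcQueryExecutor implements QueryExecutor {
    private final Connection connection;

    JdbcQueryExecutor(Connection connection) {
        this.connection = connection;
    }

    @Override
    public List<List<Object>> execute(String sql) {
        List<List<Object>> rows = new ArrayList<>();
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                List<Object> row = new ArrayList<>(columns);
                for (int i = 1; i <= columns; i++) {
                    row.add(rs.getObject(i));
                }
                rows.add(row);
            }
        } catch (SQLException e) {
            throw new IllegalStateException("query failed: " + sql, e);
        }
        return rows;
    }
}

/** In-process path: 'compiler' is a placeholder for Hive's internal entry point. */
final class EmbeddedQueryExecutor implements QueryExecutor {
    private final Function<String, List<List<Object>>> compiler;

    EmbeddedQueryExecutor(Function<String, List<List<Object>>> compiler) {
        this.compiler = compiler;
    }

    @Override
    public List<List<Object>> execute(String sql) {
        return compiler.apply(sql);
    }
}

final class QueryExecutors {
    private QueryExecutors() {}

    /** Picks the in-process executor in HPLSQL mode, the JDBC one otherwise. */
    static QueryExecutor forMode(boolean hplsqlMode, Connection jdbc,
                                 Function<String, List<List<Object>>> compiler) {
        return hplsqlMode ? new EmbeddedQueryExecutor(compiler) : new JdbcQueryExecutor(jdbc);
    }
}
```

Keeping the interpreter dependent only on the interface would let the standalone CLI and the embedded mode share the same interpreter code.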
> Internally, a new operation (HplSqlOperation) and operation type
> (PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation,
> but it uses the hplsql interpreter to execute arbitrary scripts. This operation
> might spawn new SQLOperations.
> For example, consider the following statement:
> {code:java}
> FOR i IN 1..10 LOOP
>   SELECT * FROM table;
> END LOOP;{code}
> We send this to Beeline while we're in hplsql mode. Hive will create an hplsql
> interpreter and store it in the session state. A new HplSqlOperation is
> created to run the script on the interpreter.
> HPLSQL knows how to execute the for loop, but it'll call Hive to run the
> select expression. The HplSqlOperation is notified when the select reads a
> row and accumulates the rows into a RowSet (memory consumption needs to be
> considered here) which can be retrieved via thrift from the client side.
>
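The row accumulation described above can be sketched as follows. `BufferedRows` and `ForLoopSketch` are hypothetical names, the `hive` function is a stand-in for the call into Hive, and the row cap is only one assumed way of addressing the memory concern; the patch itself may handle it differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in for the RowSet: buffers rows until the client fetches them. */
final class BufferedRows implements Consumer<List<Object>> {
    private final int maxRows;
    private final List<List<Object>> rows = new ArrayList<>();

    BufferedRows(int maxRows) {
        this.maxRows = maxRows;
    }

    /** Called once per row read by a select; fails fast instead of growing unbounded. */
    @Override
    public void accept(List<Object> row) {
        if (rows.size() >= maxRows) {
            throw new IllegalStateException("row buffer limit of " + maxRows + " exceeded");
        }
        rows.add(row);
    }

    List<List<Object>> rows() {
        return Collections.unmodifiableList(rows);
    }
}

final class ForLoopSketch {
    private ForLoopSketch() {}

    /** Mimics FOR i IN from..to LOOP <select> END LOOP; the loop runs in the interpreter. */
    static void run(int from, int to, String select,
                    Function<String, List<List<Object>>> hive,
                    Consumer<List<Object>> onRow) {
        for (int i = from; i <= to; i++) {
            // Each iteration delegates the declarative part to Hive.
            for (List<Object> row : hive.apply(select)) {
                onRow.accept(row);
            }
        }
    }
}
```

With ten iterations over a select that returns two rows, the buffer ends up holding twenty rows, which shows how quickly a loop can multiply the result size held in memory.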