[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

Vihang Karajgaonkar (JIRA) Sat, 27 May 2017 12:15:14 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027546#comment-16027546 ]


Vihang Karajgaonkar commented on HIVE-16771:
--------------------------------------------

The test failure {{udtf_replicate_rows}} is unrelated; it passes for me when I 
run it locally. I ran it twice and it succeeded both times. I am attaching one 
more version with the changes described below. I am hoping that the next run 
will pass for that test.

The new version of the patch closes the connection object in the 
getMetastoreSchemaVersion method implementation.
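For reference, here is a minimal sketch of that resource handling, assuming a plain JDBC connection. The ConnectionSupplier type and the VERSION_QUERY text are hypothetical and are not taken from the patch; the point is only that try-with-resources closes the ResultSet, Statement and Connection even when the query fails.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Sketch of a schema-version lookup that always closes its JDBC resources.
 * ConnectionSupplier and VERSION_QUERY are illustrative assumptions,
 * not the code from the HIVE-16771 patch.
 */
public class SchemaVersionReader {

  /** Hypothetical factory for metastore connections. */
  @FunctionalInterface
  public interface ConnectionSupplier {
    Connection get() throws SQLException;
  }

  // Assumed query against the metastore VERSION table (quoting is db-specific).
  static final String VERSION_QUERY = "select SCHEMA_VERSION from VERSION";

  private final ConnectionSupplier connections;

  public SchemaVersionReader(ConnectionSupplier connections) {
    this.connections = connections;
  }

  public String getMetaStoreSchemaVersion() throws SQLException {
    // try-with-resources closes the ResultSet, Statement and Connection
    // in reverse order of creation, even if the query throws.
    try (Connection conn = connections.get();
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(VERSION_QUERY)) {
      if (!rs.next()) {
        throw new SQLException("No schema version found in metastore");
      }
      String version = rs.getString(1);
      if (rs.next()) {
        throw new SQLException("Multiple schema versions found in metastore");
      }
      return version;
    }
  }
}
```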

Hi [~ngangam], I agree that the interface method should ideally just look like 
{{getMetaStoreSchemaVersion()}}. I looked into that possibility, but achieving 
it would likely require a major refactoring. In general, I think HiveSchemaTool 
can be made a lot more generic, which would enable such a seamless 
plug-and-play design. To get there, I propose the following enhancements:

1. I think HiveSchemaTool is in the BeeLine module only because it currently 
uses BeeLine to run queries against the metastore. Ideally, it makes sense to 
move HiveSchemaTool to the metastore module, in the package 
{{org.apache.hadoop.hive.metastore.tools}}. How it runs the queries should be 
left to the implementations of the interface. If we move it to the metastore 
package, we could potentially just use JDOQL and DataNucleus to query the 
database, like MetaTool does.
2. To do the above, we need to make it generic enough that any database client 
can be plugged into it to retrieve the results. So it should interact with 
these implementations only through an interface (IMetaStoreSchemaInfo), which 
should also be in the same package as HiveSchemaTool.
3. The implementations of the interface, however, could be user-defined. For 
Hive, we already have the default implementation using BeeLine, which we could 
keep in the BeeLine module.
4. Once we do all the above, I think both the interface and the overall design 
will be much cleaner.
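To make points 2 and 3 more concrete, here is a rough sketch of how HiveSchemaTool could depend only on the interface while the implementation is picked by class name at runtime. The no-argument getMetaStoreSchemaVersion() signature, the FixedVersionSchemaInfo class, the SchemaTool stand-in and the load() helper are assumptions for illustration, not existing Hive code.

```java
import java.util.Objects;

/**
 * Sketch of the proposed split: the schema tool talks only to an interface,
 * and the concrete implementation is chosen by class name at runtime.
 * All names and signatures here are hypothetical.
 */
public class PluggableSchemaToolSketch {

  /** Assumed shape of the cleaned-up interface (no connection arguments). */
  public interface IMetaStoreSchemaInfo {
    String getMetaStoreSchemaVersion() throws Exception;
    String getHiveSchemaVersion();
  }

  /** Hypothetical stand-in for a user-defined implementation. */
  public static class FixedVersionSchemaInfo implements IMetaStoreSchemaInfo {
    public FixedVersionSchemaInfo() {}
    @Override public String getMetaStoreSchemaVersion() { return "2.3.0"; }
    @Override public String getHiveSchemaVersion() { return "2.3.0"; }
  }

  /** The tool depends only on the interface, never on BeeLine or JDBC. */
  public static class SchemaTool {
    private final IMetaStoreSchemaInfo schemaInfo;

    public SchemaTool(IMetaStoreSchemaInfo schemaInfo) {
      this.schemaInfo = Objects.requireNonNull(schemaInfo);
    }

    /** Returns true when the database schema matches the Hive version. */
    public boolean isSchemaCurrent() throws Exception {
      return schemaInfo.getHiveSchemaVersion()
          .equals(schemaInfo.getMetaStoreSchemaVersion());
    }
  }

  /** Loads an implementation by fully qualified class name (hypothetical helper). */
  public static IMetaStoreSchemaInfo load(String className)
      throws ReflectiveOperationException {
    Class<?> clazz = Class.forName(className);
    // asSubclass fails fast if the configured class does not implement the interface.
    return clazz.asSubclass(IMetaStoreSchemaInfo.class)
        .getDeclaredConstructor()
        .newInstance();
  }
}
```

In the real tool the class name would presumably come from configuration, with the BeeLine-backed implementation as the default.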

What do you think about these proposals? We can take them up in a separate JIRA 
if you think they make sense.

For now, I think the attached patch is reasonably generic, given that there are 
a lot of cross dependencies between HiveSchemaTool, BeeLine and the metastore. 
Can you please review it and let me know what you think? Thanks!

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16771
>                 URL: https://issues.apache.org/jira/browse/HIVE-16771
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> configured IMetastoreSchemaInfo implementation to get the metastore schema 
> version information from the database. It should also not assume and 
> hardcode the scripts directory itself; instead, it should ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.
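A minimal sketch of the scripts-directory point, assuming the schema-info object exposes a getMetaStoreScriptDir() accessor. The SchemaInfo interface, the initScriptFor() helper and the script file-name pattern are hypothetical; the idea is only that the tool never builds the directory path itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: the schema tool resolves scripts through the schema-info object
 * instead of hardcoding a directory. Interface and file naming are assumptions.
 */
public class ScriptDirSketch {

  /** Assumed subset of the schema-info contract. */
  public interface SchemaInfo {
    String getMetaStoreScriptDir();
  }

  /** Hypothetical helper: locate an init script for a database type and version. */
  public static Path initScriptFor(SchemaInfo info, String dbType, String version) {
    // The file-name pattern is illustrative only; the directory always comes
    // from the configured implementation.
    String fileName = "hive-schema-" + version + "." + dbType + ".sql";
    return Paths.get(info.getMetaStoreScriptDir(), fileName);
  }
}
```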



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
