[ https://issues.apache.org/jira/browse/HIVE-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239245#comment-16239245 ]

Alan Gates commented on HIVE-17983:
-----------------------------------

I have committed a patch for this to the standalone-metastore branch.  There is 
much in this patch to comment on.

On the standalone-metastore side, this patch adds several things.  
# creation of tarballs for both source and binary distributions, hopefully 
along with the necessary license information;
# install and upgrade scripts for the various RDBMS types (more on this below);
# a version of HiveSchemaTool (called MetastoreSchemaTool) for the standalone 
metastore;
# docker files for the various RDBMS types for testing (more on this below);
# shell scripts for running the schema tool and starting the metastore;
# log4j configuration for the metastore server process (this config is not 
what we want, as it writes the log file to the current working directory; it 
should be changed to match where Hive writes its log file).
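
A minimal sketch of the log-location rule the last item asks for, written in Python for brevity (the real server would express this in its log4j2 configuration, not in code). It assumes Hive's usual default of <java.io.tmpdir>/<user.name>/hive.log; the property names metastore.log.dir and metastore.log.file and the file name metastore.log are hypothetical.

```python
import getpass
import os
import tempfile
from typing import Mapping, Optional

# Hypothetical default file name for the metastore log.
DEFAULT_LOG_FILE = "metastore.log"


def resolve_log_path(
    props: Mapping[str, str],
    user: Optional[str] = None,
    tmpdir: Optional[str] = None,
) -> str:
    """Return the log file path without depending on the current
    working directory.

    An explicit "metastore.log.dir" (hypothetical property) wins; otherwise
    fall back to <tmpdir>/<user>, mirroring Hive's /tmp/<user>/hive.log
    default.
    """
    log_dir = props.get("metastore.log.dir")
    if not log_dir:
        log_dir = os.path.join(tmpdir or tempfile.gettempdir(),
                               user or getpass.getuser())
    return os.path.join(log_dir, props.get("metastore.log.file", DEFAULT_LOG_FILE))
```

With this rule, starting the server from a different directory no longer moves the log file, which is the problem with the current config.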

On the docker files, I have added ones for mysql and postgres.  I haven't done 
oracle or sqlserver yet.  The docker files are generated as part of the build, 
but the build does not call {{docker build}}.  This is partly because building 
the images is time consuming and uses several GB of disk space.  But for 
oracle there are also license issues that prevent the automatic inclusion of 
the docker images.  Right now the resulting containers just have the required 
RDBMS, Hadoop, and the metastore distribution tarball (unpacked).  My goal is 
to get to a point where separate docker files are generated both for users to 
test against the metastore and for automated testing of installation and 
upgrade on each RDBMS type.
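
To illustrate "generated as part of the build, but not built", here is a minimal sketch that only writes one Dockerfile per RDBMS and never runs docker build. The base images (mysql:5.7, postgres:9.6), the output layout, and the tarball names are hypothetical choices, not what the patch uses. It relies on Docker's ADD instruction unpacking local tar archives, which is how the tarballs end up unpacked in the container.

```python
import os
from string import Template
from typing import Dict, List

# Hypothetical base images; oracle and sqlserver are deliberately absent,
# matching the state described above (only mysql and postgres so far).
BASE_IMAGES: Dict[str, str] = {
    "mysql": "mysql:5.7",
    "postgres": "postgres:9.6",
}

# Docker's ADD instruction unpacks local tar archives, so both tarballs
# end up expanded under /opt inside the image.
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "ADD $hadoop_tarball /opt/\n"
    "ADD $metastore_tarball /opt/\n"
)


def render_dockerfile(db_type: str, hadoop_tarball: str, metastore_tarball: str) -> str:
    """Return the Dockerfile text for one RDBMS type."""
    if db_type not in BASE_IMAGES:
        raise ValueError(f"no Dockerfile template for {db_type!r}")
    return DOCKERFILE_TEMPLATE.substitute(
        base_image=BASE_IMAGES[db_type],
        hadoop_tarball=hadoop_tarball,
        metastore_tarball=metastore_tarball,
    )


def write_dockerfiles(out_dir: str, hadoop_tarball: str, metastore_tarball: str) -> List[str]:
    """Write <out_dir>/<db_type>/Dockerfile for every known RDBMS.

    Only the files are generated; running `docker build` is left to the
    developer, since building the images is slow and uses a lot of disk.
    """
    paths = []
    for db_type in sorted(BASE_IMAGES):
        db_dir = os.path.join(out_dir, db_type)
        os.makedirs(db_dir, exist_ok=True)
        path = os.path.join(db_dir, "Dockerfile")
        with open(path, "w", encoding="utf-8") as f:
            f.write(render_dockerfile(db_type, hadoop_tarball, metastore_tarball))
        paths.append(path)
    return paths
```

Keeping generation separate from building means a normal build stays fast, and an unsupported type such as oracle fails loudly instead of producing an image with license problems.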

For the installation and upgrade scripts, I have only copied the 2.3 and 3.0 
installation scripts and the 2.3->3.0 upgrade script.  My assumption is that the 
standalone metastore will be used with Hive 3.0 or later, so it doesn't make 
sense to copy all the older scripts.  I copied in the 2.3 installation scripts 
so that we could test the upgrade procedure.  
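
Because only the 2.3 and 3.0 scripts were copied, an upgrade can only start from 2.3. A minimal sketch of how a schema tool could pick the upgrade scripts, assuming an ordered list of steps written as "2.3.0-to-3.0.0" (in the style of the upgrade.order.<dbType> files HiveSchemaTool reads) and script names of the form upgrade-<step>.<dbType>.sql. This is an illustration, not the tool's actual code.

```python
from typing import List, Sequence


def upgrade_scripts(order: Sequence[str], from_version: str,
                    to_version: str, db_type: str) -> List[str]:
    """Return the upgrade scripts to run, in order, to go from
    from_version to to_version.

    `order` lists the available steps oldest first, e.g. "2.3.0-to-3.0.0".
    Raises ValueError if the steps do not connect the two versions.
    """
    if from_version == to_version:
        return []
    scripts = []
    current = from_version
    for step in order:
        src, dst = step.split("-to-")
        if src == current:
            scripts.append(f"upgrade-{step}.{db_type}.sql")
            current = dst
            if current == to_version:
                return scripts
    raise ValueError(f"no upgrade path from {from_version} to {to_version}")
```

With only the 2.3.0-to-3.0.0 step available, asking to upgrade from 2.2.0 raises ValueError, which reflects the assumption that the standalone metastore starts with Hive 3.0.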

Also, I have unrolled these scripts so that they no longer invoke other 
scripts.  For example, the 2.3->3.0 upgrade script now includes all the 
create table and alter table statements directly rather than invoking the 
various 0XX-HIVE-XXXXX.rdbms.sql scripts.  The main reason for this is that 
HiveSchemaTool went to a lot of work to do the unrolling on these scripts.  As 
part of copying HiveSchemaTool I had to convert it to use SqlLine rather 
than Beeline (since the metastore does not have access to beeline), and I did 
not want to go through the work of making the unrolling work for SqlLine.  And 
I saw no advantage to having every DB change in a separate script.  Our tools 
only support upgrades between versions.  I suspect these separate updates are a 
holdover from the days when Facebook used to run Hive from the top of trunk 
internally and thus wanted to be able to apply each change discretely.
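
To make "unrolling" concrete: in the MySQL scripts a nested script is pulled in with a line such as SOURCE file;. The sketch below inlines those lines recursively so the result is one self-contained script. It handles only the MySQL SOURCE form (other RDBMSs use different include commands, such as psql's \i), and the file names in the test are hypothetical. It illustrates the idea and is not HiveSchemaTool's implementation.

```python
import os
import re
from typing import Optional, Set

# A MySQL script includes another one with a line such as "SOURCE other.sql;".
SOURCE_RE = re.compile(r"^\s*SOURCE\s+(\S+?)\s*;\s*$", re.IGNORECASE)


def unroll(path: str, _stack: Optional[Set[str]] = None) -> str:
    """Return the text of `path` with every SOURCE line replaced by the
    recursively unrolled contents of the script it names.

    Nested paths are resolved relative to the including script's directory.
    `_stack` holds the scripts currently being expanded, to detect cycles.
    """
    stack = set() if _stack is None else _stack
    real = os.path.realpath(path)
    if real in stack:
        raise ValueError(f"circular SOURCE of {path}")
    stack.add(real)
    base = os.path.dirname(path)
    out = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = SOURCE_RE.match(line)
            if match is None:
                out.append(line)
                continue
            nested = unroll(os.path.join(base, match.group(1)), stack)
            # Keep statements on separate lines even if the nested file
            # does not end with a newline.
            if nested and not nested.endswith("\n"):
                nested += "\n"
            out.append(nested)
    stack.discard(real)
    return "".join(out)
```

Doing this once and committing the unrolled result means the schema tool never has to follow includes at run time, which is what made the switch to SqlLine simpler.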

This patch does not remove the RDBMS scripts from metastore or HiveSchemaTool 
from beeline.  There are two reasons for this.  One, the Hive information 
schema depends on HiveSchemaTool to set up a series of tables in Hive via 
beeline.  The metastore version of SchemaTool can't do this, because it doesn't 
have access to beeline.

But the second and much larger reason is that this brings up the question of how 
Hive and the standalone metastore should be installed.  Do we completely separate 
them out and require users to install the standalone metastore and then Hive?  
This is easier for devs but harder on ops and packagers.  But it also gives 
users maximum flexibility.  Or do we modify the Hive build process to pull in 
the standalone metastore packages and produce a distribution that includes the 
metastore?  This is more work for us devs.  It gives users a seamless 
experience between older and newer versions of Hive.  It also matches user 
expectations (I can't think of any database that requires you to install its 
data catalog as a separate package).  On the other hand, it ties each version 
of Hive to a specific version of the metastore, which may not be what people want.  I 
don't propose to answer these questions in this JIRA, but I wanted to bring 
them up so we can start discussing them.

Keeping copies of the installation and upgrade scripts in metastore, and of 
HiveSchemaTool in beeline, means duplicating much of the code in 
standalone-metastore, which is obviously not a viable long-term solution.  We 
will need some combination of separating things cleanly and refactoring so that 
a minimum amount of code is duplicated.  But until we answer the questions above 
we won't know which way to go, so I've left it like this for the moment.

> Make the standalone metastore generate tarballs etc.
> ----------------------------------------------------
>
>                 Key: HIVE-17983
>                 URL: https://issues.apache.org/jira/browse/HIVE-17983
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Standalone Metastore
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Major
>
> In order to be separately installable the standalone metastore needs its own 
> tarballs, startup scripts, etc.  All of the SQL installation and upgrade 
> scripts also need to move from metastore to standalone-metastore.
> I also plan to create Dockerfiles for different database types so that 
> developers can test the SQL installation and upgrade scripts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
