[
https://issues.apache.org/jira/browse/HIVE-17983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239245#comment-16239245
]
Alan Gates commented on HIVE-17983:
-----------------------------------
I have committed a patch for this to the standalone-metastore branch. There is
much in this patch to comment on.
On the standalone-metastore side, this patch adds several things.
# creation of tarballs for both source and binary distributions, hopefully
along with the necessary license information;
# install and upgrade scripts for the various RDBMS types (more on this below);
# a version of HiveSchemaTool (called MetastoreSchemaTool) for the standalone
metastore;
# docker files for the various RDBMS types for testing (more on this below);
# shell scripts for running the schema tool and starting the metastore;
# log4j configuration for the metastore server process (this config is not what
we want, as it dumps the logfile in the current working directory; it should
be changed to match Hive's behavior for where the file is written).
On the docker files, I have added ones for mysql and postgres. I haven't done
oracle or sqlserver yet. The docker files are generated as part of the build,
but the build does not call {{docker build}}. This is partly because building
the images is time consuming and eats up several GB of disk space. But for
oracle there are also license issues that prevent the automatic inclusion of
the docker images. Right now the resulting containers just have the required
RDBMS, Hadoop, and the metastore distribution tarball (unpacked). My goal is
to get to a point where different docker files are created for users to test
against the metastore, and for automated testing of installation and upgrade of
each RDBMS type.
On the installation and upgrade scripts I have only copied the 2.3 and 3.0
installation scripts and the 2.3->3.0 upgrade script. My assumption is that the
standalone metastore will be used with Hive 3.0 or later, so it doesn't make
sense to copy all the older scripts. I copied in the 2.3 installation scripts
so that we could test the upgrade procedure.
Also on these scripts I have unrolled them so that scripts no longer invoke
other scripts. For example, the 2.3->3.0 upgrade script now includes all the
create table and alter table statements itself rather than calling run on the
various 0XX-HIVE-XXXXX.rdbms.sql scripts. The main reason for this is that
HiveSchemaTool went to a lot of work to do the unrolling on these scripts. As
part of copying HiveSchemaTool I had to convert it to use SqlLine rather
than Beeline (since the metastore does not have access to beeline) and I did
not want to go through the work of making the unrolling work for SqlLine. And
I saw no advantage to having every DB change in a separate script. Our tools
only support upgrade between versions. I suspect these separate updates are a
holdover from the days when Facebook used to run Hive top of trunk internally
and thus wanted to be able to apply each change discretely.
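To illustrate what "unrolling" means here, the following is a minimal sketch, not the code HiveSchemaTool or the patch actually uses. It assumes nested scripts are pulled in with a Derby-style {{RUN 'file';}} line; other RDBMS dialects use different directives and would need their own pattern. The function name {{unroll}} and the {{INCLUDE}} pattern are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a nested script is referenced by a line of the form
#   RUN '0XX-HIVE-XXXXX.rdbms.sql';
# This pattern is illustrative only and covers just that one dialect.
INCLUDE = re.compile(r"^\s*RUN\s+'([^']+)'\s*;\s*$", re.IGNORECASE)


def unroll(script: Path, _seen: frozenset = frozenset()) -> str:
    """Return the text of `script` with each include line replaced by
    the (recursively unrolled) contents of the file it names."""
    script = script.resolve()
    if script in _seen:
        # Guard against a script that directly or indirectly includes itself.
        raise ValueError(f"circular include: {script}")
    out = []
    for line in script.read_text().splitlines():
        match = INCLUDE.match(line)
        if match:
            # Included paths are resolved relative to the including script.
            child = script.parent / match.group(1)
            out.append(unroll(child, _seen | {script}))
        else:
            out.append(line)
    return "\n".join(out)
```

Running something like this once, at the time the scripts are copied, yields a single flat upgrade script per RDBMS, so the schema tool no longer has to resolve nested scripts at run time.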
This patch does not remove the RDBMS scripts from metastore or HiveSchemaTool
from beeline. There are two reasons for this. One, the Hive information
schema depends on HiveSchemaTool to set up a series of tables in Hive via
beeline. The metastore version of SchemaTool can't do this, because it doesn't
have access to beeline.
But the second and much larger reason is that this brings up the question of how Hive
and the standalone metastore should be installed. Do we completely separate
them out and require users to install the standalone metastore and then Hive?
This is easier for devs but harder on ops and packagers. But it also gives
users maximum flexibility. Or do we modify the Hive build process to pull in
the standalone metastore packages and produce a distribution that includes the
metastore? This is more work for us devs. It gives users a seamless
experience between older and newer versions of Hive. It also matches user
expectations (I can't think of any database that requires you to install its
data catalog as a separate package). On the other hand it locksteps a version
of Hive with a version of the metastore, which may not be what people want. I
don't propose to answer these questions in this JIRA, but I wanted to bring
them up so we can start discussing them.
Leaving copies of the installation and upgrade scripts in metastore and
HiveSchemaTool in beeline that duplicate much code that's also in
standalone-metastore is obviously not a viable long term solution. We will
need some combination of separating things cleanly and refactoring so that a
minimum amount of code is duplicated. But until we answer the questions above
we won't know which way to go, so I've left it like this for the moment.
> Make the standalone metastore generate tarballs etc.
> ----------------------------------------------------
>
> Key: HIVE-17983
> URL: https://issues.apache.org/jira/browse/HIVE-17983
> Project: Hive
> Issue Type: Sub-task
> Components: Standalone Metastore
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Major
>
> In order to be separately installable the standalone metastore needs its own
> tarballs, startup scripts, etc. All of the SQL installation and upgrade
> scripts also need to move from metastore to standalone-metastore.
> I also plan to create Dockerfiles for different database types so that
> developers can test the SQL installation and upgrade scripts.