[jira] [Comment Edited] (S2GRAPH-75) Use an embedded database as the default metadata storage

Jong Wook Kim (JIRA) Tue, 07 Jun 2016 01:13:05 -0700

    [ 
https://issues.apache.org/jira/browse/S2GRAPH-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317429#comment-15317429
 ]


Jong Wook Kim edited comment on S2GRAPH-75 at 6/7/16 8:12 AM:
--------------------------------------------------------------

I concluded that H2 is the best choice for the default metastore in the 
distribution package, because:

# H2 provides a MySQL compatibility mode, with which we can continue using the 
MySQL-flavored SQL without significant modification
# The SQLite JDBC driver is not actually small (>5MB), because it has to contain 
native libraries for all supported platforms, whereas H2 is a pure Java library 
(~1MB)
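
For illustration, H2's MySQL compatibility mode is selected with the {{MODE=MySQL}} setting in the JDBC URL. The following is only a minimal sketch, not code from the commit; the class name, the method name and the database path are hypothetical.

```java
public class H2UrlSketch {
    // Builds a JDBC URL for an embedded, file-based H2 database that runs in
    // MySQL compatibility mode. The path passed in is a hypothetical location.
    static String h2MySqlModeUrl(String dbPath) {
        return "jdbc:h2:file:" + dbPath + ";MODE=MySQL";
    }

    public static void main(String[] args) {
        // Example path only; the real package may keep its metastore elsewhere.
        System.out.println(h2MySqlModeUrl("./var/metastore"));
    }
}
```

Such a URL would be passed to {{java.sql.DriverManager.getConnection}} with the H2 jar on the classpath.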

[In this 
commit|https://github.com/jongwook/incubator-s2graph/commit/56e684784d520245d0cc31758bcab816cf318936]
 I have added a very simple migration tool that creates the tables if the 
given database contains no tables at all. To do this, I did the following:

# split {{schema.sql}} into {{setup.sql}}, which contains the database and user 
setup for MySQL only, and {{schema.sql}}, which contains the actual table 
schema and can be used with both MySQL and H2.
# modified {{schema.sql}} slightly so that H2 understands the queries. 
Specifically,
#* made index and key names unique by prefixing them with the corresponding 
table's name.
#* replaced the 75-byte sub_part index for {{services.cluster}} with the 
default index, as H2 doesn't support sub_part indexes. -- why were we doing 
this sub_part indexing?
# moved the files from {{s2core/migrate/mysql}} to 
{{s2core/src/main/resources/org/apache/s2graph/core/mysqls/}}, so that 
{{schema.sql}} can be loaded at runtime using 
{{java.lang.Class.getResourceAsStream}}.
# added {{org.apache.s2graph.core.mysqls.Model.checkSchema}}, which is called 
during initialization; this method creates the tables if they do not already 
exist.
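
The create-if-empty check described above can be sketched roughly as follows. This is not the actual {{Model.checkSchema}} from the commit; it is a plain-JDBC illustration in Java, and the class name, the helper names and the naive split on semicolons are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SchemaCheckSketch {
    // Splits a SQL script into statements on ';'. Naive by design: it assumes
    // the script has no semicolons inside string literals or comments.
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String sql = part.trim();
            if (!sql.isEmpty()) {
                statements.add(sql);
            }
        }
        return statements;
    }

    // Reads a classpath resource (for example a bundled schema.sql) as UTF-8 text.
    static String readResource(String resourcePath) throws IOException {
        try (InputStream in = SchemaCheckSketch.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourcePath);
            }
            try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                return scanner.hasNext() ? scanner.next() : "";
            }
        }
    }

    // Returns true if the database already has at least one user table.
    // The table type name differs between drivers and versions, so both
    // common spellings are requested here (an assumption for this sketch).
    static boolean hasAnyTable(Connection conn) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet tables =
                 meta.getTables(null, null, "%", new String[] {"TABLE", "BASE TABLE"})) {
            return tables.next();
        }
    }

    // Creates the schema from the given resource only when no table exists yet.
    static void checkSchema(Connection conn, String resourcePath)
            throws SQLException, IOException {
        if (hasAnyTable(conn)) {
            return;
        }
        try (Statement stmt = conn.createStatement()) {
            for (String sql : splitStatements(readResource(resourcePath))) {
                stmt.execute(sql);
            }
        }
    }
}
```

Checking for "no table at all" keeps the tool simple, but it does not handle upgrading an existing schema.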

The resulting tar.gz package is 88MB; you can just download it, extract it, and 
run {{bin/start-s2graph.sh}} to get the whole thing running. I have tested that 
the s2graph server works without any prior setup of HBase or MySQL, on OS X 
10.11.4 and Linux Mint 17.2, which is basically Ubuntu 14.04.3 LTS. 

This should conclude all three subtasks of S2GRAPH-70. While logging and 
configuration can be improved, that can be discussed in a separate issue. I'll 
go ahead and create a pull request under S2GRAPH-70; please start the review 
there.


> Use an embedded database as the default metadata storage
> --------------------------------------------------------
>
>                 Key: S2GRAPH-75
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-75
>             Project: S2Graph
>          Issue Type: Sub-task
>            Reporter: Jong Wook Kim
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current choice of the metadata storage - MySQL - served well for the 
> production usage, but running the DBMS, creating database and tables, 
> configuring the JDBC, etc. has not been transparent to the users, and most 
> importantly, is not documented anywhere.
> In order for the users to be able to just download and run s2graph, it is 
> desirable to use the metadata storage without setting up a separate MySQL 
> server, at least for the first run. We should then recommend MySQL or similar 
> for production usage.
> Derby, H2 and HSQL are popular choices for an embedded database in JVM, and 
> we should figure out the most appropriate choice for us.
> As a side note, currently the only way to get the schema is to hack on the 
> Vagrant image. This should also be made transparent and manageable somehow. 
> For example, [Hive metastore's schema is being managed using a schema and 
> upgrade 
> SQLs|https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/mysql]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
