I'm afraid you misunderstand the purpose of hive-site.xml. It configures
access to the Hive metastore. You can read more here:
http://www.hadoopmaterial.com/2013/11/metastore.html.
So the MySQL DB in hive-site.xml would be used to store hive-specific data
such as schema info, partition info, etc.
Now, for what you want to do, you can search the user list -- I know there
have been posts about Postgres but you can do the same with MySQL. The idea
is to create an object holding a connection pool (so each of your executors
would have its own instance), or alternately, to open a connection within
mapPartitions (so you don't end up with a ton of connections). But the
write to a DB is largely a manual process -- open a connection, create a
statement, sync the data. If your data is small enough you probably could
just collect on the driver and write...though that would certainly be
slower than writing in parallel from each executor.
On Wed, May 20, 2015 at 5:48 PM, roni roni.epi...@gmail.com wrote:
Hi ,
I am trying to setup the hive metastore and mysql DB connection.
I have a spark cluster and I ran some programs and I have data stored in
some hive tables.
Now I want to store this data into Mysql so that it is available for
further processing.
I setup the hive-site.xml file.
?xml version=1.0?
?xml-stylesheet type=text/xsl href=configuration.xsl?
configuration
property
namehive.semantic.analyzer.factory.impl/name
valueorg.apache.hcatalog.cli.HCatSemanticAnalyzerFactory/value
/property
property
namehive.metastore.sasl.enabled/name
valuefalse/value
/property
property
namehive.server2.authentication/name
valueNONE/value
/property
property
namehive.server2.enable.doAs/name
valuetrue/value
/property
property
namehive.warehouse.subdir.inherit.perms/name
valuetrue/value
/property
property
namehive.metastore.schema.verification/name
valuefalse/value
/property
property
namejavax.jdo.option.ConnectionURL/name
valuejdbc:mysql://*ip address*
:3306/metastore_db?createDatabaseIfNotExist=true/value
descriptionmetadata is stored in a MySQL server/description
/property
property
namejavax.jdo.option.ConnectionDriverName/name
valuecom.mysql.jdbc.Driver/value
descriptionMySQL JDBC driver class/description
/property
property
namejavax.jdo.option.ConnectionUserName/name
valueroot/value
/property
property
namejavax.jdo.option.ConnectionPassword/name
value/value
/property
property
namehive.metastore.warehouse.dir/name
value/user/${user.name}/hive-warehouse/value
descriptionlocation of default database for
the warehouse/description
/property
/configuration
--
My mysql server is on a separate server than where my spark server is . If
I use mySQLWorkbench , I use a SSH connection with a certificate file to
connect .
How do I specify all that information from spark to the DB ?
I want to store the data generated by my spark program into mysql.
Thanks
_R