Storing data in MySQL from spark hive tables

2015-05-20 Thread roni
Hi ,
I am trying to setup the hive metastore and mysql DB connection.
 I have a spark cluster and I ran some programs and I have data stored in
some hive tables.
Now I want to store this data into Mysql  so that it is available for
further processing.

I setup the hive-site.xml file.

?xml version=1.0?

?xml-stylesheet type=text/xsl href=configuration.xsl?


configuration

  property

namehive.semantic.analyzer.factory.impl/name

valueorg.apache.hcatalog.cli.HCatSemanticAnalyzerFactory/value

  /property


  property

namehive.metastore.sasl.enabled/name

valuefalse/value

  /property


  property

namehive.server2.authentication/name

valueNONE/value

  /property


  property

namehive.server2.enable.doAs/name

valuetrue/value

  /property


  property

namehive.warehouse.subdir.inherit.perms/name

valuetrue/value

  /property


  property

namehive.metastore.schema.verification/name

valuefalse/value

  /property


  property

namejavax.jdo.option.ConnectionURL/name

valuejdbc:mysql://*ip address*
:3306/metastore_db?createDatabaseIfNotExist=true/value

descriptionmetadata is stored in a MySQL server/description

  /property


  property

namejavax.jdo.option.ConnectionDriverName/name

valuecom.mysql.jdbc.Driver/value

descriptionMySQL JDBC driver class/description

  /property


  property

namejavax.jdo.option.ConnectionUserName/name

valueroot/value

  /property


  property

namejavax.jdo.option.ConnectionPassword/name

value/value

  /property

  property

namehive.metastore.warehouse.dir/name

value/user/${user.name}/hive-warehouse/value

descriptionlocation of default database for
the warehouse/description

/property


/configuration
 --
My mysql server is on a separate server than where my spark server is . If
I use mySQLWorkbench , I use a SSH connection  with a certificate file to
connect .
How do I specify all that information from spark to the DB ?
I want to store the data generated by my spark program into mysql.
Thanks
_R


Re: Storing data in MySQL from spark hive tables

2015-05-20 Thread Yana Kadiyska
I'm afraid you misunderstand the purpose of hive-site.xml. It configures
access to the Hive metastore. You can read more here:
http://www.hadoopmaterial.com/2013/11/metastore.html.

So the MySQL DB in hive-site.xml would be used to store hive-specific data
such as schema info, partition info, etc.

Now, for what you want to do, you can search the user list -- I know there
have been posts about Postgres but you can do the same with MySQL. The idea
is to create an object holding a connection pool (so each of your executors
would have its own instance), or alternately, to open a connection within
mapPartitions (so you don't end up with a ton of connections). But the
write to a DB is largely a manual process -- open a connection, create a
statement, sync the data. If your data is small enough you probably could
just collect on the driver and write...though that would certainly be
slower than writing in parallel from each executor.

On Wed, May 20, 2015 at 5:48 PM, roni roni.epi...@gmail.com wrote:

 Hi ,
 I am trying to setup the hive metastore and mysql DB connection.
  I have a spark cluster and I ran some programs and I have data stored in
 some hive tables.
 Now I want to store this data into Mysql  so that it is available for
 further processing.

 I setup the hive-site.xml file.

 ?xml version=1.0?

 ?xml-stylesheet type=text/xsl href=configuration.xsl?


 configuration

   property

 namehive.semantic.analyzer.factory.impl/name

 valueorg.apache.hcatalog.cli.HCatSemanticAnalyzerFactory/value

   /property


   property

 namehive.metastore.sasl.enabled/name

 valuefalse/value

   /property


   property

 namehive.server2.authentication/name

 valueNONE/value

   /property


   property

 namehive.server2.enable.doAs/name

 valuetrue/value

   /property


   property

 namehive.warehouse.subdir.inherit.perms/name

 valuetrue/value

   /property


   property

 namehive.metastore.schema.verification/name

 valuefalse/value

   /property


   property

 namejavax.jdo.option.ConnectionURL/name

 valuejdbc:mysql://*ip address*
 :3306/metastore_db?createDatabaseIfNotExist=true/value

 descriptionmetadata is stored in a MySQL server/description

   /property


   property

 namejavax.jdo.option.ConnectionDriverName/name

 valuecom.mysql.jdbc.Driver/value

 descriptionMySQL JDBC driver class/description

   /property


   property

 namejavax.jdo.option.ConnectionUserName/name

 valueroot/value

   /property


   property

 namejavax.jdo.option.ConnectionPassword/name

 value/value

   /property

   property

 namehive.metastore.warehouse.dir/name

 value/user/${user.name}/hive-warehouse/value

 descriptionlocation of default database for
 the warehouse/description

 /property


 /configuration
  --
 My mysql server is on a separate server than where my spark server is . If
 I use mySQLWorkbench , I use a SSH connection  with a certificate file to
 connect .
 How do I specify all that information from spark to the DB ?
 I want to store the data generated by my spark program into mysql.
 Thanks
 _R