Teng Qiu created SPARK-2710:
-------------------------------

             Summary: Build SchemaRDD from a JdbcRDD with MetaData (no
hard-coded case class)
                 Key: SPARK-2710
                 URL: https://issues.apache.org/jira/browse/SPARK-2710
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
            Reporter: Teng Qiu


Spark SQL can take Parquet files or JSON files as a table directly (without
being given a case class to define the schema).

As a component named SQL, it should also be able to take a ResultSet from an
RDBMS easily.
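
For reference, this is roughly how the existing file-based paths work today
(a minimal sketch; the HDFS paths are placeholders, and registerTempTable was
called registerAsTable in releases before 1.1):
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Schema is inferred from the files themselves, no case class needed.
val parquetTable = sqlContext.parquetFile("hdfs://tmp/parquet_table/")
val jsonTable = sqlContext.jsonFile("hdfs://tmp/json_table/")
parquetTable.registerTempTable("parquet_files")
{code}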

I found that there is a JdbcRDD in core:
core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala

so I want to make a small change in this file to allow SQLContext to read
the MetaData from the PreparedStatement (reading the metadata does not
require actually executing the query).
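
A minimal sketch of that idea, relying on standard JDBC behavior
(PreparedStatement.getMetaData describes the result columns before execution,
driver support permitting). The type mapping is illustrative, not complete,
and schemaFor is a hypothetical helper, not part of the actual patch:
{code}
import java.sql.{DriverManager, ResultSetMetaData, Types}
import org.apache.spark.sql._  // StructType etc.; org.apache.spark.sql.types._ in later versions

// Derive a Spark SQL schema from JDBC metadata without running the query.
def schemaFor(url: String, sql: String): StructType = {
  val conn = DriverManager.getConnection(url)
  try {
    val md = conn.prepareStatement(sql).getMetaData
    StructType((1 to md.getColumnCount).map { i =>
      val dataType = md.getColumnType(i) match {
        case Types.INTEGER              => IntegerType
        case Types.BIGINT               => LongType
        case Types.FLOAT | Types.DOUBLE => DoubleType
        case _                          => StringType  // fallback for this sketch
      }
      StructField(md.getColumnName(i), dataType,
        md.isNullable(i) != ResultSetMetaData.columnNoNulls)
    })
  } finally {
    conn.close()
  }
}
{code}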

There is also a small bug in JdbcRDD.

In compute(), the close() method has
{code}
if (null != conn && ! stmt.isClosed()) conn.close()
{code}
should be
{code}
if (null != conn && ! conn.isClosed()) conn.close()
{code}

Just a small typo :)

Then, in Spark SQL, SQLContext can create a SchemaRDD from a JdbcRDD and its
MetaData.
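
A rough sketch of how that could look from the user side, reusing the
hypothetical schemaFor helper above and assuming the SQLContext.applySchema
API added in Spark 1.1 (later versions call this createDataFrame); the
connection string, query, and bounds are placeholders, and sc is the active
SparkContext:
{code}
import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.sql.{Row, SQLContext}

val url = "jdbc:postgresql://host/db"  // assumed connection string
// JdbcRDD requires two '?' placeholders for the partition bounds.
val sql = "SELECT id, name FROM people WHERE id >= ? AND id <= ?"

val rowRdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(url),
  sql,
  lowerBound = 1, upperBound = 100000, numPartitions = 4,
  mapRow = rs => Row.fromSeq(
    (1 to rs.getMetaData.getColumnCount).map(i => rs.getObject(i))))

val sqlContext = new SQLContext(sc)
// Pair the rows with the schema derived from the PreparedStatement metadata.
val schemaRdd = sqlContext.applySchema(rowRdd, schemaFor(url, sql))
schemaRdd.registerTempTable("jdbc_tbl1")
{code}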

In the future, maybe we can add a feature to the sql-shell, so that users can
use the spark-thrift-server to join tables from different sources,

such as:
{code}
CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" 
"initQuery" "bound" ...
CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
SELECT parquet_files.colX, jdbc_tbl1.colY
  FROM parquet_files
  JOIN jdbc_tbl1
    ON (parquet_files.id = jdbc_tbl1.id)
{code}

I think such a feature would be useful, similar to what Facebook's Presto
engine does.


