Teng Qiu created SPARK-2710:
-------------------------------
Summary: Build SchemaRDD from a JdbcRDD with MetaData (no hard-coded case class)
Key: SPARK-2710
URL: https://issues.apache.org/jira/browse/SPARK-2710
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Reporter: Teng Qiu
Spark SQL can take Parquet files or JSON files as a table directly (without
being given a case class to define the schema).
As a component named SQL, it should also be able to take a ResultSet from an
RDBMS just as easily.
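For example (a minimal sketch against the Spark 1.x SQLContext API, assuming
an existing SparkContext sc; the paths are placeholders):
{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Parquet and JSON files carry their own schema, so no case class is needed.
val parquetRDD = sqlContext.parquetFile("hdfs:///tmp/parquet_table/")
val jsonRDD    = sqlContext.jsonFile("hdfs:///tmp/json_table/")
parquetRDD.registerAsTable("parquet_tbl")
{code}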
I found that there is a JdbcRDD in core:
core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
so I want to make some small changes in this file to allow SQLContext to read
the MetaData from the PreparedStatement (reading the metadata does not require
actually executing the query).
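A minimal sketch of the JDBC side of this idea (plain JDBC, outside Spark; the
connection string and query are placeholders, and note that some JDBC drivers
may return null from getMetaData() before the statement is executed):
{code}
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "password")
val stmt = conn.prepareStatement("SELECT id, name FROM users WHERE id >= ? AND id <= ?")

// The driver exposes the result schema without running the query.
val meta = stmt.getMetaData
for (i <- 1 to meta.getColumnCount) {
  println(s"${meta.getColumnName(i)}: ${meta.getColumnTypeName(i)}")
}
conn.close()
{code}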
There is also a small bug in JdbcRDD:
in compute(), the close() method has
{code}
if (null != conn && ! stmt.isClosed()) conn.close()
{code}
should be
{code}
if (null != conn && ! conn.isClosed()) conn.close()
{code}
just a small typo :)
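For context, this is roughly how the close() helper inside compute() looks
with the fix applied (a sketch, slightly simplified from JdbcRDD.scala; rs,
stmt, conn and logWarning are all in scope inside compute()):
{code}
def close() {
  try {
    if (null != rs && ! rs.isClosed()) rs.close()
  } catch {
    case e: Exception => logWarning("Exception closing resultset", e)
  }
  try {
    if (null != stmt && ! stmt.isClosed()) stmt.close()
  } catch {
    case e: Exception => logWarning("Exception closing statement", e)
  }
  try {
    // fixed: check the connection, not the statement, before closing it
    if (null != conn && ! conn.isClosed()) conn.close()
  } catch {
    case e: Exception => logWarning("Exception closing connection", e)
  }
}
{code}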
Then, in Spark SQL, SQLContext can create a SchemaRDD from a JdbcRDD and its
MetaData.
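A rough sketch of how this could look, assuming Spark SQL's programmatic
schema API (applySchema, StructType); schemaFor is a hypothetical helper, and
the type mapping only covers a few JDBC types for illustration:
{code}
import java.sql.{ResultSetMetaData, Types}
import org.apache.spark.sql._

// Translate a ResultSetMetaData into a Spark SQL StructType.
def schemaFor(meta: ResultSetMetaData): StructType =
  StructType((1 to meta.getColumnCount).map { i =>
    val dataType = meta.getColumnType(i) match {
      case Types.INTEGER => IntegerType
      case Types.DOUBLE  => DoubleType
      case _             => StringType   // fallback for unhandled types
    }
    StructField(meta.getColumnName(i), dataType, nullable = true)
  })

// rowRDD would be an RDD[Row] produced by JdbcRDD's mapRow function, e.g.
//   (rs: java.sql.ResultSet) => Row(rs.getInt(1), rs.getString(2))
// val schemaRDD = sqlContext.applySchema(rowRDD, schemaFor(meta))
{code}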
Further on, maybe we can add a feature to the sql-shell, so that users can use
the spark-thrift-server to join tables from different sources,
such as:
{code}
CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password"
"initQuery" "bound" ...
CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
SELECT parquet_files.colX, jdbc_tbl1.colY
FROM parquet_files
JOIN jdbc_tbl1
ON (parquet_files.id = jdbc_tbl1.id)
{code}
I think such a feature would be useful, similar to what Facebook's Presto engine does.