[
https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luciano Resende closed SPARK-6666.
----------------------------------
Resolution: Cannot Reproduce
I have tried the scenarios above in Spark trunk using both Postgres and DB2,
see:
https://github.com/lresende/spark-sandbox/blob/master/src/main/scala/com/luck/sql/JDBCApplication.scala
And the described issues no longer seem reproducible; all results are below:
root
|-- Symbol: string (nullable = true)
|-- Name: string (nullable = true)
|-- Sector: string (nullable = true)
|-- Price: double (nullable = true)
|-- Dividend Yield: double (nullable = true)
|-- Price/Earnings: double (nullable = true)
|-- Earnings/Share: double (nullable = true)
|-- Book Value: double (nullable = true)
|-- 52 week low: double (nullable = true)
|-- 52 week high: double (nullable = true)
|-- Market Cap: double (nullable = true)
|-- EBITDA: double (nullable = true)
|-- Price/Sales: double (nullable = true)
|-- Price/Book: double (nullable = true)
|-- SEC Filings: string (nullable = true)
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
+------+
|AvgCPI|
+------+
| 15.0|
+------+
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
> org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
> ---------------------------------------------------------------------
>
> Key: SPARK-6666
> URL: https://issues.apache.org/jira/browse/SPARK-6666
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Environment:
> Reporter: John Ferguson
> Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names?
> Right now, it looks like it "sees" the names correctly in the schema it
> creates, but does not escape them in the SQL it generates when they are not
> compliant:
> org.apache.spark.sql.jdbc.JDBCRDD
> ....
>   private val columnList: String = {
>     val sb = new StringBuilder()
>     columns.foreach(x => sb.append(",").append(x))
>     if (sb.length == 0) "1" else sb.substring(1)
>   }
> If you see value in this, I would take a shot at adding the quoting
> (escaping) of column names here. If you don't, some drivers, such as
> PostgreSQL's, will simply fold all names to lower case when parsing the
> query. As the TL;DR below shows, that means they won't match the schema I
> am given.
> TL;DR:
> --------
> I am able to connect to a Postgres database in the shell (with driver
> referenced):
> val jdbcDf =
> sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> In fact when I run:
> jdbcDf.registerTempTable("sp500")
> val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
> and
> val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
> The values come back as expected. However, if I try:
> jdbcDf.show
> Or if I try
>
> val all = sqlContext.sql("SELECT * FROM sp500")
> all.show
> I get errors about column names not being found; the error even mentions the
> column names in all lower case. For now I will change my schema to be more
> restrictive. Right now it is, per a Stack Overflow poster, not ANSI
> compliant: it relies on things that double-quoted identifiers permit in
> PostgreSQL, MySQL, and SQL Server. BTW, our users are giving us tables like
> this because various tools they already use support non-compliant names. In
> fact, this is mild compared to what we've had to support.
> Currently the schema in question uses mixed case, quoted names with special
> characters and spaces:
> CREATE TABLE sp500
> (
> "Symbol" text,
> "Name" text,
> "Sector" text,
> "Price" double precision,
> "Dividend Yield" double precision,
> "Price/Earnings" double precision,
> "Earnings/Share" double precision,
> "Book Value" double precision,
> "52 week low" double precision,
> "52 week high" double precision,
> "Market Cap" double precision,
> "EBITDA" double precision,
> "Price/Sales" double precision,
> "Price/Book" double precision,
> "SEC Filings" text
> )
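
[Editor's note: the failure mode against the schema above can be illustrated with a small sketch. PostgreSQL folds unquoted identifiers to lower case, so an unquoted `SELECT Earnings/Share FROM sp500` looks for `earnings/share` and fails; the generated SQL must quote each column. The helper below is hypothetical, not part of Spark.]

```scala
// Hypothetical helper showing how a generated SELECT could quote its
// column list so mixed-case names match the schema exactly.
object QuotedSelect {
  // ANSI-quote an identifier, doubling any embedded quote characters
  // per the SQL standard.
  def quoteIdent(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  def select(table: String, columns: Seq[String]): String =
    s"SELECT ${columns.map(quoteIdent).mkString(",")} FROM $table"
}

// Example:
// QuotedSelect.select("sp500", Seq("Symbol", "Earnings/Share"))
//   returns SELECT "Symbol","Earnings/Share" FROM sp500
```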
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)