[
https://issues.apache.org/jira/browse/SPARK-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luciano Resende closed SPARK-6666.
----------------------------------
Resolution: Cannot Reproduce
I have tried the scenarios above in Spark trunk using both Postgres and DB2,
see:
https://github.com/lresende/spark-sandbox/blob/master/src/main/scala/com/luck/sql/JDBCApplication.scala
And the described issues no longer seem reproducible; all results are below:
root
|-- Symbol: string (nullable = true)
|-- Name: string (nullable = true)
|-- Sector: string (nullable = true)
|-- Price: double (nullable = true)
|-- Dividend Yield: double (nullable = true)
|-- Price/Earnings: double (nullable = true)
|-- Earnings/Share: double (nullable = true)
|-- Book Value: double (nullable = true)
|-- 52 week low: double (nullable = true)
|-- 52 week high: double (nullable = true)
|-- Market Cap: double (nullable = true)
|-- EBITDA: double (nullable = true)
|-- Price/Sales: double (nullable = true)
|-- Price/Book: double (nullable = true)
|-- SEC Filings: string (nullable = true)
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
+------+
|AvgCPI|
+------+
| 15.0|
+------+
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|Symbol|  Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
|    S1|Name 1| Sec 1| 10.0|          10.0|          10.0|          10.0|      10.0|       10.0|        10.0|      10.0|  10.0|       10.0|      10.0|        100|
|    s2|Name 2| Sec 2| 20.0|          20.0|          20.0|          20.0|      20.0|       20.0|        20.0|      20.0|  20.0|       20.0|      20.0|        200|
+------+------+------+-----+--------------+--------------+--------------+----------+-----------+------------+----------+------+-----------+----------+-----------+
> org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
> ---------------------------------------------------------------------
>
> Key: SPARK-6666
> URL: https://issues.apache.org/jira/browse/SPARK-6666
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Environment:
> Reporter: John Ferguson
> Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names?
> Right now, it looks like it "sees" the names correctly in the schema it
> creates, but does not escape them in the SQL it generates when they are not
> compliant:
> org.apache.spark.sql.jdbc.JDBCRDD
> ....
>   private val columnList: String = {
>     val sb = new StringBuilder()
>     columns.foreach(x => sb.append(",").append(x))
>     if (sb.length == 0) "1" else sb.substring(1)
>   }
> If you see value in this, I would take a shot at adding the quoting
> (escaping) of column names here. If you don't, some drivers, such as
> PostgreSQL's, will simply fold all names to lower case when parsing the
> query. As the TL;DR below shows, that means they won't match the schema I
> am given.
> TL;DR:
> --------
> I am able to connect to a Postgres database in the shell (with driver
> referenced):
> val jdbcDf =
> sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> In fact when I run:
> jdbcDf.registerTempTable("sp500")
> val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI FROM sp500")
> and
> val avgEPSProg = jdbcDf.agg(avg(jdbcDf.col("Earnings/Share")))
> The values come back as expected. However, if I try:
> jdbcDf.show
> Or if I try
>
> val all = sqlContext.sql("SELECT * FROM sp500")
> all.show
> I get errors about column names not being found; the error even mentions the
> column names in all lower case. For now I will change my schema to be more
> restrictive. Right now it is, per a Stack Overflow poster, not ANSI
> compliant: it relies on things that double-quoted identifiers permit in
> PostgreSQL, MySQL, and SQL Server. BTW, our users are giving us tables like
> this because various tools they already use support non-compliant names. In
> fact, this is mild compared to what we've had to support.
> Currently the schema in question uses mixed case, quoted names with special
> characters and spaces:
> CREATE TABLE sp500
> (
> "Symbol" text,
> "Name" text,
> "Sector" text,
> "Price" double precision,
> "Dividend Yield" double precision,
> "Price/Earnings" double precision,
> "Earnings/Share" double precision,
> "Book Value" double precision,
> "52 week low" double precision,
> "52 week high" double precision,
> "Market Cap" double precision,
> "EBITDA" double precision,
> "Price/Sales" double precision,
> "Price/Book" double precision,
> "SEC Filings" text
> )
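
[Editor's note: the failure mode against the schema above can be illustrated with a small sketch. PostgreSQL folds unquoted identifiers to lower case, so an unquoted `SELECT Earnings/Share FROM sp500` looks for `earnings/share` and fails; the generated SQL must quote each column. The helper below is hypothetical, not part of Spark.]

```scala
// Hypothetical helper showing how a generated SELECT could quote its
// column list so mixed-case names match the schema exactly.
object QuotedSelect {
  // ANSI-quote an identifier, doubling any embedded quote characters
  // per the SQL standard.
  def quoteIdent(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""

  def select(table: String, columns: Seq[String]): String =
    s"SELECT ${columns.map(quoteIdent).mkString(",")} FROM $table"
}

// Example:
// QuotedSelect.select("sp500", Seq("Symbol", "Earnings/Share"))
//   returns SELECT "Symbol","Earnings/Share" FROM sp500
```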
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)