[jira] [Commented] (SPARK-15987) PostgreSQL CITEXT type JDBC support

Sergey Bahchissaraitsev (JIRA) Tue, 21 Jun 2016 00:33:13 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341294#comment-15341294
 ]


Sergey Bahchissaraitsev commented on SPARK-15987:
-------------------------------------------------

Casting could be a workaround: I tried creating a view that casts the citext 
column to a regular varchar, and it worked. However, I don't think that 
should be the way to go.
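
For illustration, here is a minimal sketch of the casting idea. It is a variation 
on the view: instead of creating a view in the database, it builds a subquery 
that casts the citext column to varchar. The class name, the helper 
castSubquery, and the table and column names (users, id, email) are all 
hypothetical, and identifiers are not quoted or escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark or the PostgreSQL driver.
class CitextCastQuery {

    /**
     * Builds "(SELECT ... FROM table) AS alias".
     * columnCasts maps a column name to its cast target, or to null
     * when the column should be selected unchanged.
     */
    static String castSubquery(String table, Map<String, String> columnCasts, String alias) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> e : columnCasts.entrySet()) {
            String col = e.getKey();
            String target = e.getValue();
            cols.add(target == null ? col : "CAST(" + col + " AS " + target + ") AS " + col);
        }
        return "(SELECT " + cols + " FROM " + table + ") AS " + alias;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the column order stable in the generated SQL.
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("id", null);          // assumed non-citext column, left as is
        cols.put("email", "varchar");  // assumed citext column, cast to varchar
        System.out.println(castSubquery("users", cols, "users_cast"));
    }
}
```

The resulting string could then be passed as the table argument of 
DataFrameReader.jdbc(url, table, properties) in place of the plain table name, 
so that Spark only ever sees a varchar column.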

The 1111 type in the error message indicates that the column is being treated 
as OTHER (java.sql.Types.OTHER), and Spark automatically throws an exception 
in that case:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L89
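
As a quick check of where 1111 comes from, the following small sketch maps a 
JDBC type code back to its name using only java.sql from the JDK. The class 
and method names are made up for the example.

```java
import java.sql.JDBCType;
import java.sql.Types;

// Hypothetical helper class for looking up JDBC type codes.
class JdbcTypeCodeLookup {

    /** Returns the JDBC type name for a java.sql.Types code. */
    static String nameOf(int code) {
        // JDBCType.valueOf(int) throws IllegalArgumentException for unknown codes.
        return JDBCType.valueOf(code).getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(1111));  // name of the type code from the error message
        System.out.println(Types.OTHER);   // numeric value of the OTHER constant
    }
}
```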

While this might be OK as the default behavior, there could be an option to 
specify how to treat the OTHER type.

In this case, the user could specify StringType, as Takeshi suggested, but in 
other cases, with other Postgres (or maybe even non-Postgres) extensions, we 
could just as well use BinaryType or any other type the user of the 
application sees fit.
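
To make the proposal concrete, here is a rough sketch of what such an option 
might look like. This is not Spark's actual code or API: the TargetType enum 
stands in for Catalyst types such as StringType and BinaryType, and the 
resolve method and its otherFallback parameter are hypothetical. Only a few 
type codes are handled, to keep the example short.

```java
import java.sql.SQLException;
import java.sql.Types;
import java.util.Optional;

// Hypothetical sketch of a user-configurable fallback for java.sql.Types.OTHER.
class OtherTypeFallback {

    // Stand-ins for Spark's Catalyst types; not the real Spark classes.
    enum TargetType { STRING, BINARY, INTEGER }

    /**
     * Maps a JDBC type code to a target type. When the code is OTHER,
     * the user-supplied fallback is used if present; otherwise the
     * current behavior (an exception) is kept as the default.
     */
    static TargetType resolve(int sqlType, Optional<TargetType> otherFallback)
            throws SQLException {
        switch (sqlType) {
            case Types.VARCHAR:
                return TargetType.STRING;
            case Types.INTEGER:
                return TargetType.INTEGER;
            case Types.BINARY:
                return TargetType.BINARY;
            case Types.OTHER:
                return otherFallback.orElseThrow(
                        () -> new SQLException("Unsupported type " + sqlType));
            default:
                throw new SQLException("Unsupported type " + sqlType);
        }
    }

    public static void main(String[] args) throws SQLException {
        // With a fallback configured, a citext column would be read as a string.
        System.out.println(resolve(Types.OTHER, Optional.of(TargetType.STRING)));
    }
}
```

With Optional.empty() as the fallback, the sketch behaves like the current 
code and fails with "Unsupported type 1111".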

Could this work?

Thanks.

> PostgreSQL CITEXT type JDBC support
> -----------------------------------
>
>                 Key: SPARK-15987
>                 URL: https://issues.apache.org/jira/browse/SPARK-15987
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04
> PostgreSQL 9.3.9
>            Reporter: Sergey Bahchissaraitsev
>              Labels: dataframe, jdbc, postgresql
>
> When trying to use a Spark DataFrame on a table with a CITEXT column, you get 
> the following error:
> Exception in thread "main" java.sql.SQLException: Unsupported type 1111
>       at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$getCatalystType(JDBCRDD.scala:102)
>       at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:141)
>       at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$1.apply(JDBCRDD.scala:141)
>       at scala.Option.getOrElse(Option.scala:120)
>       at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:140)
>       at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
>       at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
>       at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:208)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
