[ https://issues.apache.org/jira/browse/SPARK-30741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033632#comment-17033632 ]

Gary Liu edited comment on SPARK-30741 at 2/10/20 2:26 PM:
-----------------------------------------------------------

Thanks Hyukjin Kwon!

We tried 2.1.x, 2.3.x and 2.4.x; all gave the same results.

I am a data analyst and not familiar with all the detailed settings, but I will 
try my best to describe the setup, and I will ask our IT experts to provide 
additional information if possible.

We have an in-house SAS Grid server and Spark clusters, and we run PySpark code 
in Zeppelin. There is a firewall between the Zeppelin server and the SAS Grid, 
but we have opened the route/port.

Based on the [SAS JDBC 
Cookbook|https://documentation.sas.com/?docsetId=jdbcref&docsetTarget=titlepage.htm&docsetVersion=9.4&locale=en],
 there are two ways to connect to SAS using JDBC: SAS/SHARE and IOM. We tested 
SAS/SHARE. The testing code is shown below (you can also find it in the 
attached screenshot). The returned Spark dataframe is populated entirely with 
the SAS column labels. We reported this to SAS Support; they said the data was 
returned correctly when calling the SAS JDBC driver from Java, so they think 
the problem lies in the Spark JDBC reader. The SAS JDBC driver also works with 
KNIME and with desktop JDBC SQL clients such as SQuirreL SQL and DBeaver 
(although it does return the column labels as column names in the results), 
and there we can see the correct data returned. Only in Spark are the results 
unusable. We are wondering whether this is due to the Spark JDBC reader or to 
something else, such as a firewall setting.
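
One way to narrow this down would be to call the driver directly from Python, 
bypassing Spark entirely, and compare its output with the Spark result below. 
This is only a sketch: it assumes the jaydebeapi package is available on the 
Zeppelin host, and the driver jar path is hypothetical.
{code:python}
# Minimal direct-driver check (sketch): fetch the same rows without Spark.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.sas.net.sharenet.ShareNetDriver",
    "jdbc:sharenet://server_path:port?librefs=MyLib '/sas/server/path'",
    {},
    "/path/to/sharenet-driver.jar")  # hypothetical jar location
try:
    curs = conn.cursor()
    curs.execute("select cust_id from MyLib.test_data(obs=10)")
    print(curs.description)  # column metadata as reported by the driver
    for row in curs.fetchall():
        print(row)           # row values straight from the driver
finally:
    conn.close()
{code}
If this prints real data while the Spark dataframe shows only labels, the 
firewall can be ruled out and the problem isolated to how Spark consumes the 
driver's results.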

Testing code:
{code:python}
# Read a 10-row SAS table through the SAS/SHARE JDBC driver.
# Each selected column carries a SAS label; the returned dataframe
# comes back filled with those labels instead of the data.
ja = spark.read.jdbc(
    "jdbc:sharenet://server_path:port?librefs=MyLib '/sas/server/path'",
    '''(
        select
            cust_id as cust label 'customer_ID'
            , 'test' as col label 'label_name'
        from
            MyLib.test_data(obs=10)
    ) t''',
    properties={"driver": "com.sas.net.sharenet.ShareNetDriver"})

ja.show(10, False)
{code}
Results:
{code}
+-----------+----------+
|customer_ID|label_name|
+-----------+----------+
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
|customer_ID|label_name|
+-----------+----------+{code}
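
One plausible explanation (my assumption, not confirmed) for rows filled with 
the label text: Spark first reads the schema from the driver's 
ResultSetMetaData and then generates its own query along the lines of 
{{SELECT "customer_ID","label_name" FROM (...) t}}. If the driver reports the 
SAS labels as column names, and SAS SQL treats the double-quoted identifiers 
as character literals, every row would come back containing exactly those 
strings, which matches the output above. A quick check from the Spark side:
{code:python}
# If the field names printed here are the SAS labels rather than the
# underlying column names, Spark's generated SELECT list quotes those
# labels, and the driver may evaluate them as string literals.
ja.printSchema()
{code}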



> The data returned from SAS using JDBC reader contains column label
> ------------------------------------------------------------------
>
>                 Key: SPARK-30741
>                 URL: https://issues.apache.org/jira/browse/SPARK-30741
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, PySpark
>    Affects Versions: 2.1.1
>            Reporter: Gary Liu
>            Priority: Major
>         Attachments: SparkBug.png
>
>
> When reading SAS data using JDBC with the SAS/SHARE driver, the returned 
> data contains the column labels rather than the data. 
> According to testing results from SAS Support, the results are correct when 
> using Java, so they believe the issue is due to Spark's reading. 


