[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.

Antonio Piccolboni (JIRA) Wed, 18 Nov 2015 15:13:42 -0800

    [ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012320#comment-15012320 ]


Antonio Piccolboni commented on SPARK-10754:
--------------------------------------------

This may be related. I created a table from a CSV file using spark-csv. The
column names contain a mix of upper and lower case in the file and keep that
case in the table, as shown by describe. I then created a second table with
CREATE TABLE AS SELECT, and the new table has lowercase column names. Name
handling seems to be case sensitive in some places and case insensitive in
others. Please let me know if I need to open a separate report. A test case
follows.


Sample data

"playerID","yearID","stint","teamID","lgID","G","G_batting","AB","R","H","X2B","X3B","HR","RBI","SB","CS","BB","SO","IBB","HBP","SH","SF","GIDP","G_old"
"aardsda01",2004,1,"SFN","NL",11,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11
"aardsda01",2006,1,"CHN","NL",45,43,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,45
"aardsda01",2007,1,"CHA","AL",25,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
"aardsda01",2008,1,"BOS","AL",47,5,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,5
"aardsda01",2009,1,"SEA","AL",73,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,NA
"aardsda01",2010,1,"SEA","AL",53,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,NA
"aardsda01",2012,1,"NYA","AL",1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"aaronha01",1954,1,"ML1","NL",122,122,468,58,131,27,6,13,69,2,2,28,39,NA,3,6,4,13,122
"aaronha01",1955,1,"ML1","NL",153,153,602,105,189,37,9,27,106,3,1,49,61,5,3,7,4,20,153


Create table with 

 CREATE TABLE `batting` USING com.databricks.spark.csv OPTIONS (
   path '/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//Rtmp0E8pqi/file11a8546f94ed6',
   header 'TRUE', delimiter ',', quote '"', parserLib 'commons',
   mode 'PERMISSIVE', charset 'UTF-8', inferSchema 'TRUE', comment '#')

Upper and lower cases preserved:

Browse[6]> qy("describe batting", my_db)
    col_name data_type comment
1   playerID    string        
2     yearID       int        
3      stint       int        
4     teamID    string        
5       lgID    string        
6          G       int        
7  G_batting    string        
8         AB    string        
9          R    string        
10         H    string        
11       X2B    string        
12       X3B    string        
13        HR    string        
14       RBI    string        
15        SB    string        
16        CS    string        
17        BB    string        
18        SO    string        
19       IBB    string        
20       HBP    string        
21        SH    string        
22        SF    string        
23      GIDP    string        
24     G_old    string        


Create another table with

 CREATE TABLE `xxhcteugas` AS
 SELECT `playerID` AS `playerID`, `yearID` AS `yearID`, `teamID` AS `teamID`,
        `G` AS `G`, `AB` AS `AB`, `R` AS `R`, `H` AS `H`
 FROM `batting`
 ORDER BY `playerID`, `yearID`, `teamID`

Upper case is gone from the column names:

Browse[6]> qy("describe xxhcteugas", my_db)
  col_name data_type comment
1 playerid    string    <NA>
2   yearid       int    <NA>
3   teamid    string    <NA>
4        g       int    <NA>
5       ab    string    <NA>
6        r    string    <NA>
7        h    string    <NA>
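
To make the comparison between the two describe outputs mechanical, here is a minimal sketch in plain Java (no Spark involved). The class CaseFoldCheck and its method caseFolded are hypothetical names used only for illustration; the column lists are copied from the SELECT statement and the second describe output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Spark: lists the source columns that
// appear in the derived table only in lowercased form.
class CaseFoldCheck {
    static List<String> caseFolded(List<String> source, List<String> derived) {
        List<String> changed = new ArrayList<>();
        for (String name : source) {
            String lower = name.toLowerCase(Locale.ROOT);
            // The exact name is missing but its lowercase form is present.
            if (!derived.contains(name) && derived.contains(lower)) {
                changed.add(name);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Names as written in the CREATE TABLE AS SELECT statement.
        List<String> selected =
            Arrays.asList("playerID", "yearID", "teamID", "G", "AB", "R", "H");
        // Names as reported by describe on the new table.
        List<String> described =
            Arrays.asList("playerid", "yearid", "teamid", "g", "ab", "r", "h");
        System.out.println(caseFolded(selected, described));
    }
}
```

A name that was already all lowercase in the source would not be reported, since it is found unchanged in the derived list.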

> table and column name are case sensitive when json Dataframe was registered 
> as tempTable using JavaSparkContext. 
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10754
>                 URL: https://issues.apache.org/jira/browse/SPARK-10754
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.4.1
>         Environment: Linux, Hadoop Version 1.3
>            Reporter: Babulal
>
> Create a dataframe using the json data source:
>       import org.apache.spark.SparkConf;
>       import org.apache.spark.api.java.JavaSparkContext;
>       import org.apache.spark.sql.DataFrame;
>       import org.apache.spark.sql.SQLContext;
>
>       SparkConf conf = new SparkConf().setMaster("spark://xyz:7077").setAppName("Spark Tabble");
>       JavaSparkContext javacontext = new JavaSparkContext(conf);
>       SQLContext sqlContext = new SQLContext(javacontext);
>
>       DataFrame df = sqlContext.jsonFile("/user/root/examples/src/main/resources/people.json");
>
>       // Register the dataframe under an all-lowercase name.
>       df.registerTempTable("sparktable");
>
>       Run the query:
>
>       sqlContext.sql("select * from sparktable").show();    // this will pass
>
>       sqlContext.sql("select * from sparkTable").show();    // this will fail
>       
>       java.lang.RuntimeException: Table Not Found: sparkTable
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:115)
>         at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:115)
>         at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>         at scala.collection.AbstractMap.getOrElse(Map.scala:58)
>         at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:115)
>         at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:233)
>               
>               
>       
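
The stack trace in the quoted report goes through a map getOrElse inside SimpleCatalog.lookupRelation, which is consistent with the temp table name being looked up by exact string match. As a rough, Spark-free illustration of that behavior, the sketch below contrasts an exact-match map with a case-insensitive one. The class and method names are hypothetical, and this is not Spark's catalog code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a table registry; not Spark's actual catalog.
class CatalogLookupSketch {
    // Keys are compared exactly, so "sparkTable" does not match "sparktable".
    static Map<String, String> exactRegistry() {
        Map<String, String> tables = new HashMap<>();
        tables.put("sparktable", "people.json");
        return tables;
    }

    // Keys are compared ignoring case, so either spelling matches.
    static Map<String, String> caseInsensitiveRegistry() {
        Map<String, String> tables = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        tables.put("sparktable", "people.json");
        return tables;
    }

    public static void main(String[] args) {
        System.out.println(exactRegistry().containsKey("sparktable"));
        System.out.println(exactRegistry().containsKey("sparkTable"));
        System.out.println(caseInsensitiveRegistry().containsKey("sparkTable"));
    }
}
```

Under this assumption, the lookup for sparkTable fails only in the exact-match registry, which mirrors the "Table Not Found: sparkTable" error above.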



