pin_zhang created SPARK-6923:
--------------------------------

             Summary: Get invalid hive table columns after save DataFrame to 
hive table
                 Key: SPARK-6923
                 URL: https://issues.apache.org/jira/browse/SPARK-6923
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: pin_zhang


// sc is an existing SparkContext
HiveContext hctx = new HiveContext(sc);
List<String> sample = new ArrayList<String>();
sample.add("{\"id\": \"id_1\", \"age\": 1}");
RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
DataFrame df = hctx.jsonRDD(sampleRDD);
String table = "test";
// save the DataFrame as a Hive table backed by the json data source
df.saveAsTable(table, "json", SaveMode.Overwrite);
// read the column schema back from the Hive metastore
Table t = hctx.catalog().client().getTable(table);
System.out.println(t.getCols());
--------------------------------------------------------------
After saving the DataFrame to a Hive table with the code above, fetching the
table columns returns a single column named 'col':
[FieldSchema(name:col, type:array<string>, comment:from deserializer)]
The expected field schema is id, age.

As a result, the JDBC API cannot retrieve the table columns via
DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
tableNamePattern, String columnNamePattern).
However, the ResultSet metadata for the query "select * from test" does
contain the fields id and age.
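The failing JDBC lookup can be sketched as below. This is a minimal sketch, not part of the original report: it assumes a running HiveServer2 reachable at the hypothetical URL jdbc:hive2://localhost:10000/default and the Hive JDBC driver on the classpath; the host, port, and database are illustrative.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class GetColumnsRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 URL; adjust host/port/database as needed.
        Connection conn =
            DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
        DatabaseMetaData meta = conn.getMetaData();
        // Ask the metastore for the columns of table "test".
        ResultSet cols = meta.getColumns(null, "default", "test", "%");
        while (cols.next()) {
            // With this bug, a single row "col" of type array<string> comes
            // back instead of the expected fields "id" and "age".
            System.out.println(cols.getString("COLUMN_NAME")
                    + " : " + cols.getString("TYPE_NAME"));
        }
        conn.close();
    }
}
```

Because getColumns reads the metastore schema rather than the query result schema, it sees only the single 'col' field even though "select * from test" returns id and age.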



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
