pin_zhang created SPARK-6923:
--------------------------------
Summary: Invalid Hive table columns after saving a DataFrame to a
Hive table
Key: SPARK-6923
URL: https://issues.apache.org/jira/browse/SPARK-6923
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
HiveContext hctx = new HiveContext(sc);
List<String> sample = new ArrayList<String>();
sample.add("{\"id\": \"id_1\", \"age\": 1}");
RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
DataFrame df = hctx.jsonRDD(sampleRDD);
String table = "test";
df.saveAsTable(table, "json", SaveMode.Overwrite);
// Read the column schema back from the Hive metastore.
Table t = hctx.catalog().client().getTable(table);
System.out.println(t.getCols());
--------------------------------------------------------------
With the code above, after saving the DataFrame to a Hive table,
getCols() returns a single placeholder column named 'col':
[FieldSchema(name:col, type:array<string>, comment:from deserializer)]
The expected field schema is id, age.
As a result, the JDBC API cannot retrieve the table columns via
DatabaseMetaData.getColumns(String catalog, String schemaPattern,
String tableNamePattern, String columnNamePattern).
However, the ResultSet metadata for the query "select * from test" does
contain the fields id and age.
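The discrepancy can be observed over JDBC: DatabaseMetaData.getColumns() reads the metastore schema (which holds only the placeholder column), while the ResultSetMetaData of an actual query reflects the real fields. A minimal sketch of both calls, assuming a HiveServer2/Thrift server endpoint at jdbc:hive2://localhost:10000/default (the URL and empty credentials are assumptions, not part of this report):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class CompareColumnMetadata {
    public static void main(String[] args) throws SQLException {
        // Hypothetical endpoint; adjust the JDBC URL and credentials as needed.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");

        // Path 1: DatabaseMetaData.getColumns() consults the metastore,
        // which for tables saved via saveAsTable(..., "json", ...) reports
        // only the placeholder column `col array<string>`.
        ResultSet cols = conn.getMetaData().getColumns(null, null, "test", null);
        while (cols.next()) {
            System.out.println("metastore column: " + cols.getString("COLUMN_NAME"));
        }
        cols.close();

        // Path 2: the ResultSetMetaData of a probe query is built from the
        // actual query schema and does contain the real fields (id, age).
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select * from test")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println("query column: " + md.getColumnName(i));
            }
        }
        conn.close();
    }
}
```

Until the metastore schema is written correctly, reading columns from the probe query's ResultSetMetaData is the only way to recover the real field names over JDBC.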
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)