This is by Hive's design. From the Hive documentation:

> The column change command will only modify Hive's metadata, and will not
> modify data. Users should make sure the actual data layout of the
> table/partition conforms with the metadata definition.
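Since Parquet resolves columns by name, that metadata-only behavior explains the NULLs reported below. A minimal self-contained sketch (the `Map` standing in for the Parquet file is hypothetical; column name and value are taken from the thread): once the metastore calls the column `cre_ts`, a name-based lookup against a file whose footer still says `sorted::cre_ts` finds nothing and every row reads as null.

```scala
// Hypothetical stand-in for a Parquet file: column name -> row values,
// with the name Pig actually wrote.
val fileColumns: Map[String, Seq[String]] =
  Map("sorted::cre_ts" -> Seq("12/02/2014 07:38:54"))

// Name-based column resolution, as a Parquet reader does it:
// a column missing from the file reads as null for every row.
def readColumn(name: String): Seq[Option[String]] =
  fileColumns.get(name) match {
    case Some(values) => values.map(Some(_))
    case None         => Seq(None)
  }

// Before the rename: the metastore name matches the file.
println(readColumn("sorted::cre_ts")) // List(Some(12/02/2014 07:38:54))

// After ALTER TABLE ... CHANGE: the metastore says "cre_ts", the file does not.
println(readColumn("cre_ts"))         // List(None)
```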
On Sat, Dec 6, 2014 at 8:28 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Ok, found another possible bug in Hive.
>
> My current solution is to use ALTER TABLE CHANGE to rename the column names.
>
> The problem is that after renaming the column names, the values of the
> columns became all NULL.
>
> Before renaming:
>
> scala> sql("select `sorted::cre_ts` from pmt limit 1").collect
> res12: Array[org.apache.spark.sql.Row] = Array([12/02/2014 07:38:54])
>
> Execute renaming:
>
> scala> sql("alter table pmt change `sorted::cre_ts` cre_ts string")
> res13: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[972] at RDD at SchemaRDD.scala:108
> == Query Plan ==
> <Native command: executed by Hive>
>
> After renaming:
>
> scala> sql("select cre_ts from pmt limit 1").collect
> res16: Array[org.apache.spark.sql.Row] = Array([null])
>
> I created a JIRA for it:
> https://issues.apache.org/jira/browse/SPARK-4781
>
> Jianshi
>
> On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>
>> Hmm... another issue I found with this approach is that ANALYZE TABLE
>> ... COMPUTE STATISTICS fails to attach the metadata to the table, and
>> later broadcast joins and such will fail...
>>
>> Any idea how to fix this issue?
>>
>> Jianshi
>>
>> On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>
>>> Very interesting: the line doing DROP TABLE throws an exception.
>>> After removing it, everything works.
>>>
>>> Jianshi
>>>
>>> On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>>
>>>> Here's the solution I got after talking with Liancheng:
>>>>
>>>> 1) Use backquotes `..` to wrap all illegal characters:
>>>>
>>>> val rdd = parquetFile(file)
>>>> val schema = rdd.schema.fields.map(f =>
>>>>   s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}"
>>>> ).mkString(",\n")
>>>>
>>>> val ddl_13 = s"""
>>>>   |CREATE EXTERNAL TABLE $name (
>>>>   |  $schema
>>>>   |)
>>>>   |STORED AS PARQUET
>>>>   |LOCATION '$file'
>>>> """.stripMargin
>>>>
>>>> sql(ddl_13)
>>>>
>>>> 2) Create a new StructType schema and call applySchema to generate a
>>>> new SchemaRDD; I had to drop and re-register the table:
>>>>
>>>> val t = table(name)
>>>> val newSchema = StructType(t.schema.fields.map(s =>
>>>>   s.copy(name = s.name.replaceAll(".*?::", ""))
>>>> ))
>>>> sql(s"drop table $name")
>>>> applySchema(t, newSchema).registerTempTable(name)
>>>>
>>>> I'm testing it for now.
>>>>
>>>> Thanks for the help!
>>>>
>>>> Jianshi
>>>>
>>>> On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I had to use Pig for some preprocessing and to generate Parquet files
>>>>> for Spark to consume.
>>>>>
>>>>> However, due to Pig's limitations, the generated schema contains Pig's
>>>>> identifiers, e.g.
>>>>>
>>>>> sorted::id, sorted::cre_ts, ...
>>>>>
>>>>> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>>>>>
>>>>> create external table pmt (
>>>>>   sorted::id bigint
>>>>> )
>>>>> stored as parquet
>>>>> location '...'
>>>>>
>>>>> Obviously it didn't work. I also tried removing the sorted::
>>>>> identifier, but the resulting rows contained only nulls.
>>>>>
>>>>> Any idea how to create a table in HiveContext from these Parquet files?
>>>>>
>>>>> Thanks,
>>>>> Jianshi
>>>>>
>>>>> --
>>>>> Jianshi Huang
>>>>>
>>>>> LinkedIn: jianshi
>>>>> Twitter: @jshuang
>>>>> Github & Blog: http://huangjs.github.com/
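The two steps of the workaround quoted above can be sketched in isolation (the `fields` list is a hypothetical stand-in for `rdd.schema.fields`): backquoting each column name so Hive's DDL parser accepts the `::` characters, and stripping Pig's `alias::` prefix with the same `replaceAll` used in the applySchema step.

```scala
// Hypothetical stand-in for rdd.schema.fields: (name, Hive type) pairs.
val fields = Seq("sorted::id" -> "bigint", "sorted::cre_ts" -> "string")

// Step 1: backquote each column name so Hive accepts "::" in the DDL.
def ddlColumns(fs: Seq[(String, String)]): String =
  fs.map { case (name, tpe) => s"`$name` $tpe" }.mkString(",\n")

// Step 2: strip everything up to and including "::" (Pig's alias prefix),
// as the applySchema step in the thread does.
def stripPrefix(name: String): String = name.replaceAll(".*?::", "")

println(ddlColumns(fields))
// `sorted::id` bigint,
// `sorted::cre_ts` string

println(stripPrefix("sorted::cre_ts")) // cre_ts
```

The non-greedy `.*?::` matches the shortest run ending in `::`, so only the alias prefix is removed and names without a prefix pass through unchanged.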