Ah... I see. Thanks for pointing it out. Then it means we cannot mount external table using customized column names. hmm...
Then the only option left is to use a subquery to add a bunch of column alias. I'll try it later. Thanks, Jianshi On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust <mich...@databricks.com> wrote: > This is by hive's design. From the Hive documentation: > > The column change command will only modify Hive's metadata, and will not >> modify data. Users should make sure the actual data layout of the >> table/partition conforms with the metadata definition. > > > > On Sat, Dec 6, 2014 at 8:28 PM, Jianshi Huang <jianshi.hu...@gmail.com> > wrote: > >> Ok, found another possible bug in Hive. >> >> My current solution is to use ALTER TABLE CHANGE to rename the column >> names. >> >> The problem is after renaming the column names, the value of the columns >> became all NULL. >> >> Before renaming: >> scala> sql("select `sorted::cre_ts` from pmt limit 1").collect >> res12: Array[org.apache.spark.sql.Row] = Array([12/02/2014 07:38:54]) >> >> Execute renaming: >> scala> sql("alter table pmt change `sorted::cre_ts` cre_ts string") >> res13: org.apache.spark.sql.SchemaRDD = >> SchemaRDD[972] at RDD at SchemaRDD.scala:108 >> == Query Plan == >> <Native command: executed by Hive> >> >> After renaming: >> scala> sql("select cre_ts from pmt limit 1").collect >> res16: Array[org.apache.spark.sql.Row] = Array([null]) >> >> I created a JIRA for it: >> >> https://issues.apache.org/jira/browse/SPARK-4781 >> >> >> Jianshi >> >> On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang <jianshi.hu...@gmail.com> >> wrote: >> >>> Hmm... another issue I found doing this approach is that ANALYZE TABLE >>> ... COMPUTE STATISTICS will fail to attach the metadata to the table, and >>> later broadcast join and such will fail... >>> >>> Any idea how to fix this issue? >>> >>> Jianshi >>> >>> On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang <jianshi.hu...@gmail.com> >>> wrote: >>> >>>> Very interesting, the line doing drop table will throws an exception. >>>> After removing it all works. >>>> >>>> Jianshi >>>> >>>> On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com> >>>> wrote: >>>> >>>>> Here's the solution I got after talking with Liancheng: >>>>> >>>>> 1) using backquote `..` to wrap up all illegal characters >>>>> >>>>> val rdd = parquetFile(file) >>>>> val schema = rdd.schema.fields.map(f => s"`${f.name}` >>>>> ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n") >>>>> >>>>> val ddl_13 = s""" >>>>> |CREATE EXTERNAL TABLE $name ( >>>>> | $schema >>>>> |) >>>>> |STORED AS PARQUET >>>>> |LOCATION '$file' >>>>> """.stripMargin >>>>> >>>>> sql(ddl_13) >>>>> >>>>> 2) create a new Schema and do applySchema to generate a new SchemaRDD, >>>>> had to drop and register table >>>>> >>>>> val t = table(name) >>>>> val newSchema = StructType(t.schema.fields.map(s => s.copy(name = >>>>> s.name.replaceAll(".*?::", "")))) >>>>> sql(s"drop table $name") >>>>> applySchema(t, newSchema).registerTempTable(name) >>>>> >>>>> I'm testing it for now. >>>>> >>>>> Thanks for the help! >>>>> >>>>> >>>>> Jianshi >>>>> >>>>> On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com >>>>> > wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I had to use Pig for some preprocessing and to generate Parquet files >>>>>> for Spark to consume. >>>>>> >>>>>> However, due to Pig's limitation, the generated schema contains Pig's >>>>>> identifier >>>>>> >>>>>> e.g. >>>>>> sorted::id, sorted::cre_ts, ... >>>>>> >>>>>> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g. >>>>>> >>>>>> create external table pmt ( >>>>>> sorted::id bigint >>>>>> ) >>>>>> stored as parquet >>>>>> location '...' >>>>>> >>>>>> Obviously it didn't work, I also tried removing the identifier >>>>>> sorted::, but the resulting rows contain only nulls. >>>>>> >>>>>> Any idea how to create a table in HiveContext from these Parquet >>>>>> files? >>>>>> >>>>>> Thanks, >>>>>> Jianshi >>>>>> -- >>>>>> Jianshi Huang >>>>>> >>>>>> LinkedIn: jianshi >>>>>> Twitter: @jshuang >>>>>> Github & Blog: http://huangjs.github.com/ >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Jianshi Huang >>>>> >>>>> LinkedIn: jianshi >>>>> Twitter: @jshuang >>>>> Github & Blog: http://huangjs.github.com/ >>>>> >>>> >>>> >>>> >>>> -- >>>> Jianshi Huang >>>> >>>> LinkedIn: jianshi >>>> Twitter: @jshuang >>>> Github & Blog: http://huangjs.github.com/ >>>> >>> >>> >>> >>> -- >>> Jianshi Huang >>> >>> LinkedIn: jianshi >>> Twitter: @jshuang >>> Github & Blog: http://huangjs.github.com/ >>> >> >> >> >> -- >> Jianshi Huang >> >> LinkedIn: jianshi >> Twitter: @jshuang >> Github & Blog: http://huangjs.github.com/ >> > > -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/