You might also try out the recently added support for views.
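(A minimal sketch of what that view approach could look like, not from the original thread; it assumes a HiveContext and the pmt table with the sorted:: columns discussed below, and the view name pmt_view is hypothetical:)

// Define a Hive view that aliases the Pig-generated "sorted::" columns
// to plain names, so downstream queries can use `id` and `cre_ts` directly.
sql("""
  CREATE VIEW pmt_view AS
  SELECT `sorted::id` AS id, `sorted::cre_ts` AS cre_ts
  FROM pmt
""")

// Queries against the view then use the plain column names.
sql("SELECT cre_ts FROM pmt_view LIMIT 1").collect()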
On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Ah... I see. Thanks for pointing it out.

Then it means we cannot mount an external table using customized column names. Hmm...

Then the only option left is to use a subquery to add a bunch of column aliases. I'll try it later.

Thanks,
Jianshi
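(A minimal sketch of that subquery/alias idea, assuming the same pmt table and columns; the temp table name pmt_renamed is hypothetical:)

// Select with aliases to strip the Pig prefixes, then register the result
// so later queries can refer to the plain column names.
val renamed = sql("SELECT `sorted::id` AS id, `sorted::cre_ts` AS cre_ts FROM pmt")
renamed.registerTempTable("pmt_renamed")

sql("SELECT cre_ts FROM pmt_renamed LIMIT 1").collect()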
On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust <mich...@databricks.com> wrote:

This is by Hive's design. From the Hive documentation:

    The column change command will only modify Hive's metadata, and will
    not modify data. Users should make sure the actual data layout of the
    table/partition conforms with the metadata definition.

On Sat, Dec 6, 2014 at 8:28 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Ok, I found another possible bug in Hive.

My current solution is to use ALTER TABLE CHANGE to rename the columns.

The problem is that after renaming the columns, their values all became NULL.

Before renaming:

scala> sql("select `sorted::cre_ts` from pmt limit 1").collect
res12: Array[org.apache.spark.sql.Row] = Array([12/02/2014 07:38:54])

Executing the rename:

scala> sql("alter table pmt change `sorted::cre_ts` cre_ts string")
res13: org.apache.spark.sql.SchemaRDD =
SchemaRDD[972] at RDD at SchemaRDD.scala:108
== Query Plan ==
<Native command: executed by Hive>

After renaming:

scala> sql("select cre_ts from pmt limit 1").collect
res16: Array[org.apache.spark.sql.Row] = Array([null])

I created a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-4781

Jianshi

On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Hmm... another issue I found with this approach: ANALYZE TABLE ... COMPUTE STATISTICS fails to attach the metadata to the table, so later broadcast joins and the like will fail...

Any idea how to fix this issue?

Jianshi

On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Very interesting: the line doing the drop table throws an exception. After removing it, everything works.

Jianshi

On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Here's the solution I got after talking with Liancheng:

1) Use backquotes `..` to wrap all illegal characters:

val rdd = parquetFile(file)

// Build the DDL column list, backquoting each Pig-generated name
// (e.g. `sorted::id`) so Hive accepts the "::" characters.
val schema = rdd.schema.fields.map(f =>
  s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")

val ddl_13 = s"""
  |CREATE EXTERNAL TABLE $name (
  |  $schema
  |)
  |STORED AS PARQUET
  |LOCATION '$file'
""".stripMargin

sql(ddl_13)

2) Create a new schema and use applySchema to generate a new SchemaRDD; I had to drop and re-register the table:

val t = table(name)

// Strip the leading namespace up to the first "::" from each column name.
val newSchema = StructType(t.schema.fields.map(s =>
  s.copy(name = s.name.replaceAll(".*?::", ""))))

sql(s"drop table $name")
applySchema(t, newSchema).registerTempTable(name)

I'm testing it for now.

Thanks for the help!

Jianshi

On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Hi,

I had to use Pig for some preprocessing and to generate Parquet files for Spark to consume.

However, due to a limitation of Pig, the generated schema contains Pig's namespaced identifiers, e.g.:

sorted::id, sorted::cre_ts, ...

I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.:

create external table pmt (
  sorted::id bigint
)
stored as parquet
location '...'

Obviously that didn't work. I also tried removing the sorted:: prefix, but then the resulting rows contain only nulls.

Any idea how to create a table in HiveContext from these Parquet files?

Thanks,
Jianshi

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/