Thanks Brock. I uploaded it as an attachment to: https://issues.apache.org/jira/browse/HIVE-7554
-Raymond

On Wed, Jul 30, 2014 at 8:42 AM, Brock Noland <[email protected]> wrote:

Hi Raymond,

Would you be able to upload a small (1-10 row) Parquet file which would demonstrate the bug without the fix?

https://issues.apache.org/jira/browse/HIVE-7554
https://issues.apache.org/jira/browse/PARQUET-54

Cheers,
Brock

On Tue, Jul 29, 2014 at 6:47 PM, Raymond Lau <[email protected]> wrote:

I'm pretty sure my distribution is Cloudera; all of my Hadoop/Hive folders have CDH in them. Hive version: 0.12.

Raymond

On Tue, Jul 29, 2014 at 6:03 PM, Brock Noland <[email protected]> wrote:

Hi,

Thanks for the message. I am looking at this issue myself.

Which version of Hive are you using, and from which distribution?

Brock

On Jul 29, 2014 1:09 PM, "Raymond Lau" <[email protected]> wrote:

I'm having the same case sensitivity issue mentioned in a previous thread:
https://groups.google.com/forum/#!topic/parquet-dev/ko-TM2lLpxE

The solution that Christos posted works great, but it didn't work for me for *partitioned* external tables: either I couldn't read the data or I couldn't write it. All of the data I'm working with is already partitioned in HDFS, so all I need to do is run an 'ALTER TABLE table ADD PARTITION (partitionkey = blah) LOCATION '/path/''.

The workaround I made was to edit the init function in the DataWritableReadSupport class (original:
https://github.com/Parquet/parquet-mr/blob/7b0778c490e6782a83663bd5b1ec9d8a7dd7c2ae/parquet-hive/parquet-hive-storage-handler/src/main/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java)
so that lower-cased field names are used for the Hive table schema, and so that typeListWanted is built with the case-sensitive field names when the Parquet files are read, which lets it properly read the data I need. With this change I'm able to insert all of my data and run queries on it in Hive.

    if (columns != null) {
      final List<String> listColumns = getColumns(columns);

      /* EDIT - create a map from lower-cased field name -> original
         field name from the Parquet file schema */
      final Map<String, String> lowerCaseFileSchemaColumns = new HashMap<String, String>();
      for (ColumnDescriptor c : fileSchema.getColumns()) {
        lowerCaseFileSchemaColumns.put(c.getPath()[0].toLowerCase(), c.getPath()[0]);
      }

      final List<Type> typeListTable = new ArrayList<Type>();
      for (final String col : listColumns) {
        /* EDIT - check whether a Hive column exists in the map, instead of
           whether it exists in the Parquet file schema directly; this is
           where case sensitivity would normally cause a problem. If it
           exists, get the type information from the Parquet file schema
           (we need the case-sensitive field name to look it up). */
        if (lowerCaseFileSchemaColumns.containsKey(col)) {
          typeListTable.add(fileSchema.getType(lowerCaseFileSchemaColumns.get(col)));
        } else {
          typeListTable.add(new PrimitiveType(Repetition.OPTIONAL, PrimitiveTypeName.BINARY, col));
        }
      }

      MessageType tableSchema = new MessageType(TABLE_SCHEMA, typeListTable);
      contextMetadata.put(HIVE_SCHEMA_KEY, tableSchema.toString());

      MessageType requestedSchemaByUser = tableSchema;
      final List<Integer> indexColumnsWanted = getReadColumnIDs(configuration);

      final List<Type> typeListWanted = new ArrayList<Type>();

      /* EDIT - again, we need the case-sensitive field name for getType */
      for (final Integer idx : indexColumnsWanted) {
        typeListWanted.add(tableSchema.getType(lowerCaseFileSchemaColumns.get(listColumns.get(idx))));
      }

      ....

I was wondering whether there are any consequences of doing it this way that I missed, and whether this fix or something similar could someday become a patch.

--
*Raymond Lau*
Software Engineer - Intern | [email protected] | (925) 395-3806
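
For anyone trying to reproduce the setup described above, here is a minimal sketch of the kind of partitioned external table involved. The table, column, partition, and value names are made up for illustration, and the SerDe/format class names assume the parquet-hive storage handler bundled with parquet-mr for Hive 0.12; adjust them to your build.

    -- Parquet footers preserve the original, possibly mixed-case field
    -- names (e.g. "userId"), while Hive lower-cases every column name
    -- (e.g. "userid"), so a direct name lookup misses.
    CREATE EXTERNAL TABLE events (userid BIGINT, eventname STRING)
    PARTITIONED BY (dt STRING)
    ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
      OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

    -- The data already sits partitioned in HDFS, so partitions are only
    -- attached, never rewritten:
    ALTER TABLE events ADD PARTITION (dt = '2014-07-29') LOCATION '/path/';

    -- Without the lower-case mapping in DataWritableReadSupport.init,
    -- selecting mixed-case Parquet fields through this table comes back
    -- NULL (or errors), because Hive's "userid" never matches the file
    -- schema's "userId":
    SELECT userid, eventname FROM events WHERE dt = '2014-07-29';

The lowerCaseFileSchemaColumns map in the patch above is exactly the bridge between those two namings: it resolves Hive's lower-cased column name back to the case-sensitive name stored in the Parquet footer.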
