> On Dec. 1, 2014, 8:13 p.m., Ryan Blue wrote: > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 762 > > <https://reviews.apache.org/r/28372/diff/2/?file=777120#file777120line762> > > > > What does this change do?
Serdes in this list must have one or more columns in the create table statement. This change removes this check for ParquetHiveSerDe. > On Dec. 1, 2014, 8:13 p.m., Ryan Blue wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java, > > line 36 > > <https://reviews.apache.org/r/28372/diff/2/?file=777121#file777121line36> > > > > I don't think ColInfoFromParquetFile is a great name for this class. > > What you've implemented here is a schema converter, like > > HiveSchemaConverter but from MessageType to TypeInfo rather than the other > > way around. I think this should either go in HiveSchemaConverter or a new > > class, like ParquetToHiveSchemaConverter. > > > > This update would also help clarify the methods exposed. I think that > > the primary method this should expose is: > > StructType convert(GroupType parquetSchema) > > > > File reading should be done elsewhere if it is necessary. > > > > Also, I think this should return a StructType instead of a Pair with > > Strings. That would be a lot cleaner and would avoid the custom type string > > building immediately followed by parsing those type strings. Made appropriate changes. > On Dec. 1, 2014, 8:13 p.m., Ryan Blue wrote: > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java, > > line 107 > > <https://reviews.apache.org/r/28372/diff/2/?file=777122#file777122line107> > > > > Is it possible to avoid setting a file property? There are lots of > > cases where a file in the dataset would be removed so this is a brittle > > method of configuring the table. Ideally, we would check for the > > LIST_COLUMN_TYPES property and if we don't find it, conver the schema of > > the first file that we find. Below are the different scenarios possible and how they will be handled. 1. Managed table with parquet.file specified: 1.a. While creating the table ParquetHiveSerDe will use that file to create hive schema. 1.b. If at a later point of time, say while select data from table, file specified by parquet.file is deleted, then ParquetHiveSerDe will look for a file in table dir and will try to create hive schema from that file's parquet metadata. 1.c. If parquet.file is deleted and table dir is empty, there is no way hive schema can be created. So, an informative exception will be thrown, "Either provide schema for table or point to parquet file using parquet.file in tblproperties or make sure that table has atleast one parquet file with required metadata". Note that if a table reaches this state, it can be fixed either by altering table and pointing parquet.file to a valid file or by copying a parquet file to table dir. This is valid for all cases. 2. Managed table without parquet.file specified: 2.a. While creating the table ParquetHiveSerDe will use any file in table dir to create hive schema. If table dir is empty, it will throw an informative exception. 3. External table with parquet.file specified: Same as (1). 4. External table without parquet.file specified and empty table dir: Same as (1.c) 5. External table without parquet.file specified and a parquet file present in table dir: Same as (2) - Ashish ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28372/#review63424 ----------------------------------------------------------- On Nov. 27, 2014, 1:08 a.m., Ashish Singh wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28372/ > ----------------------------------------------------------- > > (Updated Nov. 27, 2014, 1:08 a.m.) > > > Review request for hive. > > > Bugs: HIVE-8950 > https://issues.apache.org/jira/browse/HIVE-8950 > > > Repository: hive-git > > > Description > ------- > > HIVE-8950: Add support in ParquetHiveSerde to create table schema from a > parquet file > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > fafd78e63e9b41c9fdb0e017b567dc719d151784 > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java > 2db2658fbc57fba01c892c9213baef6c498e659b > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java > 4effe736fcf9d3715f03eed9885c299a7aa040dd > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java > cd3d349e8bd8785d6cadaf9ed8fa7598f223774a > > ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q > PRE-CREATION > ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema_ext.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q > PRE-CREATION > ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q > PRE-CREATION > > ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema_ext.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out > PRE-CREATION > > Diff: https://reviews.apache.org/r/28372/diff/ > > > Testing > ------- > > Tested by adding appropriate qTests. > > > Thanks, > > Ashish Singh > >