popart opened a new issue #1329: [SUPPORT] Presto cannot query non-partitioned table
URL: https://github.com/apache/incubator-hudi/issues/1329

**Describe the problem you faced**

I made a non-partitioned Hudi table using Spark. I was able to query it with Spark and Hive, but when I tried querying it with Presto, I received the error `Could not find partitionDepth in partition metafile`.

**To Reproduce**

Steps to reproduce the behavior:

1. Use an emr-5.28.0 cluster
2. Run spark-shell:
```
spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --deploy-mode client
```
3. Run the following Spark code:
```
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.hive._
import org.apache.hudi.keygen.NonpartitionedKeyGenerator

val inputPath = "s3://path/to/a/parquet/file"
val tableName = "my_test_table"
val basePath = "s3://test-bucket/my_test_table"

val inputDf = spark.read.parquet(inputPath)

val hudiOptions = Map[String,String](
  RECORDKEY_FIELD_OPT_KEY -> "dim_advertiser_id",
  PRECOMBINE_FIELD_OPT_KEY -> "update_time",
  TABLE_NAME -> tableName,
  KEYGENERATOR_CLASS_OPT_KEY -> classOf[NonpartitionedKeyGenerator].getCanonicalName, // needed for non-partitioned table
  HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[NonPartitionedExtractor].getCanonicalName, // needed for non-partitioned table
  OPERATION_OPT_KEY -> BULK_INSERT_OPERATION_OPT_VAL,
  HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  HIVE_TABLE_OPT_KEY -> tableName,
  TABLE_TYPE_OPT_KEY -> COW_TABLE_TYPE_OPT_VAL,
  "hoodie.bulkinsert.shuffle.parallelism" -> "10")

inputDf.write.format("org.apache.hudi").
  options(hudiOptions).
  mode(Overwrite).
  save(basePath)
```
4. Querying the table in Spark or Hive works.
5.
Querying the table in Presto fails:
```
[hadoop@ip-172-31-128-118 ~]$ presto-cli --catalog hive --schema default
presto:default> select count(*) from my_test_table;

Query 20200211_185123_00018_pruwt, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0:02 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200211_185123_00018_pruwt failed: Could not find partitionDepth in partition metafile
com.facebook.presto.spi.PrestoException: Could not find partitionDepth in partition metafile
	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:200)
	at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
	at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
	at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: Could not find partitionDepth in partition metafile
	at org.apache.hudi.common.model.HoodiePartitionMetadata.getPartitionDepth(HoodiePartitionMetadata.java:75)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.getTableMetaClient(HoodieParquetInputFormat.java:209)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.groupFileStatus(HoodieParquetInputFormat.java:158)
	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:69)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:371)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:264)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:96)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:193)
	... 7 more
```

**Expected behavior**

Presto should return a count of all the rows, and other Presto queries should succeed.

**Environment Description**

* EMR version: emr-5.28.0
* Hudi version: 0.5.1-incubating
* Spark version: 2.4.4
* Hive version: 2.3.6
* Hadoop version: 2.8.5
* Presto version: 0.277
* Storage (HDFS/S3/GCS..): S3
* Running on Docker? (yes/no): no

**Stacktrace**

Included in "To Reproduce" above.
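The exception comes from reading the `.hoodie_partition_metadata` file Hudi writes in each partition directory (for a non-partitioned table, the base path itself). As a quick diagnostic, the sketch below recreates such a metafile locally and checks for the key the error complains about; the sample contents, timestamp, and local path are illustrative assumptions, not values taken from the actual table:

```shell
# Illustrative only: a .hoodie_partition_metadata file is a Java-properties
# file; the failing code path expects a partitionDepth entry in it.
# On the real cluster you would inspect the file at the table base path, e.g.:
#   hadoop fs -cat s3://test-bucket/my_test_table/.hoodie_partition_metadata
mkdir -p /tmp/my_test_table
cat > /tmp/my_test_table/.hoodie_partition_metadata <<'EOF'
#partition metadata
commitTime=20200211185100
partitionDepth=0
EOF
# For a non-partitioned table the partition dir is the base path, so depth is 0.
grep partitionDepth /tmp/my_test_table/.hoodie_partition_metadata
# prints: partitionDepth=0
```

If the grep finds no `partitionDepth` line (or the file is missing entirely from the base path), that would match the failure Presto reports.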
