rajgowtham24 opened a new issue #4411:
URL: https://github.com/apache/hudi/issues/4411
Hi team,

We are using Hudi 0.5.0 and Presto 0.230 on EMR 6.0.0, and we used the sample
code below for data ingestion. After the ingestion we manually updated the
table location to **"s3://bucket_name/test/table_name/default"**, and we are
able to query the data in Presto 0.230.

We are now upgrading to EMR 6.3. During the POC we observed that the table
ingested through EMR 6.0 (Hudi 0.5.0, Presto 0.230) cannot be queried through
the Presto query engine on EMR 6.3 (Presto 0.245.1 & Hudi 0.7.0). The error is
provided below. Could you please take a look and let me know?
**Code executed in 0.5.0**

```python
inputDF = spark.createDataFrame(
    [
        ("100", "Apache", "Spark",    "2015/01/01", "2015-01-01T13:51:39.340396Z", "2015-01-02"),
        ("101", "Apache", "Beam",     "2015/01/01", "2015-01-01T12:14:58.597216Z", "2015-01-02"),
        ("102", "Apache", "Hudi",     "2015/01/01", "2015-01-01T13:51:40.417052Z", "2015-01-02"),
        ("103", "GCP",    "DataProc", "2015/01/01", "2015-01-01T13:51:40.519832Z", "2015-01-02"),
        ("104", "AWS",    "Glue",     "2015/01/02", "2015-01-01T12:15:00.512679Z", "2015-01-03"),
        ("105", "AWS",    "EMR",      "2015/01/02", "2015-01-01T13:51:42.248818Z", "2015-01-03")
    ],
    ["id", "provider", "technology", "created_date", "last_update_time", "load_date"]
)

hive_jdbcurl = "jdbc:hive2://localhost:10000"
hive_username = "hive"
hive_password = "xxx"
hudi_table_name = "table_name"
landing_bucket = "s3://bucket_name/test"
hive_schema = "test"

inputDF.write.format("org.apache.hudi") \
    .option("hoodie.datasource.write.recordkey.field", "id,provider") \
    .option("hoodie.datasource.write.precombine.field", "last_update_time") \
    .option("hoodie.table.name", hudi_table_name) \
    .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ") \
    .option("hoodie.datasource.hive_sync.enable", "true") \
    .option("hoodie.datasource.hive_sync.database", hive_schema) \
    .option("hoodie.datasource.hive_sync.table", hudi_table_name) \
    .option("hoodie.datasource.hive_sync.jdbcurl", hive_jdbcurl) \
    .option("hoodie.datasource.hive_sync.username", hive_username) \
    .option("hoodie.datasource.hive_sync.password", hive_password) \
    .option("hoodie.datasource.hive_sync.assume_date_partitioning", "false") \
    .option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor") \
    .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.ComplexKeyGenerator") \
    .option("hoodie.compact.inline", "true") \
    .option("hoodie.bulkinsert.shuffle.parallelism", "100") \
    .option("hoodie.insert.shuffle.parallelism", "2") \
    .option("hoodie.upsert.shuffle.parallelism", "2") \
    .option("hoodie.datasource.write.operation", "bulk_insert") \
    .mode("Overwrite") \
    .save(landing_bucket + hudi_table_name)
```
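One thing worth double-checking in the code above: `save(landing_bucket + hudi_table_name)` concatenates the two strings with no `/` separator, so the write path differs from the location later registered in the metastore. A minimal reproduction of just the path construction:

```python
# Same values as in the ingestion code above.
landing_bucket = "s3://bucket_name/test"
hudi_table_name = "table_name"

# As written -- no separator between the prefix and the table name:
print(landing_bucket + hudi_table_name)        # s3://bucket_name/testtable_name

# With an explicit separator, matching the location set in the metastore:
print(landing_bucket + "/" + hudi_table_name)  # s3://bucket_name/test/table_name
```

If the data (and its `.hoodie` folder) landed under the first path, newer Presto versions that resolve the Hudi table from the registered location would not find it.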
**Presto Error**

```
Hoodie table not found in path s3://bucket_name/test/table_name/default/.hoodie
    at com.facebook.presto.jdbc.PrestoResultSet.resultsException(PrestoResultSet.java:1841)
    at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1821)
    at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1760)
    at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at com.facebook.presto.jdbc.internal.guava.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
    at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1311)
    at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1327)
    at com.facebook.presto.jdbc.PrestoResultSet.next(PrestoResultSet.java:146)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path s3://az-eu-azodh-opsit-s3-format-dev/f_poc/63_hudi_write_complex/default/.hoodie
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:53)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:128)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:114)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:100)
    at com.facebook.presto.hive.HudiDirectoryLister.<init>(HudiDirectoryLister.java:58)
    at com.facebook.presto.hive.StoragePartitionLoader.<init>(StoragePartitionLoader.java:138)
    at com.facebook.presto.hive.DelegatingPartitionLoader.<init>(DelegatingPartitionLoader.java:56)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.<init>(BackgroundHiveSplitLoader.java:112)
    at com.facebook.presto.hive.HiveSplitManager.getSplits(HiveSplitManager.java:298)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorSplitManager.getSplits(ClassLoaderSafeConnectorSplitManager.java:62)
    at com.facebook.presto.split.SplitManager.getSplits(SplitManager.java:90)
    at com.facebook.presto.split.CloseableSplitSourceProvider.getSplits(CloseableSplitSourceProvider.java:53)
    at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.lambda$visitScanAndFilter$1(SplitSourceFactory.java:192)
    at com.facebook.presto.sql.planner.LazySplitSource.getDelegate(LazySplitSource.java:96)
    at com.facebook.presto.sql.planner.LazySplitSource.getConnectorId(LazySplitSource.java:48)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStageScheduler(SectionExecutionFactory.java:281)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStreamingLinkedStageExecutions(SectionExecutionFactory.java:243)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStreamingLinkedStageExecutions(SectionExecutionFactory.java:221)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createSectionExecutions(SectionExecutionFactory.java:167)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.createStageExecutions(LegacySqlQueryScheduler.java:343)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.<init>(LegacySqlQueryScheduler.java:233)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.createSqlQueryScheduler(LegacySqlQueryScheduler.java:164)
    at com.facebook.presto.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:508)
    at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:381)
    at com.facebook.presto.$gen.Presto_0_245_1_amzn_0____20211208_121047_1.run(Unknown Source)
    at com.facebook.presto.execution.SqlQueryManager.createQuery(SqlQueryManager.java:254)
    at com.facebook.presto.dispatcher.LocalDispatchQuery.lambda$startExecution$5(LocalDispatchQuery.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```
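The `Caused by` frames show that Presto 0.245.1's `HudiDirectoryLister` constructs a `HoodieTableMetaClient` directly from the Hive table's registered location and raises `TableNotFoundException` when no `.hoodie` folder exists at that location. If the `.hoodie` folder actually sits at the Hudi table base path rather than under the manually added `/default` suffix, one possible repair, sketched here under that assumption using the placeholder names from above, is to repoint the Hive table at the base path:

```sql
-- Hypothetical: verify first that s3://bucket_name/test/table_name/.hoodie exists,
-- then register that base path as the table location.
ALTER TABLE test.table_name SET LOCATION 's3://bucket_name/test/table_name';
```

This is only a sketch; the correct target location depends on where the 0.5.0 write actually placed the `.hoodie` folder.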
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]