rajgowtham24 opened a new issue #4411:
URL: https://github.com/apache/hudi/issues/4411


   Hi team,
   
   We are using Hudi 0.5.0 and Presto 0.230 on EMR 6.0.0, and we used the sample code below for data ingestion. During the upgrade, we are facing an error with Presto.
   
   The code below was executed on Hudi 0.5.0. After the data ingestion, we manually updated the table location to **"s3://bucket_name/test/table_name/default"**, and we are able to query the data in Presto 0.230 as well.
   
   We are in the process of upgrading to EMR 6.3. During the POC, we observed that a table ingested through EMR 6.0 (Hudi 0.5.0, Presto 0.230) does not work with the Presto query engine on EMR 6.3 (Presto 0.245.1, Hudi 0.7.0). The error is provided below. Could you please take a look and let me know?
   
   **Code executed in Hudi 0.5.0**
   inputDF = spark.createDataFrame(
       [
           ("100","Apache","Spark", "2015/01/01", 
"2015-01-01T13:51:39.340396Z","2015-01-02"),
           ("101","Apache","Beam","2015/01/01", 
"2015-01-01T12:14:58.597216Z","2015-01-02"),
           ("102","Apache","Hudi","2015/01/01", 
"2015-01-01T13:51:40.417052Z","2015-01-02"),
           ("103","GCP","DataProc","2015/01/01", 
"2015-01-01T13:51:40.519832Z","2015-01-02"),
           ("104","AWS","Glue","2015/01/02", 
"2015-01-01T12:15:00.512679Z","2015-01-03"),
           ("105","AWS","EMR","2015/01/02", 
"2015-01-01T13:51:42.248818Z","2015-01-03")
       ],
       ["id","provider","technology", "created_date", 
"last_update_time","load_date"]
   )
   
   hive_jdbcurl = "jdbc:hive2://localhost:10000"
   hive_username = "hive"
   hive_password = "xxx"
   hudi_table_name="table_name"
   landing_bucket = "s3://bucket_name/test"
   hive_schema = "test"
   
   
   inputDF.write.format("org.apache.hudi") \
       .option("hoodie.datasource.write.recordkey.field", "id,provider") \
       .option("hoodie.datasource.write.precombine.field", "last_update_time") \
       .option("hoodie.table.name", hudi_table_name) \
       .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ") \
       .option("hoodie.datasource.hive_sync.enable", "true") \
       .option("hoodie.datasource.hive_sync.database", hive_schema) \
       .option("hoodie.datasource.hive_sync.table", hudi_table_name) \
       .option("hoodie.datasource.hive_sync.jdbcurl", hive_jdbcurl) \
       .option("hoodie.datasource.hive_sync.username", hive_username) \
       .option("hoodie.datasource.hive_sync.password", hive_password) \
       .option("hoodie.datasource.hive_sync.assume_date_partitioning", "false") \
       .option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor") \
       .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.ComplexKeyGenerator") \
       .option("hoodie.compact.inline", "true") \
       .option("hoodie.bulkinsert.shuffle.parallelism", "100") \
       .option("hoodie.insert.shuffle.parallelism", "2") \
       .option("hoodie.upsert.shuffle.parallelism", "2") \
       .option("hoodie.datasource.write.operation", "bulk_insert") \
       .mode("Overwrite") \
       .save(landing_bucket + "/" + hudi_table_name)  # separator added so the base path matches s3://bucket_name/test/table_name
   
   **Presto Error**
   Hoodie table not found in path s3://bucket_name/test/table_name/default/.hoodie
    at com.facebook.presto.jdbc.PrestoResultSet.resultsException(PrestoResultSet.java:1841)
    at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1821)
    at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1760)
    at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at com.facebook.presto.jdbc.internal.guava.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
    at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1311)
    at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1327)
    at com.facebook.presto.jdbc.PrestoResultSet.next(PrestoResultSet.java:146)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path s3://az-eu-azodh-opsit-s3-format-dev/f_poc/63_hudi_write_complex/default/.hoodie
    at org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:53)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:128)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:114)
    at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:100)
    at com.facebook.presto.hive.HudiDirectoryLister.<init>(HudiDirectoryLister.java:58)
    at com.facebook.presto.hive.StoragePartitionLoader.<init>(StoragePartitionLoader.java:138)
    at com.facebook.presto.hive.DelegatingPartitionLoader.<init>(DelegatingPartitionLoader.java:56)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.<init>(BackgroundHiveSplitLoader.java:112)
    at com.facebook.presto.hive.HiveSplitManager.getSplits(HiveSplitManager.java:298)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorSplitManager.getSplits(ClassLoaderSafeConnectorSplitManager.java:62)
    at com.facebook.presto.split.SplitManager.getSplits(SplitManager.java:90)
    at com.facebook.presto.split.CloseableSplitSourceProvider.getSplits(CloseableSplitSourceProvider.java:53)
    at com.facebook.presto.sql.planner.SplitSourceFactory$Visitor.lambda$visitScanAndFilter$1(SplitSourceFactory.java:192)
    at com.facebook.presto.sql.planner.LazySplitSource.getDelegate(LazySplitSource.java:96)
    at com.facebook.presto.sql.planner.LazySplitSource.getConnectorId(LazySplitSource.java:48)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStageScheduler(SectionExecutionFactory.java:281)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStreamingLinkedStageExecutions(SectionExecutionFactory.java:243)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createStreamingLinkedStageExecutions(SectionExecutionFactory.java:221)
    at com.facebook.presto.execution.scheduler.SectionExecutionFactory.createSectionExecutions(SectionExecutionFactory.java:167)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.createStageExecutions(LegacySqlQueryScheduler.java:343)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.<init>(LegacySqlQueryScheduler.java:233)
    at com.facebook.presto.execution.scheduler.LegacySqlQueryScheduler.createSqlQueryScheduler(LegacySqlQueryScheduler.java:164)
    at com.facebook.presto.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:508)
    at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:381)
    at com.facebook.presto.$gen.Presto_0_245_1_amzn_0____20211208_121047_1.run(Unknown Source)
    at com.facebook.presto.execution.SqlQueryManager.createQuery(SqlQueryManager.java:254)
    at com.facebook.presto.dispatcher.LocalDispatchQuery.lambda$startExecution$5(LocalDispatchQuery.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
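   The `TableNotFoundException` comes from `HoodieTableMetaClient` validating that `<table location>/.hoodie` exists. If the `.hoodie` folder actually sits one level up (at the Hudi base path, with `default` being only a partition folder), one remedy to try, not confirmed here by maintainers, is to point the Hive table location back at the parent directory. A purely illustrative sketch of deriving that candidate location from the failing path in the report:

```python
# Hedged sketch: derive the parent of the failing location, which is where a
# Hudi 0.5.0 writer would have placed .hoodie if the data landed in a
# 'default' partition folder. Plain string handling, paths from the report.

def parent_path(location: str) -> str:
    """Strip the last path segment (e.g. the 'default' partition folder)."""
    return location.rstrip("/").rsplit("/", 1)[0]

failing = "s3://bucket_name/test/table_name/default"
print(parent_path(failing))  # s3://bucket_name/test/table_name
```

   If that parent contains `.hoodie`, an `ALTER TABLE ... SET LOCATION` to it (or re-running Hive sync under Hudi 0.7.0 so the metastore entry is regenerated) would be the thing to verify first.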
   

