ZeMirella commented on issue #3699:
URL: https://github.com/apache/hudi/issues/3699#issuecomment-925219008


   Hi, thanks for your reply.
   **Which line of code from HoodieSparkUtils was run here?**
   The job hangs before it even starts: it hangs when it begins listing files and tries to read the S3 files.
   The hung task shown in the Spark history UI is this one:
   <img width="958" alt="Screenshot 2021-09-22 at 15 50 25" src="https://user-images.githubusercontent.com/75490501/134405074-b8cde70b-d81d-4299-b4a6-05cceb538386.png">
   
   **What Hudi actions are you trying to perform?**
   This job is supposed to join some tables and save the output to S3. The line where it hangs is a create-table operation; here is the code:
   ```python
   hudi_options = {
       'hoodie.table.name': self.table_name,
       'hoodie.datasource.write.recordkey.field': self.primary_key,
       'hoodie.datasource.write.table.name': self.table_name,
       'hoodie.datasource.write.operation': 'bulk_insert',
       'hoodie.bulkinsert.shuffle.parallelism': self.bulk_insert_shuffle_parallelism,
       'hoodie.datasource.hive_sync.enable': self.hive_sync_enabled,
       'hoodie.datasource.hive_sync.database': self.hive_database_name,
       'hoodie.datasource.hive_sync.jdbcurl': f'jdbc:hive2://{self.hive_jdbc_url}:10000',
       'hoodie.datasource.hive_sync.table': self.table_name,
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
       'hoodie.datasource.hive_sync.support_timestamp': 'true',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
       'hoodie.datasource.write.row.writer.enable': 'false',
       'hoodie.parquet.small.file.limit': 536870912,
       'hoodie.parquet.max.file.size': 1073741824,
       'hoodie.parquet.block.size': 536870912
   }

   spark_df.write.format("hudi").options(**hudi_options).mode("overwrite").save(self.table_path)
   ```
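   For context on the Parquet size settings in the options above, here is a quick sanity check of what those raw byte values mean (just illustrative arithmetic, not part of the job itself):

   ```python
   # The three hoodie.parquet.* values from hudi_options, expressed in MiB.
   MiB = 1024 * 1024

   small_file_limit = 536870912   # hoodie.parquet.small.file.limit
   max_file_size = 1073741824     # hoodie.parquet.max.file.size
   block_size = 536870912         # hoodie.parquet.block.size

   print(small_file_limit // MiB)  # 512 MiB
   print(max_file_size // MiB)     # 1024 MiB, i.e. 1 GiB
   print(block_size // MiB)        # 512 MiB
   ```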
    
   **What is the total input data size you are reading?**
   1.6 TB
   
   **How many executors were actually created during the run?**
   37
   <img width="1745" alt="image" src="https://user-images.githubusercontent.com/75490501/134403621-c4ca12e1-93fa-405a-910a-595013062343.png">
   

