ZeMirella commented on issue #3699:
URL: https://github.com/apache/hudi/issues/3699#issuecomment-925219008


   Hi, thanks for your reply.
   **Which line of code from HoodieSparkUtils was run here?**
   The job hangs before it even starts: it hangs when it begins listing files and tries to read the S3 files.
   The hung task shown in the Spark history UI is this one:
   <img width="958" alt="Screenshot 2021-09-22 at 15 50 25" src="https://user-images.githubusercontent.com/75490501/134405074-b8cde70b-d81d-4299-b4a6-05cceb538386.png">
   
   **What Hudi actions are you trying to perform?**
   This job is supposed to join some tables and save the output to S3. The line where it hangs is a create-table operation; here is the code:
   ```python
   hudi_options = {
       'hoodie.table.name': self.table_name,
       'hoodie.datasource.write.recordkey.field': self.primary_key,
       'hoodie.datasource.write.table.name': self.table_name,
       'hoodie.datasource.write.operation': 'bulk_insert',
       'hoodie.bulkinsert.shuffle.parallelism': self.bulk_insert_shuffle_parallelism,
       'hoodie.datasource.hive_sync.enable': self.hive_sync_enabled,
       'hoodie.datasource.hive_sync.database': self.hive_database_name,
       'hoodie.datasource.hive_sync.jdbcurl': f'jdbc:hive2://{self.hive_jdbc_url}:10000',
       'hoodie.datasource.hive_sync.table': self.table_name,
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
       'hoodie.datasource.hive_sync.support_timestamp': 'true',
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
       'hoodie.datasource.write.row.writer.enable': 'false',
       'hoodie.parquet.small.file.limit': 536870912,
       'hoodie.parquet.max.file.size': 1073741824,
       'hoodie.parquet.block.size': 536870912
   }

   spark_df.write.format("hudi").options(**hudi_options).mode("overwrite").save(self.table_path)
   ```
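   For context on the Parquet size settings in the options above, here is a quick sanity check of what those raw byte values mean (just illustrative arithmetic, not part of the job itself):

   ```python
   # The three hoodie.parquet.* values from hudi_options, expressed in MiB.
   MiB = 1024 * 1024

   small_file_limit = 536870912   # hoodie.parquet.small.file.limit
   max_file_size = 1073741824     # hoodie.parquet.max.file.size
   block_size = 536870912         # hoodie.parquet.block.size

   print(small_file_limit // MiB)  # 512 MiB
   print(max_file_size // MiB)     # 1024 MiB, i.e. 1 GiB
   print(block_size // MiB)        # 512 MiB
   ```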
    
   **What is the total input data size you are reading?**
   1.6 TB
   
   **How many executors were actually created during the run?**
   37
   <img width="1745" alt="image" src="https://user-images.githubusercontent.com/75490501/134403621-c4ca12e1-93fa-405a-910a-595013062343.png">
   

