dataproblems commented on issue #12116:
URL: https://github.com/apache/hudi/issues/12116#issuecomment-2460919696

   Followed up with @ad1happy2go during Hudi office hours and got a few more things to try:
   
   ### Follow Up on Random Data: Use sort mode `None` with timeline server enabled
   For this follow-up action item, the data was written to S3, but the job got stuck. I used the 37 GB dataset generated with the random data generation script I posted earlier.
   
   ### Follow Up on Random Data: Use sort mode `None` and increase `hoodie.metadata.record.index.min.filegroup.count` from 10 (default) to 10000
   
   For this experiment, I used the following config: 
   
   ```scala
   val bulkWriteOptions: Map[String, String] = Map(
     DataSourceWriteOptions.OPERATION.key() -> 
DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL,
     DataSourceWriteOptions.TABLE_TYPE.key() -> 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
     HoodieStorageConfig.PARQUET_COMPRESSION_CODEC_NAME.key() -> "snappy",
     HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> "2147483648",
     "hoodie.parquet.small.file.limit" -> "1073741824",
     HoodieTableConfig.POPULATE_META_FIELDS.key() -> "true",
     HoodieWriteConfig.BULK_INSERT_SORT_MODE.key() -> 
BulkInsertSortMode.NONE.name(),
     HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key() -> "true",
     HoodieIndexConfig.INDEX_TYPE.key() -> "RECORD_INDEX",
     DataSourceWriteOptions.META_SYNC_ENABLED.key() -> "false",
     "hoodie.metadata.record.index.enable" -> "true",
     "hoodie.metadata.enable" -> "true",
     "hoodie.datasource.write.hive_style_partitioning" -> "true",
     "hoodie.clustering.inline" -> "true",
     "hoodie.clustering.plan.strategy.target.file.max.bytes" -> "2147483648",
     "hoodie.clustering.plan.strategy.small.file.limit" -> "1073741824",
     "hoodie.datasource.write.partitionpath.field" -> "partition",
     "hoodie.datasource.write.recordkey.field" -> "id",
     "hoodie.datasource.write.precombine.field" -> "ts",
     "hoodie.table.name" -> tableName,
     DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key() -> 
classOf[SimpleKeyGenerator].getName,
     "hoodie.write.markers.type" -> "DIRECT",
     "hoodie.embed.timeline.server" -> "true",
     "hoodie.metadata.record.index.min.filegroup.count" -> "10000",
     "hoodie.metadata.record.index.max.filegroup.count" -> "100000"
   )
   ```
   I also tried with `"hoodie.metadata.record.index.min.filegroup.count" -> 
"1000"` and got the same outcome. 
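   For reference, this is roughly how an options map like the one above gets applied in a bulk insert write. The DataFrame name, save mode, and S3 path below are placeholders, not the actual values from this job:
   
   ```scala
   // Hypothetical invocation: `randomDf` and the target path are
   // placeholders standing in for the real job's inputs.
   randomDf.write
     .format("hudi")
     .options(bulkWriteOptions)
     .mode(SaveMode.Overwrite)
     .save("s3://<bucket>/<table-path>")
   ```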
   
   Here are the screenshots from the Spark UI. 
   
   #### Stage View
   ![Spark UI Stage 
View](https://github.com/user-attachments/assets/396e0eba-c898-4c6b-a2b3-bc08922038d3)
   
   #### Stage Detail View
   ![Spark UI Stage Detail View 
1](https://github.com/user-attachments/assets/328d10e5-3b3c-49ca-b235-d5ba92cf66b2)
   
   #### Metrics for the completed tasks
   ![Spark UI Stage Detail View Executor 
Metrics](https://github.com/user-attachments/assets/e8dda4d6-01cb-4224-8342-f75f2b25f48b)
   
   #### Event Timeline 
   ![Spark UI Stage Detail View Event 
Timeline](https://github.com/user-attachments/assets/bf46585d-c084-40d2-9554-891d9a45adfa)
   
   #### Executor Summary Tab
   ![Screenshot 2024-11-06 at 2 11 39 
PM](https://github.com/user-attachments/assets/e3c75ca2-1a8e-4800-b8bc-df7582d9e0ab)
   
   Here are the [random_exp_executor_stderr.log](https://github.com/user-attachments/files/17653720/random_exp_executor_stderr.log) and [random_exp_executor_stdout.log](https://github.com/user-attachments/files/17653723/random_exp_executor_stdout.log) from one of the executors with high GC time.
   
   I see the executor heartbeat timeouts here as well. Do you have any ideas as to why we might be running into this?
   
   
   

