michael1991 opened a new issue, #8100: URL: https://github.com/apache/hudi/issues/8100
**Describe the problem you faced**

We have an hourly custom Spark job that incrementally appends to a MOR table with async cleaning and compaction. We face two problems:

1. Job execution time is unstable (9 min ~ 34 min).
2. Many `RequestHandler` WARN logs appear, like the one below:

```
23/03/06 08:29:27 WARN RequestHandler: Bad request response due to client view behind server view. Last known instant from client was 20230306081811007 but server has the following timeline [[20230306053715487__deltacommit__COMPLETED], [20230306053909170__deltacommit__COMPLETED], [20230306054016158__deltacommit__COMPLETED], [20230306054048527__commit__COMPLETED], [20230306054633503__deltacommit__COMPLETED], [20230306054829613__deltacommit__COMPLETED], [20230306055410708__deltacommit__COMPLETED], [20230306055412958__clean__COMPLETED], [20230306055651288__deltacommit__COMPLETED], [20230306060351975__deltacommit__COMPLETED], [20230306060355539__clean__COMPLETED], [20230306061211058__deltacommit__COMPLETED], [20230306061726347__commit__COMPLETED], [20230306062444656__deltacommit__COMPLETED], [20230306062859872__deltacommit__COMPLETED], [20230306063610912__deltacommit__COMPLETED], [20230306063613508__clean__COMPLETED], [20230306065356198__deltacommit__COMPLETED], [20230306070057884__deltacommit__COMPLETED], [20230306070100336__clean__COMPLETED], [20230306072323088__deltacommit__COMPLETED], [20230306072824814__commit__COMPLETED], [20230306072853022__clean__COMPLETED], [20230306073557240__deltacommit__COMPLETED], [20230306073841564__deltacommit__COMPLETED], [20230306074501873__deltacommit__COMPLETED], [20230306074504461__clean__COMPLETED], [20230306075516906__deltacommit__COMPLETED], [20230306080130924__deltacommit__COMPLETED], [20230306080133247__clean__COMPLETED], [20230306081331748__deltacommit__COMPLETED], [20230306081811007__commit__COMPLETED], [20230306081838580__clean__COMPLETED], [20230306082752807__clean__COMPLETED]]
```

**Expected behavior**

Under the same Spark cluster scale, job execution time should be correlated with data size.

**Environment Description**

* Hudi version : 0.12.0
* Spark version : 3.3.0
* Hive version : not used
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no
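For context, the setup described above (an hourly incremental append to a MOR table with cleaning and compaction running asynchronously rather than inline) might be configured roughly as in the sketch below. This is a hypothetical PySpark snippet, not the reporter's actual job: the table name, record key, precombine field, and GCS path are placeholders, and the option names follow Hudi 0.12.x docs.

```python
# Hypothetical sketch: Hudi writer options for an incremental upsert into a
# MERGE_ON_READ table with async cleaning, and inline compaction disabled so
# compaction can run as a separate async job. Placeholder names throughout.
hudi_options = {
    "hoodie.table.name": "events",                     # placeholder table name
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "id",   # placeholder key field
    "hoodie.datasource.write.precombine.field": "ts",  # placeholder ordering field
    "hoodie.clean.async": "true",      # run cleaning asynchronously
    "hoodie.compact.inline": "false",  # compaction handled outside the write path
}

# The hourly job would then append with something like:
# df.write.format("hudi").options(**hudi_options) \
#     .mode("append").save("gs://your-bucket/path")  # GCS per the environment above
```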
