joshhamann commented on issue #10822: URL: https://github.com/apache/hudi/issues/10822#issuecomment-1983870364
You can see the timestamps in the Spark UI screenshots above, if that works. For instance, the test job, which processes more data, runs from around 23:18 to 23:23, and from my understanding most of that time is spent actually processing the data. The production job runs from 21:42 to 21:55, and the processing portion of that job is similar to the test job's. The only difference is that the disparate steps at the end take more time.

Some additional context, which may or may not be helpful: our job reads events and, in theory, should only be processing events within that particular UTC date. However, we do get events with past and future timestamps (outside of the particular UTC partition date), which means there are more partitions to upsert into. That said, the test job points to the same raw data, so I would expect that to be happening in both places.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
