kyehe opened a new pull request, #3734:
URL: https://github.com/apache/incubator-seatunnel/pull/3734

   **Tested Job For MySQL2Hive**
   
   - MySQL source: `2.4kw` rows (about 24 million)
   - env config (the goal is `Faster Speed With Fewer Resources`):
   env {
     spark.app.name = "SeaTunnel Spark Job"
     spark.dynamicAllocation.enabled = false
     spark.executor.instances = 1
     spark.executor.cores = 1
     spark.executor.memory = "2g"
     spark.driver.memory = "1g"
     spark.dynamicAllocation.minExecutors = 1
     spark.executor.memoryOverhead = "1g"
     spark.executor.heartbeatInterval = "60s"
   }
   
   **Before Optimization: Job Runs 15 min**
   
   
![image](https://user-images.githubusercontent.com/31163620/207883519-bb8b8a90-f098-4178-bd21-87fd7c05a647.png)
   
   
![image](https://user-images.githubusercontent.com/31163620/207883576-e8750ec5-951a-40b7-b0ed-ff72a4e8f3a3.png)
   
   We can see that the sink writer is much slower than the source reader.
   
   
![image](https://user-images.githubusercontent.com/31163620/207883769-7faeb059-6f04-4be4-aa0a-65e552899281.png)
   
   **After Optimization: Job Runs 3 min**
   
   
![image](https://user-images.githubusercontent.com/31163620/207884321-43572d59-8233-4c3e-b730-c39b416fbfb0.png)
   
   
![image](https://user-images.githubusercontent.com/31163620/207885064-667574f5-50c2-4544-aacd-0e9b01143c69.png)
   
   The speedup comes from using **_batch-row inserts_** rather than one-row inserts, and from flushing the temp file as soon as it has been written.
   
   We can see that the sink writer not only **consumes the source data faster**, but also **avoids executor task failures caused by out-of-memory errors** when the job's resource configuration is small (for the tested job we set just 2g for the executor and 1g for the driver).
   
   
![image](https://user-images.githubusercontent.com/31163620/207885086-4cb39d9c-4da1-4cee-b2c2-eb18147d9c89.png)
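   The batching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: the class `BatchBuffer`, its methods, and the batch size of 1000 are all hypothetical, and the `Consumer` simply stands in for whatever actually writes a batch to the temp file. It shows why the sink is called far fewer times and why at most one batch of rows is held in memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of batch-row writing; NOT SeaTunnel's actual Hive sink code.
 * Rows are buffered and handed to the sink in groups, so at most batchSize rows
 * are held in memory at any time.
 */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // assumption: stands in for "write a batch to the temp file"
    private final List<T> buffer;
    private int flushCount = 0;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one row and flushes as soon as the batch is full. */
    void write(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the buffered rows to the sink and clears the buffer; no-op when empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
        buffer.clear();
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    int getBufferedRows() {
        return buffer.size();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchBuffer<String> writer =
                new BatchBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            writer.write("row-" + i);
        }
        writer.flush(); // flush the final partial batch
        System.out.println(writer.getFlushCount() + " sink calls, sizes " + batchSizes);
    }
}
```

   In this sketch, 2,500 rows reach the sink in three calls (1000, 1000 and 500 rows) instead of 2,500 single-row calls, and the buffer never holds more than 1000 rows.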
   
   
   
   ## Purpose of this pull request
   
   <!-- Describe the purpose of this pull request. For example: This pull 
request adds checkstyle plugin.-->
   
   ## Check list
   
   * [ ] Code changes are covered with tests, or do not need tests for the following reason:
   * [ ] If any new Jar binary package is added in your PR, please add a License Notice according to the
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   * [ ] If you are contributing connector code, please check that the following files are updated:
     1. Update the change log in the connector document. For more details, refer to [connector-v2](https://github.com/apache/incubator-seatunnel/tree/dev/docs/en/connector-v2)
     2. Update [plugin-mapping.properties](https://github.com/apache/incubator-seatunnel/blob/dev/plugin-mapping.properties) and add the new connector information to it
     3. Update the pom file of [seatunnel-dist](https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-dist/pom.xml)

