Github user kapustor commented on a diff in the pull request:
https://github.com/apache/incubator-hawq/pull/1353#discussion_r214508994
--- Diff:
pxf/pxf-jdbc/src/main/java/org/apache/hawq/pxf/plugins/jdbc/writercallable/WriterCallableFactory.java
---
@@ -75,16 +75,15 @@ public void setQuery(String query) {
/**
* Set batch size to use.
*
- * @param batchSize < 0: Use batches of infinite size
+ * @param batchSize = 0: Use batches of recommended size
* @param batchSize = 1: Do not use batches
* @param batchSize > 1: Use batches of the given size
+ * @param batchSize < 0: Use batches of infinite size
*/
public void setBatchSize(int batchSize) {
- if (batchSize < 0) {
- batchSize = 0;
- }
- else if (batchSize == 0) {
- batchSize = 1;
+ if (batchSize == 0) {
+ // Set the recommended value:
+ // https://docs.oracle.com/cd/E11882_01/java.112/e16548/oraperf.htm#JJDBC28754
+ batchSize = 100;
--- End diff --
As I understand it, the parameter values are now handled like this:
```
1. nothing (not provided) -- use default of 100
2. 0 -- no batching
3. 1 -- no batching
4. >1 -- use the number provided by the user
```
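The mapping above could be sketched as a small normalization helper. This is a hypothetical illustration of the semantics listed, not the actual PXF code; the class and method names (`BatchSizeNormalizer`, `normalize`) are made up for the example, and negative values (which the javadoc treats as "infinite" batches) are left out since the list does not cover them.

```java
public class BatchSizeNormalizer {
    // Recommended default, per the Oracle JDBC performance guide linked above
    static final int DEFAULT_BATCH_SIZE = 100;

    /**
     * Normalize a user-provided batch size according to the list above:
     * not provided (null) -> default of 100;
     * 0 or 1             -> no batching (effective batch size 1);
     * greater than 1     -> use the value as given.
     */
    static int normalize(Integer userBatchSize) {
        if (userBatchSize == null) {
            return DEFAULT_BATCH_SIZE;
        }
        if (userBatchSize <= 1) {
            return 1;
        }
        return userBatchSize;
    }

    public static void main(String[] args) {
        System.out.println(normalize(null));  // default
        System.out.println(normalize(0));     // no batching
        System.out.println(normalize(500));   // user value
    }
}
```

With this shape, whether to cap the user value (case 4) becomes a single extra comparison in `normalize`, which is where a `BATCH_SIZE` max limit would go if one is added.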
In my opinion, we shouldn't limit the possible `BATCH_SIZE` max value (or we
should set it very high, to 500k for example), since we don't know the use cases
and external DBs that users will use this with. It is possible that some DBs or
other systems need a high batch size for optimal inserts.
@denalex what do you think about `BATCH_SIZE` max value?
---