davidzollo opened a new issue, #10596:
URL: https://github.com/apache/seatunnel/issues/10596

   ### Search before asking
   
   - [x] I have searched the [issues](https://github.com/apache/seatunnel/issues) and found no similar feature request.
   
   ### Description
   
   Currently, when the data distribution factor falls outside the range 
`[even-distribution.factor.lower-bound, even-distribution.factor.upper-bound]` 
(default `[0.05, 100.0]`) and the estimated shard count exceeds 
`sample-sharding.threshold` (default 1000), sampling-based sharding is 
automatically triggered via `efficientShardingThroughSampling()`.
   
   However, in some production scenarios, users may want to **explicitly 
disable** sampling-based sharding, for example:
   - The source database has strict resource constraints, and the sampling 
query (which may scan all rows) causes unacceptable load.
   - Users prefer deterministic splitting behavior without sampling overhead.
   - The sampling result may not accurately reflect data distribution due to 
data skew patterns.
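
The trigger condition described at the start of this section can be sketched roughly as follows. This is only an illustration of the rule as stated in this issue, not the actual SeaTunnel code: the class `SamplingTrigger` and its methods are hypothetical names, and only the default values are taken from the description above.

```java
// Illustrative sketch only: SamplingTrigger and its methods are hypothetical
// names, not part of connector-cdc-base or connector-jdbc.
final class SamplingTrigger {
    // Defaults quoted in this issue.
    static final double DEFAULT_LOWER_BOUND = 0.05;
    static final double DEFAULT_UPPER_BOUND = 100.0;
    static final int DEFAULT_SAMPLE_SHARDING_THRESHOLD = 1000;

    // The data counts as evenly distributed when the factor lies inside
    // the closed range [lower, upper].
    static boolean isEvenlyDistributed(double factor, double lower, double upper) {
        return factor >= lower && factor <= upper;
    }

    // Sampling is triggered only when the factor is outside the range AND
    // the estimated shard count exceeds the threshold.
    static boolean shouldSample(
            double factor, double lower, double upper,
            long estimatedShardCount, int threshold) {
        return !isEvenlyDistributed(factor, lower, upper)
                && estimatedShardCount > threshold;
    }

    public static void main(String[] args) {
        // Skewed data and many shards: sampling kicks in.
        System.out.println(shouldSample(250.0, DEFAULT_LOWER_BOUND,
                DEFAULT_UPPER_BOUND, 5000, DEFAULT_SAMPLE_SHARDING_THRESHOLD));
        // Factor inside the range: no sampling, whatever the shard count.
        System.out.println(shouldSample(1.0, DEFAULT_LOWER_BOUND,
                DEFAULT_UPPER_BOUND, 5000, DEFAULT_SAMPLE_SHARDING_THRESHOLD));
        // Skewed data but few shards: no sampling.
        System.out.println(shouldSample(250.0, DEFAULT_LOWER_BOUND,
                DEFAULT_UPPER_BOUND, 800, DEFAULT_SAMPLE_SHARDING_THRESHOLD));
    }
}
```

With the defaults, the three calls in `main` print `true`, `false`, and `false`. There is currently no input to this decision that lets a user opt out of the first case.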
   
   ### Proposal
   
   Add a boolean configuration option to allow users to disable sampling:
   
   **For CDC connectors (`connector-cdc-base`):**
   - Option: `sample-sharding.enable`
   - Default: `true` (backward compatible)
   
   **For JDBC connector (`connector-jdbc`):**
   - Option: `split.sample-sharding.enable`
   - Default: `true` (backward compatible)
   
   When set to `false`, the system should fall back to unevenly-sized chunk 
splitting (iterative query approach) regardless of the shard count.
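
A minimal sketch of how the proposed flag might gate the decision is shown below. The names `SplitStrategyChooser`, `Strategy`, and `choose` are hypothetical and exist only for illustration. The `EVENLY_SIZED` branch assumes that evenly distributed data keeps its current handling, which this proposal does not change.

```java
// Illustrative sketch only: all names here are hypothetical, not SeaTunnel APIs.
final class SplitStrategyChooser {
    enum Strategy { EVENLY_SIZED, SAMPLING, UNEVENLY_SIZED }

    static Strategy choose(
            boolean evenlyDistributed,
            long estimatedShardCount,
            int sampleShardingThreshold,
            boolean sampleShardingEnabled) {
        if (evenlyDistributed) {
            // Assumption: evenly distributed data is unaffected by the new option.
            return Strategy.EVENLY_SIZED;
        }
        // Proposed behavior: sample only when the option is enabled.
        if (sampleShardingEnabled && estimatedShardCount > sampleShardingThreshold) {
            return Strategy.SAMPLING;
        }
        // Fallback named in the proposal: unevenly-sized chunk splitting.
        return Strategy.UNEVENLY_SIZED;
    }

    public static void main(String[] args) {
        // Default (enabled): same result as today.
        System.out.println(choose(false, 5000, 1000, true));
        // Disabled: falls back regardless of the shard count.
        System.out.println(choose(false, 5000, 1000, false));
    }
}
```

Running `main` prints `SAMPLING` and then `UNEVENLY_SIZED`. Because the flag is checked together with the threshold, a default of `true` leaves existing jobs unchanged.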
   
   ### Related code
   
   - `AbstractJdbcSourceChunkSplitter.evenlyColumnSplitChunks()` 
(connector-cdc-base)
   - `DynamicChunkSplitter.evenlyColumnSplitChunks()` (connector-jdbc)
   - `JdbcSourceOptions.SAMPLE_SHARDING_THRESHOLD` (both modules)
   
   ### Use Cases
   
   1. Production databases with strict query resource limits
   2. Scenarios where deterministic split behavior is preferred
   3. Cases where sampling produces suboptimal chunk boundaries
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Version
   
   - 2.3.x / 2.4.x / 2.5.x / 2.6.x

