hackergin opened a new issue, #7905:
URL: https://github.com/apache/paimon/issues/7905

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   1.3.1
   
   ### Compute Engine
   
   Flink
   
   ### Minimal reproduce step
   
    -- Flink 1.19+
     SET 'parallelism.default' = '4';
   
     CREATE TABLE t (
       id BIGINT,
       v  STRING
     ) WITH (
       'connector'        = 'paimon',
       'bucket'           = '16',
       'scan.parallelism' = '16'
     );
   
     EXPLAIN PLAN FOR
     SELECT id, UPPER(v) FROM t;
   
     Inspect the resulting job graph in the Flink Web UI / EXPLAIN.
   
   ### What doesn't meet your expectations?
   
   Expected
   
     The scan.parallelism option, by its name and by Flink's standard Table 
connector contract, is supposed to control only the source operator's 
parallelism. Downstream operators should
      run at parallelism.default, with a rebalance edge inserted between them 
and the source:
   
     Source[t] (parallelism=16)  --rebalance-->  Calc(UPPER(v)) (parallelism=4)
   
     Actual
   
     The configured scan.parallelism = 16 propagates to every downstream 
operator in the forward chain. The whole pipeline runs at parallelism 16, 
ignoring parallelism.default:
   
     Source[t] (parallelism=16)  --forward-->  Calc(UPPER(v)) (parallelism=16)
   
     There is no way through SQL today to read a Paimon table with high source 
parallelism while keeping downstream computation at the global default.
   
   ### Anything else?
   
   Root cause
   
     PaimonDataStreamScanProvider already inherits the ParallelismProvider 
interface (Flink's DataStreamScanProvider extends ParallelismProvider), but it 
never overrides
     getParallelism(), so the planner always receives the default 
Optional.empty(). Paimon then applies the configured parallelism by mutating 
the produced DataStream directly
   
   
   References
   
     - Flink — [Table connector 
ParallelismProvider](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sourcessinks/#parallelism)
     - [FLIP-367: Support Setting Parallelism for Table/SQL 
Sources](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150)
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to