[ 
https://issues.apache.org/jira/browse/BEAM-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-14161:
---------------------------------
    Description: 
Currently, the JDBC IO is basically a {{DoFn}} executed with a {{ParDo}}, so 
parallelism is "limited" and the read runs on one executor. 
ReadWithPartitions does some preliminary partitioning of the data, but any skew 
in the data range or workload will leave the partitions unbalanced.
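
To illustrate the skew problem described above, here is a minimal, 
self-contained sketch in plain Java. It does not use the Beam API: the class 
and method names ({{PartitionSkewSketch}}, {{equalWidthRanges}}, 
{{rowsPerRange}}) and the sample keys are hypothetical, and equal-width 
splitting is assumed only as a simple stand-in for static range partitioning, 
not as a description of how ReadWithPartitions works internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JdbcIO or the Beam API.
class PartitionSkewSketch {

    /**
     * Splits the inclusive key range [lower, upper] into at most numPartitions
     * equal-width ranges. Each entry is {startInclusive, endExclusive}.
     */
    static List<long[]> equalWidthRanges(long lower, long upper, int numPartitions) {
        List<long[]> ranges = new ArrayList<>();
        long span = upper - lower + 1;
        long stride = (span + numPartitions - 1) / numPartitions; // ceiling division
        for (long start = lower; start <= upper; start += stride) {
            long end = Math.min(start + stride, upper + 1);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    /** Counts how many of the given keys fall into each range. */
    static long[] rowsPerRange(List<long[]> ranges, long[] keys) {
        long[] counts = new long[ranges.size()];
        for (long key : keys) {
            for (int i = 0; i < ranges.size(); i++) {
                long[] r = ranges.get(i);
                if (key >= r[0] && key < r[1]) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: most ids are small, a few are spread up to 1000.
        long[] keys = {1, 2, 3, 5, 8, 13, 21, 34, 55, 400, 700, 1000};
        List<long[]> ranges = equalWidthRanges(1, 1000, 4);
        long[] counts = rowsPerRange(ranges, keys);
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("[%d, %d): %d rows%n",
                ranges.get(i)[0], ranges.get(i)[1], counts[i]);
        }
    }
}
```

With these sample keys, the first range receives 9 of the 12 rows and the 
other three receive 1 each, so a worker assigned the first range does most of 
the work. This is the kind of imbalance that a fixed, up-front partitioning 
cannot correct once the read has started.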

 

  was:
Currently, the JDBC IO is basically a {{DoFn}} executed with a {{ParDo}}, so 
parallelism is "limited" and the read runs on one executor.
We could create several JDBC {{BoundedSource}}s, each reading a subset of the 
SQL query (for instance using row id paging, or any "splitting/limit" we can 
derive from the original SQL query), similar to what Sqoop does.


> Add dynamic splitting to JdbcIO.readWithPartitions
> --------------------------------------------------
>
>                 Key: BEAM-14161
>                 URL: https://issues.apache.org/jira/browse/BEAM-14161
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-jdbc
>            Reporter: Pablo Estrada
>            Assignee: Jean-Baptiste Onofré
>            Priority: P2
>             Fix For: Not applicable
>
>
> Currently, the JDBC IO is basically a {{DoFn}} executed with a {{ParDo}}, so 
> parallelism is "limited" and the read runs on one executor. 
> ReadWithPartitions does some preliminary partitioning of the data, but any 
> skew in the data range or workload will leave the partitions unbalanced.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
