[
https://issues.apache.org/jira/browse/BEAM-10832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kenneth Knowles updated BEAM-10832:
-----------------------------------
Component/s: io-java-clickhouse
(was: beam-model)
> ClickhouseIO's getTableSchema method is called before Pipeline Starts
> ---------------------------------------------------------------------
>
> Key: BEAM-10832
> URL: https://issues.apache.org/jira/browse/BEAM-10832
> Project: Beam
> Issue Type: Improvement
> Components: io-java-clickhouse
> Affects Versions: 2.23.0
> Reporter: Vasu Gupta
> Priority: P2
> Fix For: Not applicable
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> ClickhouseIO's getTableSchema() method is used in WriteFn's expand method,
> which is called before the pipeline is even started. The main issue is that
> getTableSchema() opens a connection to Clickhouse, so if I can't connect to
> a clickhouse-server at pipeline launch time, the pipeline won't start at
> all. Suppose a clickhouse-server is deployed in production: if I want to
> launch a Dataflow pipeline from my local machine, I shouldn't need a working
> connection to clickhouse-server from my local environment (only Dataflow
> should need to be able to connect to it).
>
> What I suggest:
> getTableSchema() should behave like a singleton (fetch the schema once) and
> should be called in the DoFn's setup() method instead of the PTransform's
> expand method, since setup() is called after the pipeline has started (in my
> case on Dataflow, not locally).
>
> I would be more than happy to work on this improvement in Apache Beam (Java).
>
>
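A minimal sketch of the suggestion above, assuming nothing about the actual
ClickHouseIO code. The names DeferredSchemaSketch, SchemaLoader and this
WriteFn are hypothetical stand-ins, not Beam classes: the constructor plays
the role of pipeline construction (expand), and setup() plays the role of a
DoFn method annotated with @Setup, which runs on the worker.

```java
import java.io.Serializable;
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: stand-ins for Beam types, not the real ClickHouseIO code. */
final class DeferredSchemaSketch {

  /** Hypothetical serializable loader; real code would query ClickHouse here. */
  interface SchemaLoader extends Supplier<List<String>>, Serializable {}

  /** Stand-in for a DoFn: the constructor does not touch the database. */
  static final class WriteFn implements Serializable {
    private final SchemaLoader loader;
    // transient: resolved on the worker, never shipped with the serialized fn
    private transient List<String> schema;

    WriteFn(SchemaLoader loader) {
      this.loader = loader;
    }

    /** Plays the role of a @Setup method: loads at most once per instance. */
    void setup() {
      if (schema == null) {
        schema = loader.get();
      }
    }

    List<String> schema() {
      if (schema == null) {
        throw new IllegalStateException("setup() has not run");
      }
      return schema;
    }
  }

  public static void main(String[] args) {
    // Construction corresponds to launch time: no connection is attempted.
    WriteFn fn = new WriteFn(() -> List.of("id", "name"));
    // setup() corresponds to worker start-up: the schema is loaded here.
    fn.setup();
    System.out.println(fn.schema());
  }
}
```

Note that this caches the schema per WriteFn instance only. A true per-JVM
singleton would need a static cache, which is a separate design choice.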
--
This message was sent by Atlassian Jira
(v8.20.1#820001)