[ 
https://issues.apache.org/jira/browse/BEAM-10832?focusedWorklogId=491919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-491919
 ]

ASF GitHub Bot logged work on BEAM-10832:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Sep/20 11:16
            Start Date: 28/Sep/20 11:16
    Worklog Time Spent: 10m 
      Work Description: kanterov commented on pull request #12919:
URL: https://github.com/apache/beam/pull/12919#issuecomment-699943118


   @Vasu7052 Moving the table schema logic into workers will turn a class of 
deployment errors into runtime errors, which isn't always desirable. From 
reading the JIRA ticket I can understand your motivation. As a middle ground, 
you could add `tableSchema` as a nullable property of `ClickHouseIO.Write` 
that is populated in `ClickHouseIO.Write.Builder` unless specified 
explicitly. This way we can specify `TableSchema` in the pipeline graph when 
fetching it from ClickHouse isn't possible.
   
   This also leaves room for a future check that the input PCollection schema 
is compatible with the table schema. That check isn't implemented yet, but it 
is possible with the current implementation and would not be possible if the 
table schema were unknown during deployment.
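
   A minimal sketch of the middle ground described above, written in plain 
Java without any Beam dependency. Everything here is hypothetical: 
`SchemaResolutionSketch`, its nested `TableSchema`, `Write` and `Builder` 
classes, and the `withTableSchema` method are stand-ins for illustration, not 
the actual `ClickHouseIO` API. It only shows the resolution order: use the 
explicitly supplied schema if there is one, otherwise fetch it from the 
server at construction time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical stand-in for the idea; not the real ClickHouseIO classes. */
public class SchemaResolutionSketch {

  /** Simplified schema: just an immutable list of column names. */
  static final class TableSchema {
    final List<String> columns;

    TableSchema(List<String> columns) {
      this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
  }

  /** Stand-in for a write transform whose schema is known at graph construction. */
  static final class Write {
    final TableSchema tableSchema;

    private Write(TableSchema tableSchema) {
      this.tableSchema = tableSchema;
    }

    /** fetchFromServer models the existing "connect and read the schema" step. */
    static Builder builder(Supplier<TableSchema> fetchFromServer) {
      return new Builder(fetchFromServer);
    }

    static final class Builder {
      private final Supplier<TableSchema> fetchFromServer;
      private TableSchema explicitSchema; // null means "not specified"

      private Builder(Supplier<TableSchema> fetchFromServer) {
        this.fetchFromServer = Objects.requireNonNull(fetchFromServer);
      }

      /** Optional: supply the schema by hand when the server is unreachable. */
      Builder withTableSchema(TableSchema schema) {
        this.explicitSchema = schema;
        return this;
      }

      Write build() {
        // Only contact the server when no explicit schema was given.
        TableSchema schema =
            explicitSchema != null ? explicitSchema : fetchFromServer.get();
        return new Write(Objects.requireNonNull(schema, "table schema"));
      }
    }
  }

  public static void main(String[] args) {
    // Simulates launching from a machine that cannot reach the server.
    Supplier<TableSchema> unreachable = () -> {
      throw new IllegalStateException("cannot connect to ClickHouse");
    };

    Write write = Write.builder(unreachable)
        .withTableSchema(new TableSchema(Arrays.asList("id", "name")))
        .build();

    System.out.println(write.tableSchema.columns);
  }
}
```

   With an explicit schema the fetch is never attempted, so the launch does 
not depend on connectivity. Without one, the behaviour stays as it is today 
and a connection failure still surfaces at deployment time.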


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 491919)
    Time Spent: 40m  (was: 0.5h)

> ClickhouseIO's getTableSchema method is called before Pipeline Starts
> ---------------------------------------------------------------------
>
>                 Key: BEAM-10832
>                 URL: https://issues.apache.org/jira/browse/BEAM-10832
>             Project: Beam
>          Issue Type: Improvement
>          Components: beam-model
>    Affects Versions: 2.23.0
>            Reporter: Vasu Gupta
>            Priority: P3
>             Fix For: Not applicable
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> A method in ClickhouseIO called getTableSchema() is used in WriteFn's expand 
> method, which is called before the pipeline is even started. The main issue 
> is that getTableSchema() opens a connection to ClickHouse, so if I can't 
> connect to a clickhouse-server at launch time, the pipeline won't start at 
> all. Suppose a clickhouse-server is deployed in production: if I want to 
> launch a Dataflow pipeline from my local machine, I shouldn't need a working 
> connection to the clickhouse-server from my local environment (but I should 
> be able to connect to it from Dataflow).
>  
> What I suggest:
> getTableSchema() should be a singleton method and must be called in the 
> DoFn's setup() method (instead of the PTransform's expand method), since 
> setup() is called after the pipeline is started (in my case on Dataflow, 
> not locally).
>  
> I would be more than happy to work on this improvement in Apache Beam (Java).
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
