[ 
https://issues.apache.org/jira/browse/BEAM-6674?focusedWorklogId=253090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253090
 ]

ASF GitHub Bot logged work on BEAM-6674:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/19 12:56
            Start Date: 03/Jun/19 12:56
    Worklog Time Spent: 10m 
      Work Description: charithe commented on pull request #8725: [BEAM-6674] 
Add schema support to JdbcIO read
URL: https://github.com/apache/beam/pull/8725#discussion_r289830147
 
 

 ##########
 File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
 ##########
 @@ -188,6 +191,15 @@
         .build();
   }
 
+  /** Read Beam {@link Row}s from a JDBC data source. */
+  @Experimental(Experimental.Kind.SCHEMAS)
+  public static ReadRows readRows() {
+    return new AutoValue_JdbcIO_ReadRows.Builder()
+        .setFetchSize(DEFAULT_FETCH_SIZE)
+        .setOutputParallelization(true)
+        .build();
+  }
 
 Review comment:
   I looked into this a little bit more and I think I may have misunderstood a 
few things.
   
   My idea was that, given the `TypeDescriptor` of the PCollection, I could 
query the schema registry for the information needed to attach the schema. 
However, this doesn't seem to work in practice. Of course, I could be doing 
something wrong there, but I realised that if it were a viable solution, it 
would likely already have been implemented in the core, since it would provide 
a generic way to implicitly attach schemas to any PCollection containing 
suitable element types.
   
   Now I am not quite sure what benefit a `readWithSchema(clazz)` method would 
have. If the user needs schema support, they could just as well call 
`setSchema` on the PCollection themselves, couldn't they? We could reduce the 
boilerplate involved, but that is not specific to the JDBC IO module and 
should probably go into the core schema module as a utility function.
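   For context, here is a minimal sketch of the user-side alternative I mean: 
attaching the schema to the PCollection after a plain `JdbcIO.<Row>read()`, 
with no `readWithSchema(clazz)` involved. This is not code from the PR; the 
driver class, JDBC URL, query, and field names are all placeholders, and 
`setRowSchema` is the `PCollection<Row>` convenience for `setSchema`:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.RowCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JdbcSchemaSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();

    // Hypothetical schema matching the placeholder query below.
    Schema schema =
        Schema.builder().addInt64Field("id").addStringField("name").build();

    PCollection<Row> rows =
        pipeline.apply(
            JdbcIO.<Row>read()
                .withDataSourceConfiguration(
                    JdbcIO.DataSourceConfiguration.create(
                        "org.postgresql.Driver", "jdbc:postgresql://localhost/mydb"))
                .withQuery("SELECT id, name FROM users")
                .withRowMapper(
                    rs ->
                        Row.withSchema(schema)
                            .addValues(rs.getLong("id"), rs.getString("name"))
                            .build())
                .withCoder(RowCoder.of(schema)));

    // The user attaches the schema explicitly themselves.
    rows.setRowSchema(schema);
  }
}
```

   The boilerplate here is the duplication between the row mapper and the 
schema, which is exactly the part a core utility could fold away.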
   
   Since the user is responsible for defining how to map the ResultSet to a 
type of their choosing, they could potentially apply field transforms during 
the mapping and produce an output type whose schema doesn't match the schema 
inferred from the raw data. Therefore I feel that comparing the two schemas 
will not be very useful either.
   
   What are your thoughts?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 253090)
    Time Spent: 1h 50m  (was: 1h 40m)

> The JdbcIO source should produce schemas
> ----------------------------------------
>
>                 Key: BEAM-6674
>                 URL: https://issues.apache.org/jira/browse/BEAM-6674
>             Project: Beam
>          Issue Type: Sub-task
>          Components: io-java-jdbc
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
