[ 
https://issues.apache.org/jira/browse/BEAM-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3292.
-----------------------------

[https://github.com/apache/beam/pull/4488] 

> Remove BeamRecordSqlType
> ------------------------
>
>                 Key: BEAM-3292
>                 URL: https://issues.apache.org/jira/browse/BEAM-3292
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>            Reporter: Anton Kedin
>            Assignee: Anton Kedin
>            Priority: Major
>             Fix For: Not applicable
>
>
> [BeamRecordType|https://github.com/apache/beam/blob/39e66e953b0f8e16435acb038cad364acf2b3a57/sdks/java/core/src/main/java/org/apache/beam/sdk/values/BeamRecordType.java]
>  is implemented as 2 lists: the list of field names, and the list of the 
> coders for those fields. Both lists are ordered.
> [BeamRecordSqlType|https://github.com/apache/beam/blob/2eb7de0fe6e96da9805fc827294da1e1329ff716/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamRecordSqlType.java]
>  additionally has a list of 
> [java.sql.Types|https://docs.oracle.com/javase/7/docs/api/java/sql/Types.html]
>  ints to define types of those fields. It is used to map between Java types, 
> Calcite types, and Beam Coders.
> This information is not used for anything except for that mapping, which in 
> turn is only used to create records and map back to Calcite types.
> But because of this indirect mapping we cannot rely on core BeamRecordType 
> and are forced to have BeamRecordSqlType. This introduces additional 
> complexity when, for example, generating record types based on POJO classes.
> If we could find another mechanism to map Calcite types and Java classes to 
> Beam Coders, bypassing java.sql.Types, then we could just use the core 
> BeamRecordType and remove the BeamRecordSqlType functionality.
> One approach is to have a predefined set of coders which are then used like 
> types, e.g.:
> {code:java}
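> public static class SqlCoders {
>    // Shared constants, so that each coder can be referenced like a type.
>    public static final Coder INTEGER = VarIntCoder.of();
>    public static final Coder VARCHAR = StringUtf8Coder.of();
>    public static final Coder TIMESTAMP = DateCoder.of();
> }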
> {code}
> The problem with that approach is establishing coder identity. That is, when 
> a coder is serialized and then deserialized, it becomes a different instance, 
> so we need a mechanism to know the identity or maybe just equality of the 
> coders. If this is solved then replacing java.sql.Types with predefined SQL 
> coders like above becomes trivial.
> A few links on this:
>  - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java#L56
>  - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslator.java#L34
>  - https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L391
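
The identity-versus-equality point in the description can be illustrated with a minimal, self-contained sketch. It does not use Beam's actual Coder classes: NamedCoder, INTEGER, VARCHAR, and roundTrip are hypothetical stand-ins, and plain Java serialization is assumed in place of whatever mechanism a runner would really use. It only shows that a deserialized constant is a new instance, so reference comparison fails while a value-based equals() still matches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CoderIdentitySketch {

  /** Hypothetical stand-in for a coder; NOT Beam's real Coder class. */
  static final class NamedCoder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String sqlTypeName;

    NamedCoder(String sqlTypeName) {
      this.sqlTypeName = sqlTypeName;
    }

    // Value-based equality: two coders are equal if they name the same SQL type.
    @Override
    public boolean equals(Object other) {
      return other instanceof NamedCoder
          && ((NamedCoder) other).sqlTypeName.equals(sqlTypeName);
    }

    @Override
    public int hashCode() {
      return sqlTypeName.hashCode();
    }
  }

  // Predefined coders used like types, mirroring the SqlCoders idea (assumed names).
  static final NamedCoder INTEGER = new NamedCoder("INTEGER");
  static final NamedCoder VARCHAR = new NamedCoder("VARCHAR");

  /** Serializes and deserializes a coder with plain Java serialization. */
  static NamedCoder roundTrip(NamedCoder coder) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(coder);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (NamedCoder) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    NamedCoder copy = roundTrip(INTEGER);
    System.out.println(copy == INTEGER);      // false: a different instance
    System.out.println(copy.equals(INTEGER)); // true: same SQL type name
    System.out.println(copy.equals(VARCHAR)); // false: different SQL type name
  }
}
```

Under these assumptions, a lookup keyed on such constants would have to go through equals()/hashCode() (for example, a HashMap) rather than reference comparison. Whether Beam's own coders provide suitable equality for this purpose is the open question raised in the description.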



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
