[
https://issues.apache.org/jira/browse/BEAM-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Kedin closed BEAM-3292.
-----------------------------
[https://github.com/apache/beam/pull/4488]
> Remove BeamRecordSqlType
> ------------------------
>
> Key: BEAM-3292
> URL: https://issues.apache.org/jira/browse/BEAM-3292
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Anton Kedin
> Assignee: Anton Kedin
> Priority: Major
> Fix For: Not applicable
>
>
> [BeamRecordType|https://github.com/apache/beam/blob/39e66e953b0f8e16435acb038cad364acf2b3a57/sdks/java/core/src/main/java/org/apache/beam/sdk/values/BeamRecordType.java]
> is implemented as two ordered lists: the field names and the coders for
> those fields.
> [BeamRecordSqlType|https://github.com/apache/beam/blob/2eb7de0fe6e96da9805fc827294da1e1329ff716/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamRecordSqlType.java]
> additionally has a list of
> [java.sql.Types|https://docs.oracle.com/javase/7/docs/api/java/sql/Types.html]
> ints to define types of those fields. It is used to map between Java types,
> Calcite types, and Beam Coders.
> This information is used only for that mapping, which in turn is used only
> to create records and to map back to Calcite types.
> Because of this indirect mapping we cannot rely on the core BeamRecordType
> and are forced to keep BeamRecordSqlType. This introduces additional
> complexity when, for example, generating record types from POJO classes.
> If we could find another mechanism to map Calcite types and Java classes to
> Beam Coders that bypasses java.sql.Types, we could just use the core
> BeamRecordType and remove the BeamRecordSqlType functionality.
> One approach is to have a predefined set of coders which are then used like
> types, e.g.:
> {code:java}
> public static class SqlCoders {
>   // Predefined coders that double as SQL type tokens.
>   public static final Coder INTEGER = VarIntCoder.of();
>   public static final Coder VARCHAR = StringUtf8Coder.of();
>   public static final Coder TIMESTAMP = DateCoder.of();
> }
> {code}
> The problem with that approach is establishing coder identity. When a coder
> is serialized and then deserialized, it becomes a different instance, so we
> need a mechanism to determine the identity, or maybe just the equality, of
> coders. If this is solved, then replacing java.sql.Types with predefined SQL
> coders like the ones above becomes trivial.
> A few links on this:
> - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslation.java#L56
> - https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslator.java#L34
> - https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L391
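
The coder identity problem described in the issue can be illustrated with a minimal, self-contained sketch. It does not use Beam: NamedCoder is a hypothetical stand-in for a Beam Coder, and CoderIdentitySketch and roundTrip are names invented for this example. The sketch assumes plain Java serialization and a coder whose equals() compares by value; under those assumptions the deserialized copy is a different instance but still compares equal to the predefined constant.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CoderIdentitySketch {

  // Hypothetical stand-in for a Beam Coder: serializable, with value-based equality.
  static final class NamedCoder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    NamedCoder(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof NamedCoder && ((NamedCoder) other).name.equals(name);
    }

    @Override
    public int hashCode() {
      return name.hashCode();
    }
  }

  // Predefined coders used as SQL type tokens, mirroring the SqlCoders idea.
  static final NamedCoder INTEGER = new NamedCoder("INTEGER");
  static final NamedCoder VARCHAR = new NamedCoder("VARCHAR");

  // Serializes a value to bytes and reads it back, producing a fresh instance.
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }

  public static void main(String[] args) {
    NamedCoder copy = roundTrip(INTEGER);
    System.out.println(copy == INTEGER);      // false: reference identity is lost
    System.out.println(copy.equals(INTEGER)); // true: value equality survives
    System.out.println(copy.equals(VARCHAR)); // false: different type token
  }
}
```

In this sketch, a lookup from coder to SQL type would have to go through equals() (for example, as a HashMap key) rather than a reference comparison. Whether real Beam coders can rely on equality in the same way is exactly the open question raised in the issue.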