[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361319#comment-16361319 ]
Anton Kedin edited comment on BEAM-3157 at 2/12/18 7:30 PM:
------------------------------------------------------------

Yes, this should stay open until we have the code generation piece wired up with user-friendly APIs. The specifics of the API will be defined by the schema-aware PCollections design that [~reuvenlax] is working on.

I have put together a functional prototype of how RowType generation can be wired up to SQL: [https://github.com/apache/beam/pull/4649/commits/e12d54725ab092c260a7084f50012e4fe3d7e81b#diff-30ffc29d9ac0817e4e88622a81da8ec3R58]
 * It is triggered by specifying InferredSqlRowCoder.ofSerializable(PersonPojo.class) on the input PCollection;
 * As an example it uses SerializableCoder for the original PCollection elements, but it can be any other coder;
 * InferredSqlRowCoder wraps the Row/RowType generation logic for the element type, and this logic is then invoked by the SQL's QueryTransform.

I will be gathering feedback and waiting for further work on the schema-aware PCollections before submitting this.

was (Author: kedin):
Yes, this should stay open until we have the code generation piece wired up with user-friendly APIs. The specifics of the API will be defined by the schema-aware PCollections design that [~reuvenlax] is working on.
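As a rough, Beam-free sketch of the inference idea the prototype builds on (class and method names here are illustrative, not the actual InferredSqlRowCoder API): a RowType generator essentially needs to derive (field name, field type) pairs from the element class, which plain reflection can demonstrate:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: derive "name:type" descriptors from a POJO's
// public fields, roughly the information a RowType generator would need.
public class SchemaSketch {

    // Sample POJO standing in for PersonPojo from the prototype.
    public static class PersonPojo {
        public String name;
        public int age;
    }

    // Collect a "fieldName:fieldType" descriptor for each public field.
    public static List<String> inferSchema(Class<?> clazz) {
        List<String> schema = new ArrayList<>();
        for (Field f : clazz.getFields()) {
            schema.add(f.getName() + ":" + f.getType().getSimpleName());
        }
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(inferSchema(PersonPojo.class));
    }
}
```

In the actual prototype this kind of inferred schema is wrapped by the coder and consumed by the SQL transform rather than printed.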
I have put together a functional prototype of how RowType generation can be wired up to SQL: [https://github.com/apache/beam/pull/4649/commits/e12d54725ab092c260a7084f50012e4fe3d7e81b#diff-30ffc29d9ac0817e4e88622a81da8ec3R58]

I will be gathering feedback and waiting for further work on the schema-aware PCollections before submitting this.

> BeamSql transform should support other PCollection types
> --------------------------------------------------------
>
>                 Key: BEAM-3157
>                 URL: https://issues.apache.org/jira/browse/BEAM-3157
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Ismaël Mejía
>            Assignee: Anton Kedin
>            Priority: Major
>             Fix For: 2.4.0
>
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data represented as a BeamRecord. This seems to me like a usability limitation (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map name/type/value from an object field into Calcite, so it is convenient to have a specific data type (BeamRecord) for this. However, we can accomplish the same with a PCollection of JavaBeans (where we know the same information via the field names/types/values) or with Avro records, where we also have the Schema information. For the output PCollection we can map the object via a reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming simple mappings for the moment, since the SQL does not yet support composite types.
> A simple API idea would be something like this:
> A simple filter:
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM .... WHERE ...").from(MyPojo.class);
> A projection:
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns; however, I suppose that for memory-use reasons mapping directly into Calcite would make sense.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
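The `.as(MyNewPojo.class)` step proposed in the issue description is only a sketch of a not-yet-existing API, but the field mapping it implies can be illustrated without Beam: copy same-named fields from the query's result type into the target POJO. All class and method names below are hypothetical:

```java
import java.lang.reflect.Field;

// Illustrative sketch (not Beam code) of the mapping a hypothetical
// .as(MyNewPojo.class) would perform: fill the target POJO's fields
// from same-named fields of the source object via reflection.
public class ProjectionSketch {

    public static class MyPojo {
        public int id;
        public String name;
        public String address; // dropped by a "SELECT id, name" projection
    }

    public static class MyNewPojo {
        public int id;
        public String name;
    }

    // Instantiate targetClass and copy each of its public fields from
    // the same-named public field of 'source'.
    public static <T> T project(Object source, Class<T> targetClass) throws Exception {
        T target = targetClass.getDeclaredConstructor().newInstance();
        for (Field out : targetClass.getFields()) {
            Field in = source.getClass().getField(out.getName());
            out.set(target, in.get(source));
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        MyPojo p = new MyPojo();
        p.id = 7;
        p.name = "beam";
        p.address = "elsewhere";
        MyNewPojo n = project(p, MyNewPojo.class);
        System.out.println(n.id + " " + n.name);
    }
}
```

A real implementation inside the SQL transform would presumably map through the inferred RowType rather than reflecting on every element, which is where the memory/performance concern about extra ParDos comes in.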