[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types

ASF GitHub Bot (JIRA) Sat, 02 Dec 2017 00:35:17 -0800

    [ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275479#comment-16275479 ]


ASF GitHub Bot commented on BEAM-3157:
--------------------------------------

akedin opened a new pull request #4204: [BEAM-3157] Generate BeamRecord types from Pojos
URL: https://github.com/apache/beam/pull/4204
 
 
   This implements automatic generation of BeamRecordTypes and 
BeamRecordSqlTypes from POJO types. The work is part of 
[BEAM-3157](https://issues.apache.org/jira/browse/BEAM-3157).
   
   The main piece is 
[RecordFactory](https://github.com/apache/beam/compare/master...akedin:generate-record-types?expand=1#diff-55e6442c81f404c1004a445b550f03c9),
 which exposes a method to generate BeamRecords from POJOs. See 
[RecordFactoryTest](https://github.com/apache/beam/compare/master...akedin:generate-record-types?expand=1#diff-869b654afa6699d55098a8fc3f2e5740)
 for usage examples. 
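Conceptually, generating a record type from a POJO means reflecting over its class once, caching the resulting field names and types, and then reading the field values of each element. The sketch below illustrates that idea under stated assumptions; it is not the PR's `RecordFactory` and does not use the Beam `BeamRecordType`/`BeamRecord` classes. `PojoRecordFactory`, `RecordTypeSketch`, `RecordSketch` and `SampleBid` are hypothetical names, and only non-static, non-synthetic fields are treated as record fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical record type: ordered field names with their Java types. */
final class RecordTypeSketch {
  final List<String> fieldNames;
  final List<Class<?>> fieldTypes;

  RecordTypeSketch(List<String> fieldNames, List<Class<?>> fieldTypes) {
    this.fieldNames = fieldNames;
    this.fieldTypes = fieldTypes;
  }
}

/** Hypothetical record: a type plus one value per field, in the same order. */
final class RecordSketch {
  final RecordTypeSketch type;
  final List<Object> values;

  RecordSketch(RecordTypeSketch type, List<Object> values) {
    this.type = type;
    this.values = values;
  }
}

/** Hypothetical factory: derives the record type of a POJO class once, then reuses it. */
final class PojoRecordFactory {
  private final Map<Class<?>, List<Field>> fieldsByClass = new ConcurrentHashMap<>();
  private final Map<Class<?>, RecordTypeSketch> typesByClass = new ConcurrentHashMap<>();

  RecordTypeSketch typeOf(Class<?> pojoClass) {
    return typesByClass.computeIfAbsent(pojoClass, c -> {
      List<String> names = new ArrayList<>();
      List<Class<?>> types = new ArrayList<>();
      for (Field f : fieldsOf(c)) {
        names.add(f.getName());
        types.add(f.getType());
      }
      return new RecordTypeSketch(names, types);
    });
  }

  RecordSketch newRecord(Object pojo) {
    List<Object> values = new ArrayList<>();
    for (Field f : fieldsOf(pojo.getClass())) {
      try {
        values.add(f.get(pojo)); // primitives come back boxed, e.g. long -> Long
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return new RecordSketch(typeOf(pojo.getClass()), values);
  }

  /** Instance fields only, sorted by name: getDeclaredFields() promises no particular order. */
  private List<Field> fieldsOf(Class<?> pojoClass) {
    return fieldsByClass.computeIfAbsent(pojoClass, c -> {
      List<Field> fields = new ArrayList<>();
      for (Field f : c.getDeclaredFields()) {
        if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
          continue;
        }
        f.setAccessible(true);
        fields.add(f);
      }
      fields.sort(Comparator.comparing(Field::getName));
      return fields;
    });
  }
}

/** Hypothetical sample POJO used only for illustration. */
final class SampleBid {
  long auction;
  String bidder;

  SampleBid(long auction, String bidder) {
    this.auction = auction;
    this.bidder = bidder;
  }
}
```

For example, `new PojoRecordFactory().newRecord(new SampleBid(7L, "alice"))` produces the field names `[auction, bidder]` with types `long` and `String`, and the values `[7, alice]`. Fields are sorted by name because `Class.getDeclaredFields()` does not guarantee any particular order; the actual implementation may well pick a different ordering.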
   
   The plan is to integrate this into the Beam SQL framework; that 
integration will be done in future PRs.
   
   Record generation is a major step toward simplifying the conversion of 
POJO models to BeamRecords. The immediate use case is implementing the 
Nexmark queries in Beam SQL on top of the existing POJO models. 
   It can also serve as a starting point for code generation for 
schema-aware collections.

> BeamSql transform should support other PCollection types
> --------------------------------------------------------
>
>                 Key: BEAM-3157
>                 URL: https://issues.apache.org/jira/browse/BEAM-3157
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Ismaël Mejía
>            Assignee: Anton Kedin
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can use a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite, so it is convenient to have 
> a specific data type (BeamRecord) for this. However, we can accomplish the 
> same by using a PCollection of JavaBeans (where the field names/types/values 
> give us the same information) or of Avro records (where we also have the 
> schema information). For the output PCollection we can map the object via 
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: for now I am assuming simple mappings, since SQL does not yet 
> support composite types.
> A simple API idea would be something like this:
> A simple filter:
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM .... WHERE ...").from(MyPojo.class);
> A projection:
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns; 
> however, I suppose that for memory-use reasons, mapping directly into 
> Calcite may make more sense.
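The JavaBean route from the issue, combined with the "extra ParDos before and after the transform" approach, can be illustrated in plain Java. This is a minimal sketch, assuming a row is modelled as a `Map<String, Object>` keyed by property name and a `List` stands in for a `PCollection`; `BeanSqlSketch`, `toRow`, `fromRow`, `queryBeans` and the `Person` bean are hypothetical, not Beam or Calcite APIs. Property names and values are discovered with `java.beans.Introspector`, and the output bean plays the role of the reference object that gets filled in.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical helper: JavaBean <-> row conversions wrapped around a row-only query. */
final class BeanSqlSketch {

  /** Hypothetical sample JavaBean used only for illustration. */
  public static final class Person {
    private long id;
    private String name;

    public Person() {}

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
  }

  private static PropertyDescriptor[] properties(Class<?> beanClass) {
    try {
      // Stop at Object.class so the inherited getClass() is not reported as a property.
      return Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
    } catch (IntrospectionException e) {
      throw new IllegalArgumentException("Not a JavaBean: " + beanClass.getName(), e);
    }
  }

  /** "ParDo before": read each readable property into a row keyed by property name. */
  static Map<String, Object> toRow(Object bean) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (PropertyDescriptor pd : properties(bean.getClass())) {
      Method getter = pd.getReadMethod();
      if (getter == null) {
        continue;
      }
      try {
        row.put(pd.getName(), getter.invoke(bean));
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
    }
    return row;
  }

  /** "ParDo after": instantiate the target bean and set the properties present in the row. */
  static <T> T fromRow(Map<String, Object> row, Class<T> beanClass) {
    try {
      T bean = beanClass.getDeclaredConstructor().newInstance();
      for (PropertyDescriptor pd : properties(beanClass)) {
        Method setter = pd.getWriteMethod();
        if (setter != null && row.containsKey(pd.getName())) {
          setter.invoke(bean, row.get(pd.getName())); // boxed values are unwrapped for primitives
        }
      }
      return bean;
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot fill " + beanClass.getName(), e);
    }
  }

  /** Converts every input bean, runs the row-only query, then converts every result row. */
  static <T> List<T> queryBeans(
      List<?> input, UnaryOperator<List<Map<String, Object>>> rowQuery, Class<T> outputClass) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Object bean : input) {
      rows.add(toRow(bean));
    }
    List<T> output = new ArrayList<>();
    for (Map<String, Object> row : rowQuery.apply(rows)) {
      output.add(fromRow(row, outputClass));
    }
    return output;
  }
}
```

Calling `queryBeans(beans, rowQuery, Person.class)` converts each bean to a row, lets `rowQuery` (standing in for the SQL transform) filter or project rows, and fills a fresh `Person` from each result row; properties missing from a row keep their default values. Every element is materialized both as a row and as an output bean, which is the kind of overhead behind the memory concern about mapping directly into Calcite.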



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
