[
https://issues.apache.org/jira/browse/BEAM-6772?focusedWorklogId=212666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-212666
]
ASF GitHub Bot logged work on BEAM-6772:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Mar/19 20:36
Start Date: 13/Mar/19 20:36
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #8006: [BEAM-6772] Change
Select semantics to match what a user expects
URL: https://github.com/apache/beam/pull/8006#issuecomment-472594904
+Kenn for comment
This seems very surprising to me, to be honest. It's not what I would
expect. This behavior has some mathematical niceties (e.g. it makes select
distribute across a union of selectors), but seems like it's almost never
what a user actually wants.
In Beam, part of the goal of schemas is to interact seamlessly with user
types (such as Pojos), so users don't have to deal with Row objects (unless
they want to). The above example for ParDo is one (though the
implementation for that is not yet merged). Another example is that users
might expect to be able to write:
PCollection<Location> locations = pc.apply(Select.fieldNames("location"))
.apply(Convert.to(Location.class));
That will only work if the selected item matches the schema of Location
itself, not if it is an outer schema that nests Location under a single field.
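To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use Beam's API: schemas are modeled as ordered maps, and the names SelectSketch, selectNested, and selectUnnested are hypothetical, chosen only for illustration. It assumes the User/Location schema from the issue description and only covers selecting a single top-level field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Beam's Select implementation.
class SelectSketch {
    // A schema is modeled as an ordered map from field name to either
    // a primitive type name (String) or a nested schema (Map).
    static Map<String, Object> locationSchema() {
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("latitude", "DOUBLE");
        schema.put("longitude", "DOUBLE");
        return schema;
    }

    static Map<String, Object> userSchema() {
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("name", "STRING");
        schema.put("location", locationSchema());
        return schema;
    }

    // Behavior described as surprising: the selected field stays wrapped
    // in an outer schema that has a single field.
    static Map<String, Object> selectNested(Map<String, Object> schema, String field) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put(field, schema.get(field));
        return out;
    }

    // Behavior most users are said to expect: when the single selected
    // field is itself a nested schema, return that schema directly.
    @SuppressWarnings("unchecked")
    static Map<String, Object> selectUnnested(Map<String, Object> schema, String field) {
        Object selected = schema.get(field);
        if (selected instanceof Map) {
            return (Map<String, Object>) selected;
        }
        // A primitive field has nothing to unnest, so keep the wrapper.
        return selectNested(schema, field);
    }

    public static void main(String[] args) {
        Map<String, Object> user = userSchema();
        System.out.println("nested:   " + selectNested(user, "location"));
        System.out.println("unnested: " + selectUnnested(user, "location"));
        // Only the unnested result has the same shape as Location.
        System.out.println("nested matches Location:   "
                + selectNested(user, "location").equals(locationSchema()));
        System.out.println("unnested matches Location: "
                + selectUnnested(user, "location").equals(locationSchema()));
    }
}
```

In this model, a conversion to Location could only line up field by field with the unnested result, which is the case the Convert.to example above relies on.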
Maybe for now we should add an option to the Select transform so that the
user can specify which behavior they want?
On Wed, Mar 13, 2019 at 1:28 PM Gleb Kanterov <[email protected]>
wrote:
> This way it's more visible:
>
> $ bq show --schema test.test_schema | jq
> [
> {
> "fields": [
> {
> "type": "FLOAT",
> "name": "latitude"
> },
> {
> "type": "FLOAT",
> "name": "longtitude"
> }
> ],
> "type": "RECORD",
> "name": "location"
> }
> ]
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 212666)
Time Spent: 2h 50m (was: 2h 40m)
> Select transform has non-intuitive semantics
> --------------------------------------------
>
> Key: BEAM-6772
> URL: https://issues.apache.org/jira/browse/BEAM-6772
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Consider the following schema:
> User:
> name: STRING
> location: Location
>
> Location:
> latitude: DOUBLE
> longitude: DOUBLE
>
> If you apply Select.fieldNames("location"), most users expect to get back a
> row matching the Location schema. Instead you get back an outer schema with a
> single location field in it. Select should instead unnest the output up to
> the point where multiple fields are selected.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)