[ 
https://issues.apache.org/jira/browse/BEAM-6772?focusedWorklogId=212666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-212666
 ]

ASF GitHub Bot logged work on BEAM-6772:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Mar/19 20:36
            Start Date: 13/Mar/19 20:36
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on issue #8006: [BEAM-6772] Change 
Select semantics to match what a user expects
URL: https://github.com/apache/beam/pull/8006#issuecomment-472594904
 
 
   +Kenn for comment
   
   This seems very surprising to me, to be honest. It's not what I would
   expect. This behavior has some mathematical niceties (e.g. it makes select
   distribute across a union of selectors), but seems like it's almost never
   what a user actually wants.
   
   In Beam, part of the goal of schemas is to interact seamlessly with user
   types (such as Pojos), so users don't have to deal with Row objects (unless
   they want to). The above example for ParDo is one (though the
   implementation for that is not yet merged). Another example is that users
   might expect to be able to write:
   
   PCollection<Location> locations = pc.apply(Select.fieldNames("location"))
       .apply(Convert.to(Location.class));
   
   And that will only work if the selected item matches the schema of
   Location, not if it's a nested schema.
   
   Maybe for now we should add an option to the Select transform so that the
   user can specify which behavior they want?
   
   On Wed, Mar 13, 2019 at 1:28 PM Gleb Kanterov wrote:
   
   > This way it's more visible:
   >
   > $ bq show --schema test.test_schema | jq
   > [
   >   {
   >     "fields": [
   >       {
   >         "type": "FLOAT",
   >         "name": "latitude"
   >       },
   >       {
   >         "type": "FLOAT",
   >         "name": "longtitude"
   >       }
   >     ],
   >     "type": "RECORD",
   >     "name": "location"
   >   }
   > ]
   >
   
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 212666)
    Time Spent: 2h 50m  (was: 2h 40m)

> Select transform has non-intuitive semantics
> --------------------------------------------
>
>                 Key: BEAM-6772
>                 URL: https://issues.apache.org/jira/browse/BEAM-6772
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-core
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: Major
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Consider the following schema:
> User:
>     name: STRING
>     location: Location
>  
> Location:
>     latitude: DOUBLE
>     longitude: DOUBLE
>  
> If you apply Select.fieldNames("location"), most users expect to get back a 
> row matching the Location schema. Instead you get back an outer schema with a 
> single location field in it. Select should instead unnest the output up to 
> the point where multiple fields are selected.
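
To make the two behaviors in the description concrete, here is a minimal sketch that models rows as nested maps in plain Java. It does not use the Beam API: the class `SelectSemanticsSketch`, the methods `selectWrapped` and `selectUnnested`, and the sample values are all hypothetical names invented for illustration, and the sketch only covers the single-field case discussed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not Beam code: a "row" is a Map from field name to value,
// and a nested row is a Map stored as a field value.
class SelectSemanticsSketch {

    // Behavior the issue describes as surprising: the selected field stays
    // wrapped in an outer row that contains only that field.
    static Map<String, Object> selectWrapped(Map<String, Object> row, String field) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put(field, row.get(field));
        return out;
    }

    // Behavior the issue asks for (single-field case only): if the selected
    // field is itself a row, return that inner row directly.
    @SuppressWarnings("unchecked")
    static Map<String, Object> selectUnnested(Map<String, Object> row, String field) {
        Object value = row.get(field);
        if (value instanceof Map) {
            return (Map<String, Object>) value;
        }
        // A primitive field has nothing to unnest, so keep it wrapped.
        return selectWrapped(row, field);
    }

    // Sample User row following the schema in the description; values are made up.
    static Map<String, Object> sampleUser() {
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("latitude", 52.52);
        location.put("longitude", 13.40);
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "alice");
        user.put("location", location);
        return user;
    }

    public static void main(String[] args) {
        Map<String, Object> user = sampleUser();
        // Field names of the wrapped result: [location]
        System.out.println(selectWrapped(user, "location").keySet());
        // Field names of the unnested result: [latitude, longitude]
        System.out.println(selectUnnested(user, "location").keySet());
    }
}
```

Only the unnested result has the same field names as Location, which is the shape a later conversion to a Location object would need.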



