[ 
https://issues.apache.org/jira/browse/BEAM-6772?focusedWorklogId=212760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-212760
 ]

ASF GitHub Bot logged work on BEAM-6772:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Mar/19 23:17
            Start Date: 13/Mar/19 23:17
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on issue #8006: [BEAM-6772] Change 
Select semantics to match what a user expects
URL: https://github.com/apache/beam/pull/8006#issuecomment-472643320
 
 
   Always adding one extra row around whatever is selected makes sense in
   Spark and BQ, because it keeps selection consistent between primitive and
   nested types. Select must always return a row, so if you select userId, it
   returns a Row with a single string field (not a bare scalar string). Spark
   and BQ have therefore decided to treat nested rows exactly the same way:
   even though the nested row could be returned directly, it is handled the
   same as a primitive type and boxed inside a result row.
   
   On Wed, Mar 13, 2019 at 4:09 PM Gleb Kanterov <[email protected]>
   wrote:
   
   > Makes sense. I tried and both BigQuery and Spark have consistent behavior
   > for selecting rows nested in rows:
   >
   > df.printSchema()
   > root
   >  |-- userId: string (nullable = true)
   >  |-- position: struct (nullable = true)
   >  |    |-- location: struct (nullable = true)
   >  |    |    |-- longtitude: double (nullable = true)
   >  |    |    |-- latitude: double (nullable = true)
   >
   > df.select("position.location").printSchema()
   > root
   >  |-- location: struct (nullable = true)
   >  |    |-- longtitude: double (nullable = true)
   >  |    |-- latitude: double (nullable = true)
   >
   >
   
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 212760)
    Time Spent: 4h 40m  (was: 4.5h)

> Select transform has non-intuitive semantics
> --------------------------------------------
>
>                 Key: BEAM-6772
>                 URL: https://issues.apache.org/jira/browse/BEAM-6772
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-core
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: Major
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Consider the following schema:
> User:
>     name: STRING
>     location: Location
>  
> Location:
>     latitude: DOUBLE
>     longitude: DOUBLE
>  
> If you apply Select.fieldNames("location"), most users expect to get back a 
> row matching the Location schema. Instead, you get back a row with an outer 
> schema containing a single location field. Select should instead unnest the 
> output up to the point where multiple fields are selected.
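
The two behaviors discussed in this issue can be sketched with plain Python
dicts standing in for rows. This is only a toy model, not Beam's Select
implementation: the helper names get_path, select_boxed and select_unnested
are hypothetical, and the rule "unwrap only when the single selected value is
itself a row" is one assumed reading of "unnest the output up to the point
where multiple fields are selected".

```python
from typing import Any, Dict

# A "row" is modeled as a dict; a nested row is a dict inside a dict.
Row = Dict[str, Any]


def get_path(row: Row, path: str) -> Any:
    """Follow a dotted path such as "position.location" through nested rows."""
    value: Any = row
    for name in path.split("."):
        value = value[name]
    return value


def select_boxed(row: Row, *paths: str) -> Row:
    """Boxing semantics (as described for Spark and BQ in the comment above):
    every selected value is wrapped in a result row, keyed by the last
    component of its path. Name collisions are ignored in this sketch."""
    return {path.split(".")[-1]: get_path(row, path) for path in paths}


def select_unnested(row: Row, *paths: str) -> Row:
    """Assumed reading of the proposal in this issue: when exactly one field
    is selected and it is itself a row, return that row directly."""
    boxed = select_boxed(row, *paths)
    if len(boxed) == 1:
        (only,) = boxed.values()
        if isinstance(only, dict):
            return only
    # Primitives and multi-field selections still come back as a row.
    return boxed


user = {"name": "alice", "location": {"latitude": 1.5, "longitude": 2.5}}

boxed_result = select_boxed(user, "location")
unnested_result = select_unnested(user, "location")
primitive_result = select_unnested(user, "name")
```

Here boxed_result keeps the outer single-field row, unnested_result is the
Location row itself, and primitive_result is still a one-field row, since a
selection must always return a row.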



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
