[ 
https://issues.apache.org/jira/browse/BEAM-6772?focusedWorklogId=217111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-217111
 ]

ASF GitHub Bot logged work on BEAM-6772:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Mar/19 03:23
            Start Date: 22/Mar/19 03:23
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on issue #8006: [BEAM-6772] Change 
Select semantics to match what a user expects
URL: https://github.com/apache/beam/pull/8006#issuecomment-475481029
 
 
   @kennknowles The Spark behavior might be better because it preserves the 
resulting structure if more fields are added. Here is what I mean. Consider 
this schema:
   
   S1:
     a: ARRAY[ROW(S2)]
   S2:
     b: INT32
     c: ARRAY[ROW(S3)]
   S3:
     d: INT32
   
   Now if you Select.fieldNames("a.c.d"), with my current PR you will get back 
{ ARRAY[ARRAY[{ d: INT32 }]] }. This itself is fine, but consider what happens 
if you Select.fieldNames("a.c.d", "a.b") with this PR. You will suddenly get 
back { ARRAY[{ b: INT32, c: ARRAY[{ d: INT32 }] }] }. A new row suddenly got 
inserted because we needed an extra field. If we adopt the Spark approach, then 
the structure always stays the same, because the row is already there.
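
   The shape change described here can be reproduced with a small toy model. 
The sketch below is plain Python, not the Beam API. Schemas are modeled as 
nested dicts and lists, select_shape is a hypothetical helper, and its 
collapsing rule (unwrap single-field rows from the top until a row with several 
selected fields, or a row holding a primitive, is reached) is an assumption 
inferred from the two results quoted in this comment, not the PR's actual 
implementation. The outermost wrapper row is omitted.

```python
# Toy schema model (hypothetical, not Beam's Schema class):
#   a row is a dict {field_name: type}, an array is a one-element list
#   [element_type], and a primitive is a string such as "INT32".
S3 = {"d": "INT32"}
S2 = {"b": "INT32", "c": [S3]}
S1 = {"a": [S2]}


def prune(schema, paths):
    """Keep only the fields named by the dotted paths, preserving nesting."""
    if isinstance(schema, list):  # array: prune the element type
        return [prune(schema[0], paths)]
    if isinstance(schema, str):  # primitive: nothing to prune
        return schema
    pruned = {}
    for name, field_type in schema.items():
        # Sub-paths that continue below this field, e.g. "a.c.d" -> "c.d".
        rest = [p[len(name) + 1:] for p in paths if p.startswith(name + ".")]
        if name in paths:  # the whole field is selected
            pruned[name] = field_type
        elif rest:
            pruned[name] = prune(field_type, rest)
    return pruned


def collapse(schema):
    """Assumed rule: unwrap single-field rows until a branch point or a leaf row."""
    if isinstance(schema, list):
        return [collapse(schema[0])]
    if isinstance(schema, dict) and len(schema) == 1:
        (only,) = schema.values()
        if not isinstance(only, str):  # keep a row whose single field is primitive
            return collapse(only)
    return schema


def select_shape(schema, *paths):
    """Shape of the selection result under the assumed unnesting rule."""
    return collapse(prune(schema, list(paths)))


print(select_shape(S1, "a.c.d"))
print(select_shape(S1, "a.c.d", "a.b"))
```

   In this model, adding "a.b" to the selection changes the element type of 
the outer array from an array to a row, which is the instability pointed out 
above.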
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 217111)
    Time Spent: 6.5h  (was: 6h 20m)

> Select transform has non-intuitive semantics
> --------------------------------------------
>
>                 Key: BEAM-6772
>                 URL: https://issues.apache.org/jira/browse/BEAM-6772
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-core
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: Major
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Consider the following schema:
> User:
>     name: STRING
>     location: Location
>  
> Location:
>     latitude: DOUBLE
>     longitude: DOUBLE
>  
> If you apply Select.fieldNames("location"), most users expect to get back a 
> row matching the Location schema. Instead, you get back an outer schema with a 
> single location field in it. Select should instead unnest the output up to 
> the point where multiple fields are selected.
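
To make the difference concrete, here is a toy model in plain Python (not the 
Beam API). Rows are modeled as dicts, the coordinate values are made up, and 
select_nested / select_unnested are hypothetical names for the current and the 
expected behavior when a single field is selected.

```python
# A User row modeled as nested dicts; the values are made up for illustration.
user = {
    "name": "alice",
    "location": {"latitude": 1.5, "longitude": 2.5},
}


def select_nested(row, field):
    """Current behavior as described: the outer row survives with one field."""
    return {field: row[field]}


def select_unnested(row, field):
    """Expected behavior as described: the selected field's value is returned."""
    return row[field]


print(select_nested(user, "location"))
print(select_unnested(user, "location"))
```

The first call keeps a wrapper row with a single location field, while the 
second returns a row that matches the Location schema directly.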



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
