[
https://issues.apache.org/jira/browse/BEAM-6772?focusedWorklogId=212745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-212745
]
ASF GitHub Bot logged work on BEAM-6772:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Mar/19 22:42
Start Date: 13/Mar/19 22:42
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #8006: [BEAM-6772] Change
Select semantics to match what a user expects
URL: https://github.com/apache/beam/pull/8006#issuecomment-472634925
Ok, this makes sense then.
As I see it, BigQuery (and Spark) unnests just like this PR does. The
difference is that it then creates a new Row to contain the selected
result. So select("location") returns a Row containing a location, and
select("location.latitude") returns a Row containing a double. I would
assume that select("location.*") returns a Row containing a latitude
and a longitude.
We could validate this by nesting the location three layers deep. In that
case I suspect BQ would return the same thing it did in the quoted
experiment, not a three-layered nested row.
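
The selection behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: rows are modelled as ordered maps rather than Beam Row objects, and the class SelectSketch and its select helper are hypothetical names, not part of Beam, BigQuery, or Spark. The helper walks the field path, then wraps the value in a new single-field row named after the last path element. A trailing "*" is assumed to expand to all fields of the nested row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectSketch {
  // Hypothetical helper: rows are ordered maps here, not Beam Row objects.
  // Assumes every path element exists and that "*" only appears last.
  @SuppressWarnings("unchecked")
  static Map<String, Object> select(Map<String, Object> row, String path) {
    Object current = row;
    String lastName = null;
    for (String part : path.split("\\.")) {
      if (part.equals("*")) {
        // Wildcard: the result row holds every field of the current nested row.
        return new LinkedHashMap<>((Map<String, Object>) current);
      }
      current = ((Map<String, Object>) current).get(part);
      lastName = part;
    }
    // Wrap the selected value in a new row named after the last path element.
    Map<String, Object> result = new LinkedHashMap<>();
    result.put(lastName, current);
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> location = new LinkedHashMap<>();
    location.put("latitude", 1.5);
    location.put("longitude", 2.5);
    Map<String, Object> user = new LinkedHashMap<>();
    user.put("name", "a");
    user.put("location", location);

    System.out.println(select(user, "location"));
    System.out.println(select(user, "location.latitude"));
    System.out.println(select(user, "location.*"));
  }
}
```

Under these assumptions, the result depends only on the last path element, so nesting the location deeper would not add layers to the output row.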
On Wed, Mar 13, 2019 at 2:39 PM Gleb Kanterov <[email protected]>
wrote:
> As I understand it, the approach is to take the last element in the
> field path. As far as I can see, BigQuery works the same way:
>
> CREATE TABLE `test.test_schema_2` AS
> SELECT location.latitude FROM test.test;
>
> $ bq show --schema test.test_schema_2 | jq
> [
>   {
>     "type": "FLOAT",
>     "name": "latitude"
>   }
> ]
>
Issue Time Tracking
-------------------
Worklog Id: (was: 212745)
Time Spent: 4h 10m (was: 4h)
> Select transform has non-intuitive semantics
> --------------------------------------------
>
> Key: BEAM-6772
> URL: https://issues.apache.org/jira/browse/BEAM-6772
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> Consider the following schema:
> User:
>   name: STRING
>   location: Location
>
> Location:
>   latitude: DOUBLE
>   longitude: DOUBLE
>
> If you apply Select.fieldNames("location"), most users expect to get back a
> row matching the Location schema. Instead you get back a row with an outer
> schema that contains a single location field. Select should instead unnest
> the output up to the point where multiple fields are selected.
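
One reading of the issue description can be sketched as follows. This is an illustration only, not Beam's implementation: rows are modelled as ordered maps, only top-level field names are handled, and UnnestSketch, selectOuter, and selectUnnested are hypothetical names. selectOuter mimics the behaviour the issue calls non-intuitive, while selectUnnested returns the nested row itself when exactly one nested field is selected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UnnestSketch {
  // Behaviour the issue describes as surprising: keep the outer row,
  // containing only the selected fields.
  static Map<String, Object> selectOuter(Map<String, Object> row, String... fieldNames) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String name : fieldNames) {
      out.put(name, row.get(name));
    }
    return out;
  }

  // Assumed reading of the proposal: if a single nested field is selected,
  // return that nested row directly; otherwise keep the multi-field row.
  @SuppressWarnings("unchecked")
  static Map<String, Object> selectUnnested(Map<String, Object> row, String... fieldNames) {
    if (fieldNames.length == 1 && row.get(fieldNames[0]) instanceof Map) {
      return new LinkedHashMap<>((Map<String, Object>) row.get(fieldNames[0]));
    }
    return selectOuter(row, fieldNames);
  }

  public static void main(String[] args) {
    Map<String, Object> location = new LinkedHashMap<>();
    location.put("latitude", 1.5);
    location.put("longitude", 2.5);
    Map<String, Object> user = new LinkedHashMap<>();
    user.put("name", "a");
    user.put("location", location);

    System.out.println(selectOuter(user, "location"));
    System.out.println(selectUnnested(user, "location"));
    System.out.println(selectUnnested(user, "name", "location"));
  }
}
```

With the User row from the issue, selectOuter yields a row with one location field, whereas selectUnnested yields a row matching the Location schema. Selecting both name and location leaves the outer row in place, since multiple fields are selected at that level.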