[ https://issues.apache.org/jira/browse/BEAM-9804?focusedWorklogId=549233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549233 ]
ASF GitHub Bot logged work on BEAM-9804:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Feb/21 08:53
Start Date: 07/Feb/21 08:53
Worklog Time Spent: 10m
Work Description: otourzan commented on pull request #12960:
URL: https://github.com/apache/beam/pull/12960#issuecomment-774637281
I may not have understood this change correctly, but I see 2 issues here.
1- _BigQuerySource is the private one and users don't call it. It is supposed
to be called through BigQuerySource [1]. Those 2 classes don't use kwargs, and
temp_dataset is not defined in the init parameters of the public one either,
so it's not accessible to users, right?
2- ReadFromBigQuery and _CustomBigQuerySource handle kwargs, but I think
temp_dataset needs to be documented in the ReadFromBigQuery docs, since that
is the public method users use.
[1]:
https://github.com/apache/beam/blob/b74fcf7b30d956fb42830d652a57b265a1546973/sdks/python/apache_beam/io/gcp/bigquery.py#L492
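To illustrate point 1, here is a minimal, self-contained sketch of the pattern being described. The classes below are hypothetical stand-ins and not Beam's actual code: an inner class accepts temp_dataset, a public wrapper with a fixed signature cannot pass it along, and a wrapper that forwards **kwargs can.

```python
class InnerSource:
    """Hypothetical stand-in for a private source that accepts temp_dataset."""

    def __init__(self, query, temp_dataset=None):
        self.query = query
        self.temp_dataset = temp_dataset


class FixedSignatureWrapper:
    """Hypothetical public wrapper with a fixed signature and no **kwargs."""

    def __init__(self, query):
        # The caller has no way to supply temp_dataset through this wrapper.
        self.source = InnerSource(query)


class ForwardingWrapper:
    """Hypothetical public wrapper that forwards extra keyword arguments."""

    def __init__(self, query, **kwargs):
        # Anything the caller passes, including temp_dataset, reaches InnerSource.
        self.source = InnerSource(query, **kwargs)


def try_fixed(query, temp_dataset):
    """Return the TypeError raised when temp_dataset is passed to the fixed wrapper."""
    try:
        FixedSignatureWrapper(query, temp_dataset=temp_dataset)
    except TypeError as exc:
        return exc
    return None
```

With the fixed-signature wrapper the option is rejected with a TypeError, while the forwarding wrapper hands it to the inner source unchanged. That is why the option also has to appear in the public documentation: nothing in the forwarding signature advertises it.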
Issue Time Tracking
-------------------
Worklog Id: (was: 549233)
Time Spent: 3h 10m (was: 3h)
> beam.io.BigQuerySource needs permissions to create datasets to be able to run
> queries
> -------------------------------------------------------------------------------------
>
> Key: BEAM-9804
> URL: https://issues.apache.org/jira/browse/BEAM-9804
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Jonathan Sulman
> Priority: P3
> Fix For: 2.26.0
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> Based on BEAM-8458, which was closed with a Java fix in 2.20.0. However, the
> bug still exists in the Python SDK.
> When using BigQuerySource with the query option, BigQueryReader creates a
> temporary dataset to store the results of the query.
> Therefore, Beam requires permission to create datasets just to be able to
> run a query. In practice, this means that Beam requires the role
> bigquery.user just to run queries, whereas if you use {{from}} (to read from
> a table), the role bigquery.jobUser suffices.
> BigQuerySource should have an option to set an existing dataset in which to
> write the temp results of a query, so that the role bigquery.jobUser would
> be enough.
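The option requested above can be sketched as follows. This is a hypothetical, self-contained illustration and not Beam's implementation: FakeBigQueryClient, resolve_temp_dataset, and the can_create_datasets flag are invented stand-ins that model a caller with or without permission to create datasets.

```python
import uuid


class FakeBigQueryClient:
    """Hypothetical in-memory stand-in for a BigQuery client."""

    def __init__(self, can_create_datasets, existing=()):
        self.can_create_datasets = can_create_datasets
        self.datasets = set(existing)

    def create_dataset(self, dataset_id):
        # Models the permission check that fails for a job-only role.
        if not self.can_create_datasets:
            raise PermissionError("caller may not create datasets")
        self.datasets.add(dataset_id)


def resolve_temp_dataset(client, temp_dataset=None):
    """Pick the dataset that will hold the temporary query results."""
    if temp_dataset is not None:
        # An existing dataset was supplied, so nothing has to be created.
        if temp_dataset not in client.datasets:
            raise ValueError("dataset %r does not exist" % temp_dataset)
        return temp_dataset
    # Default behaviour described in the issue: create a fresh temporary dataset.
    dataset_id = "temp_dataset_" + uuid.uuid4().hex
    client.create_dataset(dataset_id)
    return dataset_id
```

In this model, a caller that cannot create datasets can still run a query as long as it names an existing dataset, and only the default path needs the broader permission.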