[
https://issues.apache.org/jira/browse/BEAM-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kamil Gałuszka updated BEAM-10647:
----------------------------------
Description:
This bug is not deterministic because of the Google BigQuery API, but let me
try to describe the problem, as we spent two whole days hunting it down.
Imagine that you have a dataset with a table XYZ. You add to that dataset an
Authorized View that references a table in a project that you don't have access
to. You can query that table only via the Authorized View.
Unfortunately, the problem shows up when executing the method
{code:python}
get_query_location{code}
to determine the location in which to write temp_dataset:
{code:python}
referenced_tables = response.statistics.query.referencedTables
if referenced_tables:  # Guards against both non-empty and non-None
    table = referenced_tables[0]
    location = self.get_table_location(
        table.projectId, table.datasetId, table.tableId)
{code}
The issue with that code is that referenced_tables does not reference the
place where the view lives; instead, it gives you information about the
underlying table of that Authorized View.
So if that table comes first in the result (and the implementation of
get_query_location only looks at the first entry), you get a permission error
saying that you cannot retrieve the dataset, which is correct! The user has
access to the Authorized View, which they can query, but not to the underlying
table.
Therefore, the implementation should be changed to loop through the referenced
tables until it finds a location.
My point mainly boils down to this:
* You can get back a table (and its dataset) that you don't have access to, but
that you can still query via Authorized Views.
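The proposed change (loop through the referenced tables until a location is
found) could be sketched roughly as follows. This is only an illustration, not
the actual Beam code: TableRef, AccessDenied and first_accessible_location are
hypothetical names, and AccessDenied stands in for whatever permission error
the BigQuery client really raises.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class AccessDenied(Exception):
    """Hypothetical stand-in for the permission error raised by the client."""


@dataclass(frozen=True)
class TableRef:
    """Hypothetical type mirroring the fields used in the snippet above."""
    projectId: str
    datasetId: str
    tableId: str


def first_accessible_location(
    referenced_tables: Optional[Iterable[TableRef]],
    get_table_location: Callable[[str, str, str], str],
) -> Optional[str]:
    """Return the location of the first table whose metadata is readable.

    Tables that raise AccessDenied (for example, the table behind an
    Authorized View) are skipped instead of failing the whole lookup.
    """
    for table in referenced_tables or []:
        try:
            return get_table_location(
                table.projectId, table.datasetId, table.tableId)
        except AccessDenied:
            # No permission on this table; try the next referenced table.
            continue
    # Nothing was readable; the caller has to pick a fallback location.
    return None
```

With something like this, get_query_location would fail only when none of the
referenced tables is readable, and the order in which the API returns them
would no longer matter.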
> BigQueryIO BigQueryWrapper.get_query_location can end up in permission issue
> ----------------------------------------------------------------------------
>
> Key: BEAM-10647
> URL: https://issues.apache.org/jira/browse/BEAM-10647
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Kamil Gałuszka
> Priority: P2