[ https://issues.apache.org/jira/browse/BEAM-11587?focusedWorklogId=765589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765589 ]
ASF GitHub Bot logged work on BEAM-11587:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/May/22 18:00
Start Date: 03/May/22 18:00
Worklog Time Spent: 10m
Work Description: pabloem commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r864044533
##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -482,6 +484,34 @@ def test_temp_dataset_is_configurable(
delete_table.assert_called_with(
temp_dataset.projectId, temp_dataset.datasetId, mock.ANY)
+  @pytest.mark.it_postcommit
+  def test_table_schema_retrieve(self):
+    the_table = apache_beam.io.gcp.bigquery.bigquery_tools.BigQueryWrapper(
+    ).get_table(
+        project_id="apache-beam-testing",
+        dataset_id="beam_bigquery_io_test",
+        table_id="dfsqltable_3c7d6fd5_16e0460dfd0")
+    table = the_table.schema
+    utype = bigquery_schema_tools.produce_pcoll_with_schema(table)
+    with beam.Pipeline() as p:
+      result = (
+          p | apache_beam.io.gcp.bigquery.ReadFromBigQuery(
+              gcs_location="gs://bqio_schema",
+              table="beam_bigquery_io_test.dfsqltable_3c7d6fd5_16e0460dfd0",
+              project="apache-beam-testing")
+          | apache_beam.io.gcp.bigquery.ReadFromBigQuery.get_pcoll_from_schema(
+              table))
+      assert_that(
+          result,
+          equal_to([
+              utype(id=3, name='customer1', type='test'),
+              utype(id=1, name='customer1', type='test'),
+              utype(id=2, name='customer2', type='test'),
+              utype(id=4, name='customer2', type='test')
+          ]))
Review Comment:
This suggestion will not work out-of-the-box, but since you're trying to
verify the data and the schema, you could do something like this:
```suggestion
              table))
      assert_that(
          result | beam.Map(lambda x: {'id': x.id, 'name': x.name, 'type': x.type}),
          equal_to([
              {'id': 3, 'name': 'customer1', 'type': 'test'},
              {'id': 1, 'name': 'customer1', 'type': 'test'},
              {'id': 2, 'name': 'customer2', 'type': 'test'},
              {'id': 4, 'name': 'customer2', 'type': 'test'}
          ]))
      self.assertEqual(result.schema, {'id': 'INT64', 'name': 'STRING',
                                       'type': 'STRING'})
```
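The idea behind the suggestion (compare the row data as plain dicts, and check the schema separately) can be sketched without Beam or BigQuery. In this sketch, `Row` is a hypothetical stand-in for the user type generated from the table schema, and `rows_as_dicts` and `schema_of` are illustrative helpers, not Beam APIs. The dict keys are strings, because a bare `id` or `type` would refer to the Python builtins and a bare `name` would be undefined.

```python
from typing import NamedTuple, get_type_hints


class Row(NamedTuple):
    # Hypothetical stand-in for the user type generated from the table schema.
    id: int
    name: str
    type: str


def rows_as_dicts(rows):
    """Convert schema'd rows to plain dicts keyed by field name."""
    return [row._asdict() for row in rows]


def schema_of(row_type):
    """Return a {field name: Python type} mapping for a NamedTuple class."""
    return dict(get_type_hints(row_type))


rows = [Row(3, 'customer1', 'test'), Row(1, 'customer1', 'test')]
expected = [
    {'id': 1, 'name': 'customer1', 'type': 'test'},
    {'id': 3, 'name': 'customer1', 'type': 'test'},
]

# Compare the data independently of row order, as equal_to does for a PCollection.
by_id = lambda d: d['id']
assert sorted(rows_as_dicts(rows), key=by_id) == sorted(expected, key=by_id)

# Check the schema as a separate assertion from the data.
assert schema_of(Row) == {'id': int, 'name': str, 'type': str}
```

Keeping the two checks separate means a failure points at either the data or the schema, rather than at an opaque mismatch between generated row objects.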
Issue Time Tracking
-------------------
Worklog Id: (was: 765589)
Time Spent: 5h 20m (was: 5h 10m)
> Support pd.read_gbq and DataFrame.to_gbq
> ----------------------------------------
>
> Key: BEAM-11587
> URL: https://issues.apache.org/jira/browse/BEAM-11587
> Project: Beam
> Issue Type: New Feature
> Components: dsl-dataframe, io-py-gcp, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Svetak Vihaan Sundhar
> Priority: P3
> Labels: dataframe-api
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> We should support
> [read_gbq|https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html]
> and
> [to_gbq|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html]
> in the DataFrame API when gcp extras are installed.