[ 
https://issues.apache.org/jira/browse/BEAM-11587?focusedWorklogId=771014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771014
 ]

ASF GitHub Bot logged work on BEAM-11587:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/May/22 20:01
            Start Date: 16/May/22 20:01
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r874104866


##########
sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py:
##########
@@ -178,6 +178,31 @@ def test_iobase_source(self):
               query=query, use_standard_sql=True, project=self.project))
       assert_that(result, equal_to(self.TABLE_DATA))
 
+  @pytest.mark.it_postcommit
+  def test_table_schema_retrieve(self):
+    the_table = beam.io.gcp.bigquery.bigquery_tools.BigQueryWrapper().get_table(
+        project_id="apache-beam-testing",
+        dataset_id="beam_bigquery_io_test",
+        table_id="dfsqltable_3c7d6fd5_16e0460dfd0")
+    table = the_table.schema
+    utype = beam.io.gcp.bigquery_schema_tools.produce_pcoll_with_schema(table)
+    args = self.args + ["--experiments=save_main_session"]

Review Comment:
   Glad this got the test passing!
   
   This isn't ideal though, since users of this feature would always need to 
save the main session, or else their pipeline will fail. I'm also a little 
surprised this worked: isn't the argument `--save_main_session`, not 
`--experiments=save_main_session`?
   
   Regardless, we need a solution that works without save_main_session set. 
It looks like the solution in the DataFrame schema code was to create a DoFn 
with a custom `__reduce__` implementation that avoids pickling the user type: 
https://github.com/apache/beam/blob/03c3c3657ea51a60e301a25eef70d006fe8cc0e2/sdks/python/apache_beam/dataframe/schemas.py#L254-L258
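   
   To illustrate the `__reduce__` idea, here is a rough, Beam-free sketch. It 
is not the code from schemas.py; `ToRows`, `make_row_type`, and `GeneratedRow` 
are hypothetical names, and `ToRows` only stands in for a DoFn. The point is 
that the pickled payload holds the plain field description, and the generated 
type is rebuilt when the object is unpickled, so the type itself never has to 
be pickled:

```python
import pickle
from typing import NamedTuple


def make_row_type(fields):
    """Build a NamedTuple type at runtime from (name, type) pairs (hypothetical helper)."""
    return NamedTuple("GeneratedRow", list(fields))


class ToRows:
    """Stand-in for a DoFn that converts dicts into a generated row type."""

    def __init__(self, fields):
        self._fields = tuple(fields)
        # The generated type lives only on the instance, not as a module attribute,
        # so pickling it by reference would generally not work.
        self._row_type = make_row_type(self._fields)

    def process(self, element):
        yield self._row_type(**element)

    def __reduce__(self):
        # Serialize only the field description; __init__ regenerates the
        # row type on the receiving side.
        return (ToRows, (self._fields,))


# Round-trip through pickle, as a runner would when shipping the DoFn to a worker.
fn = pickle.loads(pickle.dumps(ToRows([("id", int), ("name", str)])))
row = next(fn.process({"id": 1, "name": "a"}))
```

   A real DoFn would subclass `beam.DoFn` and derive the fields from the 
BigQuery table schema; the sketch only shows the serialization pattern.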
   





Issue Time Tracking
-------------------

    Worklog Id:     (was: 771014)
    Time Spent: 9h 20m  (was: 9h 10m)

> Support pd.read_gbq and DataFrame.to_gbq
> ----------------------------------------
>
>                 Key: BEAM-11587
>                 URL: https://issues.apache.org/jira/browse/BEAM-11587
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-dataframe, io-py-gcp, sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Svetak Vihaan Sundhar
>            Priority: P3
>              Labels: dataframe-api
>          Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> We should support 
> [read_gbq|https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html]
>  and 
> [to_gbq|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html]
>  in the DataFrame API when gcp extras are installed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
