[
https://issues.apache.org/jira/browse/BEAM-11587?focusedWorklogId=765592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765592
]
ASF GitHub Bot logged work on BEAM-11587:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/May/22 18:03
Start Date: 03/May/22 18:03
Worklog Time Spent: 10m
Work Description: svetakvsundhar commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r864046636
##########
sdks/python/apache_beam/io/gcp/bigquery_schema_tools.py:
##########
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tools used to work with Schema types in the context of BigQuery.
+Classes, constants and functions in this file are experimental and have no
+backwards compatibility guarantees.
+NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES.
+"""
+
+from typing import Optional
+from typing import Sequence
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.io.gcp.internal.clients import bigquery
+
+# BigQuery types as listed in
+# https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
+# with aliases (RECORD, BOOLEAN, FLOAT, INTEGER) as defined in
+# https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String-
+BIG_QUERY_TO_PYTHON_TYPES = {
+    "STRING": str,
+    "INTEGER": np.int64,
+    "FLOAT64": np.float64,
+    "BOOLEAN": bool,
+    "BYTES": bytes,
+    "TIMESTAMP": beam.utils.timestamp.Timestamp,
+    #TODO svetaksundhar@: Finish mappings for all BQ types
+}
+
+
+def produce_pcoll_with_schema(the_table_schema):
+  # type: (bigquery.TableSchema) -> type
+
+  """Convert a schema of type TableSchema into a PCollection element type.
+  Args:
+    the_table_schema: A BQ schema of type TableSchema
+  Returns:
+    type: a type that can be used to work with PCollections.
+  """
+
+  the_schema = beam.io.gcp.bigquery_tools.get_dict_table_schema(
+      the_table_schema)
+  if the_schema == {}:
+    raise ValueError("The schema is empty")
+  dict_of_tuples = []
+  for i in range(len(the_schema['fields'])):
+    if the_schema['fields'][i]['type'] in BIG_QUERY_TO_PYTHON_TYPES:
+      typ = bq_field_to_type(
+          the_schema['fields'][i]['type'], the_schema['fields'][i]['mode'])
+    else:
+      raise ValueError(the_schema['fields'][i]['type'])
+    # TODO svetaksundhar@: Map remaining BQ types
+    dict_of_tuples.append((the_schema['fields'][i]['name'], typ))
+  sample_schema = beam.typehints.schemas.named_fields_to_schema(dict_of_tuples)
+  usertype = beam.typehints.schemas.named_tuple_from_schema(sample_schema)
Review Comment:
Hmm, it looks like this is the issue described in BEAM-9574:
https://issues.apache.org/jira/browse/BEAM-9574?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
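For readers following this review, here is a minimal standalone sketch of the conversion the hunk performs: a dict-form BigQuery schema (the shape returned by get_dict_table_schema) is mapped field by field to Python types, and the resulting (name, type) pairs become a NamedTuple. It is an illustration only. It uses plain typing.NamedTuple rather than Beam's named_fields_to_schema/named_tuple_from_schema. The names field_to_type and schema_dict_to_named_tuple are hypothetical. int/float stand in for np.int64/np.float64, and TIMESTAMP is omitted so the sketch has no dependencies. The NULLABLE -> Optional, REPEATED -> Sequence handling is an assumption about what bq_field_to_type (not shown in the hunk) does, suggested only by the Optional and Sequence imports.

```python
from typing import NamedTuple, Optional, Sequence

# Dependency-free subset of BIG_QUERY_TO_PYTHON_TYPES from the diff
# (int/float instead of np.int64/np.float64; TIMESTAMP omitted).
BQ_TO_PY = {
    "STRING": str,
    "INTEGER": int,
    "FLOAT64": float,
    "BOOLEAN": bool,
    "BYTES": bytes,
}


def field_to_type(bq_type, mode):
    """Hypothetical stand-in for bq_field_to_type.

    Assumption: NULLABLE -> Optional[T], REPEATED -> Sequence[T],
    REQUIRED -> T.
    """
    try:
        base = BQ_TO_PY[bq_type]
    except KeyError:
        raise ValueError("Unsupported BigQuery type: %s" % bq_type) from None
    if mode == "NULLABLE":
        return Optional[base]
    if mode == "REPEATED":
        return Sequence[base]
    return base


def schema_dict_to_named_tuple(schema_dict, name="BigQueryRow"):
    """Builds a NamedTuple type from a dict-form table schema (hypothetical)."""
    fields = schema_dict.get("fields", [])
    if not fields:
        raise ValueError("The schema is empty")
    # BigQuery treats an omitted mode as NULLABLE.
    return NamedTuple(
        name,
        [(f["name"], field_to_type(f["type"], f.get("mode", "NULLABLE")))
         for f in fields])
```

Usage: with a schema dict whose fields are id (INTEGER, REQUIRED), name (STRING, NULLABLE) and tags (STRING, REPEATED), schema_dict_to_named_tuple returns a type whose instances can be built as Row(id=1, name=None, tags=["a"]). Iterating over the field list directly, instead of indexing with range(len(...)), keeps the loop shorter than the version under review.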
Issue Time Tracking
-------------------
Worklog Id: (was: 765592)
Time Spent: 5h 40m (was: 5.5h)
> Support pd.read_gbq and DataFrame.to_gbq
> ----------------------------------------
>
> Key: BEAM-11587
> URL: https://issues.apache.org/jira/browse/BEAM-11587
> Project: Beam
> Issue Type: New Feature
> Components: dsl-dataframe, io-py-gcp, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Svetak Vihaan Sundhar
> Priority: P3
> Labels: dataframe-api
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> We should support
> [read_gbq|https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_gbq.html]
> and
> [to_gbq|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_gbq.html]
> in the DataFrame API when gcp extras are installed.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)