[ https://issues.apache.org/jira/browse/BEAM-10917?focusedWorklogId=634842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634842 ]

ASF GitHub Bot logged work on BEAM-10917:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 22:37
            Start Date: 05/Aug/21 22:37
    Worklog Time Spent: 10m 
      Work Description: satybald commented on a change in pull request #15185:
URL: https://github.com/apache/beam/pull/15185#discussion_r683829720



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -883,6 +892,272 @@ def _export_files(self, bq):
     return table.schema, metadata_list
 
 
+class _CustomBigQueryStorageSourceBase(BoundedSource):
+  """A base class for BoundedSource implementations which read from BigQuery
+  using the BigQuery Storage API.
+
+  Args:
+    table (str, TableReference): The ID of the table. The ID must contain only
+      letters ``a-z``, ``A-Z``, numbers ``0-9``, or underscores ``_``. If
+      **dataset** argument is :data:`None` then the table argument must
+      contain the entire table reference specified as:
+      ``'PROJECT:DATASET.TABLE'`` or must specify a TableReference.
+    dataset (str): The ID of the dataset containing this table or
+      :data:`None` if the table argument specifies a TableReference.
+    project (str): The ID of the project containing this table or
+      :data:`None` if the table argument specifies a TableReference.
+    selected_fields (List[str]): Names of the fields in the table that
+      should be read. If empty, all fields will be read. If the specified
+      field is a nested field, all the sub-fields in the field will be
+      selected. The output field order is unrelated to the order of fields
+      in selected_fields.
+    row_restriction (str): SQL text filtering statement, similar to a WHERE
+      clause in a query. Aggregates are not supported. Restricted to a
+      maximum length of 1 MB.
+  """
+
+  # The maximum number of streams which will be requested when creating a read
+  # session, regardless of the desired bundle size.
+  MAX_SPLIT_COUNT = 10000
+  # The minimum number of streams which will be requested when creating a read
+  # session, regardless of the desired bundle size. Note that the server may
+  # still choose to return fewer than ten streams based on the layout of the
+  # table.
+  MIN_SPLIT_COUNT = 10
+
+  def __init__(
+      self,
+      table: Union[str, TableReference],
+      dataset: str = None,

Review comment:
       If arguments can be `None`, should they be `Optional`? 
   ```suggestion
         dataset: Optional[str] = None,
   ```
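For context, the convention the reviewer is pointing at comes from PEP 484: a parameter whose default is `None` should be annotated `Optional[T]` (equivalent to `Union[T, None]`) rather than bare `T`, so type checkers such as mypy know `None` is an accepted value. A minimal sketch of the distinction, using a hypothetical `describe_table` helper rather than the actual Beam source:

```python
from typing import Optional


def describe_table(table: str, dataset: Optional[str] = None) -> str:
    # dataset is Optional[str]: callers may omit it or pass None explicitly.
    # With a bare `dataset: str = None`, strict type checkers would flag
    # the None default as incompatible with the annotation.
    if dataset is None:
        return table
    return f"{dataset}.{table}"


print(describe_table("events"))            # -> events
print(describe_table("events", "mydata"))  # -> mydata.events
```

The same reasoning applies to the `dataset` and `project` parameters in the docstring above, both of which are documented as accepting :data:`None`.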




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 634842)
    Time Spent: 13h 40m  (was: 13.5h)

> Implement a BigQuery bounded source using the BigQuery storage API
> ------------------------------------------------------------------
>
>                 Key: BEAM-10917
>                 URL: https://issues.apache.org/jira/browse/BEAM-10917
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-gcp
>            Reporter: Kenneth Jung
>            Assignee: Kanthi Subramanian
>            Priority: P3
>          Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> The Java SDK contains a bounded source implementation which uses the BigQuery 
> storage API to read from BigQuery. We should implement the same for Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
