[jira] [Work logged] (BEAM-1909) BigQuery read transform fails for DirectRunner when querying non-US regions

ASF GitHub Bot (JIRA) Wed, 29 Aug 2018 13:44:22 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-1909?focusedWorklogId=139422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-139422
 ]


ASF GitHub Bot logged work on BEAM-1909:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Aug/18 20:43
            Start Date: 29/Aug/18 20:43
    Worklog Time Spent: 10m 
      Work Description: udim commented on a change in pull request #5435: 
[BEAM-1909] Fix BigQuery read transform fails for DirectRunner when querying 
non-US regions
URL: https://github.com/apache/beam/pull/5435#discussion_r213817094
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -641,19 +642,76 @@ def __init__(self, source, test_bigquery_client=None, use_legacy_sql=True,
     else:
       self.query = self.source.query
 
+  def _parse_results(self, mg, project=False):
+    """
+    Extract matched groups from regex match.
+    If project is provided, retrieve 2 matched groups and use the given
+    project; otherwise retrieve 3 groups.
+    :param mg: matched group
+    :param project: project passed in if not matched in regex
+    """
+    if project:
+      return project, mg.group(1), mg.group(2)
+    else:
+      try:
+        return mg.group(1), mg.group(2), mg.group(3)
+      except IndexError:
+        return (None for _ in range(3))  # No location, not a breaking change
+
+  def _parse_results(self, project_regex, not_project_regex):
+    """
+    Extract matched groups from query given regexes passed into method.
+    Given two regexes, return three items:
+      projectID, datasetID and tableID. If projectID is not provided in the
+      query, try to get the project from the class. Otherwise, return None.
+    :param project_regex: Regex to match the full name of the dataset.
+      i.e. project.dataset.table
+    :param not_project_regex: Regex to match table and dataset when project is
+      not provided.
+      i.e. dataset.table
+    :return: project, dataset, table
+    """
+    pm = re.search(project_regex, self.source.query)
+    if pm:
+      return pm.group(1), pm.group(2), pm.group(3)
+    else:
+      npm = re.search(not_project_regex, self.source.query)
+      if npm:
+        if self.source.project:
+          return self.source.project, npm.group(1), npm.group(2)
+    return (None for _ in range(3))  # No matches
+
+  def _parse_query(self):
 
 Review comment:
   We don't want to parse BigQuery SQL in Beam, as having our own parser would 
take a lot of effort to maintain.
   Instead, the preferred way to discover tables in a query is to run it in 
dry-run mode and look at the list of referenced tables, as mentioned in the bug:
   
https://issues.apache.org/jira/browse/BEAM-1909?focusedCommentId=16020138&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16020138
   
   Dry run option:
   
https://github.com/apache/beam/blob/e278e57077d8ab6459a9833eccd2ef20dc2faeae/sdks/python/apache_beam/io/gcp/bigquery.py#L806
   
   REST API reference for referencedTables:
   
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#statistics.query.referencedTables
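
The dry-run approach described above could look roughly like the sketch below. It is an illustration, not the code in the pull request. The helper names `build_dry_run_job` and `referenced_tables` and the sample IDs are hypothetical. The field names (`configuration.dryRun`, `configuration.query`, `statistics.query.referencedTables`) follow the REST reference linked above. The sketch only builds the request body and reads a job resource that was already returned, so it makes no network calls.

```python
from typing import Any, Dict, List, Tuple


def build_dry_run_job(sql: str, use_legacy_sql: bool = True) -> Dict[str, Any]:
    """Build a jobs.insert request body that validates `sql` without running it.

    Hypothetical helper: the dict layout mirrors the REST job resource.
    """
    return {
        "configuration": {
            "dryRun": True,
            "query": {"query": sql, "useLegacySql": use_legacy_sql},
        }
    }


def referenced_tables(job: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (project, dataset, table) for each table a job resource lists.

    Hypothetical helper: returns an empty list when the statistics are absent.
    """
    refs = job.get("statistics", {}).get("query", {}).get("referencedTables", [])
    return [(r["projectId"], r["datasetId"], r["tableId"]) for r in refs]


# Hand-written sample shaped like a dry-run job resource (not real output).
sample_job = {
    "statistics": {
        "query": {
            "referencedTables": [
                {
                    "projectId": "my-project",
                    "datasetId": "eu_dataset",
                    "tableId": "events",
                }
            ]
        }
    }
}

request_body = build_dry_run_job(
    "SELECT * FROM [my-project:eu_dataset.events]"
)
tables = referenced_tables(sample_job)
```

With the referenced tables in hand, the location of one of them can be looked up and used when creating the temporary dataset, so no regex over the query text is needed.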
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 139422)
    Time Spent: 2.5h  (was: 2h 20m)

> BigQuery read transform fails for DirectRunner when querying non-US regions
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-1909
>                 URL: https://issues.apache.org/jira/browse/BEAM-1909
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Chamikara Jayalath
>            Priority: Major
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> See: 
> http://stackoverflow.com/questions/42135002/google-dataflow-cannot-read-and-write-in-different-locations-python-sdk-v0-5-5/42144748?noredirect=1#comment73621983_42144748
> This should be fixed by creating the temp dataset and table in the correct 
> region.
> cc: [~sb2nov]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
