ahmedabu98 commented on code in PR #22452:
URL: https://github.com/apache/beam/pull/22452#discussion_r930360989


##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -101,6 +101,10 @@
 # Timeout for a BQ streaming insert RPC. Set to a maximum of 2 minutes.
 BQ_STREAMING_INSERT_TIMEOUT_SEC = 120
 
+_PROJECT_PATTERN = r'[a-z][a-z0-9-]{4,28}[a-z0-9]'

Review Comment:
   I suspect this is why precommits are failing: 
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/24179/testReport/



##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -254,11 +258,11 @@ def parse_table_reference(table, dataset=None, 
project=None):
   # table argument will contain a full table reference instead of just a
   # table name.
   if dataset is None:
-    regex = re.compile(
-        r'''^((?P<project>.+):)?(?P<dataset>\w+)\.
-            (?P<table>[-\w\$]+(\s+\-*\w+)*)$''',
-        re.X)
-    match = regex.match(table)
+    pattern = (
+        f'(?P<project>{_PROJECT_PATTERN})'
+        f'[:\\.](?P<dataset>{_DATASET_PATTERN})'
+        f'[:\\.](?P<table>{_TABLE_PATTERN})')

Review Comment:
   Glad to see you're allowing for the dot in `project.dataset` too.
   
   Does BigQuery allow the colon operator in between `dataset` and `table`? 
AFAIK it should be dot operator.



##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -101,6 +101,10 @@
 # Timeout for a BQ streaming insert RPC. Set to a maximum of 2 minutes.
 BQ_STREAMING_INSERT_TIMEOUT_SEC = 120
 
+_PROJECT_PATTERN = r'[a-z][a-z0-9-]{4,28}[a-z0-9]'

Review Comment:
   Minimum project character requirement is 4, see 
[here](https://cloud.google.com/resource-manager/docs/creating-managing-projects#creating_a_project).
 As it is now the minimum is 6. Perhaps the lower end of the range could be 2.



##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -101,6 +101,10 @@
 # Timeout for a BQ streaming insert RPC. Set to a maximum of 2 minutes.
 BQ_STREAMING_INSERT_TIMEOUT_SEC = 120
 
+_PROJECT_PATTERN = r'[a-z][a-z0-9-]{4,28}[a-z0-9]'
+_DATASET_PATTERN = r'\w{1,1024}'
+_TABLE_PATTERN = r'[\p{L}\p{M}\p{N}\p{Pc}\p{Pd}\p{Zs}]{1,1024}'

Review Comment:
   Does this include the partition character `$`? This is needed when user is 
specifying a table partition, see 
https://cloud.google.com/bigquery/docs/managing-partitioned-table-data#partition_decorators.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to