[
https://issues.apache.org/jira/browse/BEAM-13355?focusedWorklogId=693564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693564
]
ASF GitHub Bot logged work on BEAM-13355:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Dec/21 21:30
Start Date: 09/Dec/21 21:30
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request
#16186:
URL: https://github.com/apache/beam/pull/16186#discussion_r766168570
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
##########
@@ -459,6 +459,38 @@ def test_records_traverse_transform_with_mocks(self):
assert_that(jobs, equal_to([job_reference]), label='CheckJobs')
+ def test_load_job_id_used(self):
Review comment:
Can you also manually run a pipeline to confirm that this works end to
end? (No need to add an integration test.)
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
##########
@@ -527,8 +532,11 @@ def process(self, element, job_name_prefix=None,
unused_schema_mod_jobs=None):
if not self.bq_io_metadata:
self.bq_io_metadata = create_bigquery_io_metadata(self._step_name)
+
+ project_id = copy_to_reference.projectId \
Review comment:
Nit: For formatting, I think we usually prefer adding parentheses over \
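For reference, a minimal sketch of the two continuation styles. The names below are hypothetical stand-ins; the actual expression in the PR is truncated in the hunk above and is not reproduced here.

```python
from types import SimpleNamespace

# Hypothetical stand-ins; these are not the objects from the PR.
reference = SimpleNamespace(projectId='project1')
override_project = None

# Backslash continuation: valid, but a stray space after the
# backslash turns the line into a syntax error.
project_with_backslash = reference.projectId \
    if override_project is None else override_project

# Parenthesized continuation, the style the comment asks for.
project_with_parens = (
    reference.projectId
    if override_project is None else override_project)
```

Both assignments evaluate the same expression; only the line-wrapping differs.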
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
##########
@@ -459,6 +459,38 @@ def test_records_traverse_transform_with_mocks(self):
assert_that(jobs, equal_to([job_reference]), label='CheckJobs')
+ def test_load_job_id_used(self):
+ job_reference = bigquery_api.JobReference()
+ job_reference.projectId = 'loadJobId'
+ job_reference.jobId = 'job_name1'
+
+ result_job = bigquery_api.Job()
+ result_job.jobReference = job_reference
+
+ mock_job = mock.Mock()
+ mock_job.status.state = 'DONE'
+ mock_job.status.errorResult = None
+ mock_job.jobReference = job_reference
+
+ bq_client = mock.Mock()
+ bq_client.jobs.Get.return_value = mock_job
+
+ bq_client.jobs.Insert.return_value = result_job
+
+ transform = bqfl.BigQueryBatchFileLoads(
+ 'project1:dataset1.table1',
+ custom_gcs_temp_location=self._new_tempdir(),
+ test_client=bq_client,
+ validate=False,
+ load_job_project_id='loadJobId')
+
+ with TestPipeline('DirectRunner') as p:
+ outputs = p | beam.Create(_ELEMENTS) | transform
+ jobs = outputs[bqfl.BigQueryBatchFileLoads.DESTINATION_JOBID_PAIRS] \
+ | "GetJobs" >> beam.Map(lambda x: x[1])
+
+ assert_that(jobs, equal_to([job_reference]), label='CheckJobProjectIds')
Review comment:
I think you need a second test to confirm that the copy job also picks
up the specified project.
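One way such a check could be written is to inspect the requests passed to the mocked client rather than the job references it returns. The sketch below uses only the standard library; `start_copy_job` and its request object are hypothetical stand-ins for the code under test, not Beam's actual API.

```python
from types import SimpleNamespace
from unittest import mock


def start_copy_job(client, job_project, job_id):
    # Hypothetical stand-in for the code under test: it submits one
    # request to the client, tagged with the project that should own
    # the job.
    request = SimpleNamespace(projectId=job_project, jobId=job_id)
    return client.jobs.Insert(request)


bq_client = mock.Mock()
start_copy_job(bq_client, 'loadJobProject', 'copy_job_1')

# Collect the project of every request passed to the mocked Insert call.
insert_projects = [
    call.args[0].projectId
    for call in bq_client.jobs.Insert.call_args_list
]
assert insert_projects == ['loadJobProject']
```

Asserting on the call arguments checks what the code sent to the client, which a mocked return value alone cannot confirm.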
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
##########
@@ -459,6 +459,38 @@ def test_records_traverse_transform_with_mocks(self):
assert_that(jobs, equal_to([job_reference]), label='CheckJobs')
+ def test_load_job_id_used(self):
+ job_reference = bigquery_api.JobReference()
+ job_reference.projectId = 'loadJobId'
+ job_reference.jobId = 'job_name1'
+
+ result_job = bigquery_api.Job()
+ result_job.jobReference = job_reference
+
+ mock_job = mock.Mock()
+ mock_job.status.state = 'DONE'
+ mock_job.status.errorResult = None
+ mock_job.jobReference = job_reference
+
+ bq_client = mock.Mock()
+ bq_client.jobs.Get.return_value = mock_job
+
+ bq_client.jobs.Insert.return_value = result_job
+
+ transform = bqfl.BigQueryBatchFileLoads(
+ 'project1:dataset1.table1',
+ custom_gcs_temp_location=self._new_tempdir(),
+ test_client=bq_client,
+ validate=False,
+ load_job_project_id='loadJobId')
Review comment:
s/loadJobId/loadJobProject
Issue Time Tracking
-------------------
Worklog Id: (was: 693564)
Time Spent: 1h (was: 50m)
> Add a load_job_project_id option to Python BQ sink
> --------------------------------------------------
>
> Key: BEAM-13355
> URL: https://issues.apache.org/jira/browse/BEAM-13355
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: John Casey
> Priority: P2
> Time Spent: 1h
> Remaining Estimate: 0h
>
> This will be similar to the following option for Java and will allow users to
> customize the project from which load jobs are started. Currently, load jobs
> always use the destination table ID.
> [https://github.com/apache/beam/blob/8efb426e74707f53dbb676906da19039e9bbde55/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2302]
>