[
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396562
]
ASF GitHub Bot logged work on BEAM-8841:
----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Mar/20 02:12
Start Date: 03/Mar/20 02:12
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #10979: [BEAM-8841] Support
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593729608
I am able to run the integration test
`apache_beam.io.gcp.bigquery_file_loads_test:BigQueryFileLoadsIT`, but when
I use the same procedure to run tests from
`apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests`, I
get the following error:
```
[chuck.yang ~/src/beam/sdks/python cyang/avro-bigqueryio+]
% ./scripts/run_integration_test.sh --test_opts
"--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
--nocapture" --project ... --gcs_location gs://... --kms_key_name ""
--streaming false
>>> RUNNING integration tests with pipeline options:
--runner=TestDataflowRunner --project=... --staging_location=gs://...
--temp_location=gs://... --output=gs://...
--sdk_location=build/apache-beam.tar.gz
--requirements_file=postcommit_requirements.txt --num_workers=1 --sleep_secs=20
>>> test options:
--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
--nocapture
/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.21.0.dev' to '2.21.0.dev0'
  normalized_version,
running nosetests
running egg_info
INFO:gen_protos:Skipping proto regeneration: all files up to date
writing apache_beam.egg-info/PKG-INFO
writing dependency_links to apache_beam.egg-info/dependency_links.txt
writing entry points to apache_beam.egg-info/entry_points.txt
writing requirements to apache_beam.egg-info/requires.txt
writing top-level names to apache_beam.egg-info/top_level.txt
reading manifest file 'apache_beam.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.md'
warning: no files found matching 'NOTICE'
warning: no files found matching 'LICENSE'
writing manifest file 'apache_beam.egg-info/SOURCES.txt'
Failure: ImportError (No module named 'apache_beam') ... ERROR
======================================================================
ERROR: Failure: ImportError (No module named 'apache_beam')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py", line 79, in importFromDir
    fh, filename, desc = find_module(part, path)
  File "/usr/lib/python3.7/imp.py", line 296, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'apache_beam'
----------------------------------------------------------------------
XML: nosetests-.xml
----------------------------------------------------------------------
XML: /home/chuck.yang/src/beam/sdks/python/nosetests.xml
----------------------------------------------------------------------
Ran 1 test in 0.002s
FAILED (errors=1)
```
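A guess at the cause, based on the traceback rather than anything confirmed
in the thread: nose treats a test name whose module part ends in `.py` as a
file path instead of a dotted module path, so `loadTestsFromName` goes
through path-based import (`importFromPath`) and fails to locate
`apache_beam` on the filesystem path it derives. The passing
`BigQueryFileLoadsIT` invocation above has no `.py` suffix, so dropping it
here may be all that is needed:
```
--tests=apache_beam.io.gcp.bigquery_write_it_test:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
```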
Issue Time Tracking
-------------------
Worklog Id: (was: 396562)
Time Spent: 5h 10m (was: 5h)
> Add ability to perform BigQuery file loads using avro
> -----------------------------------------------------
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Currently, the JSON format is used for file loads into BigQuery in the
> Python SDK. JSON has some disadvantages, including the size of the
> serialized data and the inability to represent NaN and infinity float
> values.
> BigQuery supports loading files in Avro format, which can overcome these
> disadvantages. The Java SDK already supports loading files in Avro format
> (BEAM-2879), so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].
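Stepping outside the quoted thread for a moment: below is a minimal sketch
of what requesting Avro-based file loads from the Python SDK might look like
once this change lands. The `temp_file_format` parameter and its `'AVRO'`
value are assumptions based on the direction of PR #10979, not a finalized
API; the table spec and schema are hypothetical.
```python
import apache_beam as beam

# A row containing NaN: representable in Avro temp files, but not in the
# JSON temp files that file loads currently use.
rows = [
    {"name": "a", "value": 1.5},
    {"name": "b", "value": float("nan")},
]

# A real run would also need GCP pipeline options such as --project and
# --temp_location; they are omitted from this sketch.
with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create(rows)
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",  # hypothetical table spec
            schema="name:STRING,value:FLOAT",
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            temp_file_format="AVRO",  # assumed option added by this PR
        )
    )
```
With `FILE_LOADS`, rows are serialized to temporary files and loaded into
BigQuery via a load job; switching that temporary format from JSON to Avro
is what allows values like NaN and infinity to survive the round trip and
shrinks the staged files.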