Kamil Wasilewski created BEAM-9154:
--------------------------------------
Summary: Move Chicago Taxi Example to Python 3
Key: BEAM-9154
URL: https://issues.apache.org/jira/browse/BEAM-9154
Project: Beam
Issue Type: Improvement
Components: testing
Reporter: Kamil Wasilewski
Assignee: Kamil Wasilewski
The Chicago Taxi Example[1] should be moved to the latest version of Python
supported by Beam (currently Python 3.7). The benchmark should run on both
Dataflow and Flink.
At the moment, the following error occurs when running the benchmark (requires
further investigation):
{code:java}
Traceback (most recent call last):
  File "preprocess.py", line 259, in <module>
    main()
  File "preprocess.py", line 254, in main
    project=known_args.metric_reporting_project
  File "preprocess.py", line 155, in transform_data
    ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
    return self.apply(transform, pvalueish)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
    return m(transform, input, options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
    return transform.expand(input)
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
    input_metadata))
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
    output_signature = self._preprocessing_fn(copied_inputs)
  File "preprocess.py", line 102, in preprocessing_fn
    _fill_in_missing(inputs[key]),
KeyError: 'company'
{code}
[1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
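As a starting point for the investigation, a small diagnostic like the one below could be called at the top of preprocessing_fn to show which feature keys are actually present before inputs[key] is evaluated. This is only a sketch: the helper names report_missing_keys and describe_keys are hypothetical and not part of Beam or tensorflow_transform, and the sample dict is made up for illustration. Printing repr() of each key would reveal a str/bytes mismatch if one exists, but whether that is the cause here is not known.

```python
def report_missing_keys(inputs, expected_keys):
    """Return the expected keys that are absent from `inputs`."""
    present = set(inputs)
    return [k for k in expected_keys if k not in present]


def describe_keys(inputs):
    """Return repr() of every key so that str and bytes keys are distinguishable."""
    return sorted(repr(k) for k in inputs)


# Hypothetical stand-in for the dict passed to preprocessing_fn; the real
# contents are unknown. Only the key name 'company' comes from the traceback.
example_inputs = {b'company': 1, 'fare': 2}

print(report_missing_keys(example_inputs, ['company', 'fare']))
print(describe_keys(example_inputs))
```

In the made-up sample, 'company' is reported as missing because only a bytes key b'company' is present. If the real inputs dict shows a different pattern (for example, the key being absent altogether), the problem more likely lies in the schema or metadata handed to AnalyzeDataset.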
--
This message was sent by Atlassian Jira
(v8.3.4#803005)