Kamil Wasilewski created BEAM-9154:
--------------------------------------

             Summary: Move Chicago Taxi Example to Python 3
                 Key: BEAM-9154
                 URL: https://issues.apache.org/jira/browse/BEAM-9154
             Project: Beam
          Issue Type: Improvement
          Components: testing
            Reporter: Kamil Wasilewski
            Assignee: Kamil Wasilewski


The Chicago Taxi Example[1] should be moved to the latest version of Python 
supported by Beam (currently Python 3.7). The benchmark should run on both 
Dataflow and Flink.

At the moment, the following error occurs when running the benchmark (requires 
further investigation):
{code:java}
Traceback (most recent call last):
  File "preprocess.py", line 259, in <module>
    main()
  File "preprocess.py", line 254, in main
    project=known_args.metric_reporting_project
  File "preprocess.py", line 155, in transform_data
    ('Analyze' >> tft_beam.AnalyzeDataset(preprocessing_fn)))
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 987, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/transforms/ptransform.py", line 547, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 532, in apply
    return self.apply(transform, pvalueish)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/pipeline.py", line 573, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
    return m(transform, input, options)
  File "/Users/kamilwasilewski/proj/beam/sdks/python/apache_beam/runners/runner.py", line 223, in apply_PTransform
    return transform.expand(input)
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 825, in expand
    input_metadata))
  File "/Users/kamilwasilewski/proj/beam/build/gradleenv/2022703441/lib/python3.7/site-packages/tensorflow_transform/beam/impl.py", line 716, in expand
    output_signature = self._preprocessing_fn(copied_inputs)
    output_signature = self._preprocessing_fn(copied_inputs)
  File "preprocess.py", line 102, in preprocessing_fn
    _fill_in_missing(inputs[key]),
KeyError: 'company'
{code}
[1] sdks/python/apache_beam/testing/benchmarks/chicago_taxi
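
As a possible starting point for the investigation, the traceback ends in a 
plain dictionary lookup (inputs[key]) inside preprocessing_fn, so one option is 
to list which expected feature keys are absent before indexing. The sketch 
below is only an illustration: find_missing_keys and the sample data are 
hypothetical and not part of preprocess.py, and the root cause of the missing 
'company' key has not been established.

```python
def find_missing_keys(inputs, expected_keys):
    """Return the expected feature keys that are absent from `inputs`.

    `inputs` is any dict-like mapping of feature name to value; the order of
    `expected_keys` is preserved in the result.
    """
    return [key for key in expected_keys if key not in inputs]


# Hypothetical stand-ins for the feature dict passed to preprocessing_fn
# and for the list of keys it iterates over.
example_inputs = {"trip_miles": [1.2], "fare": [7.5]}
example_expected = ["trip_miles", "fare", "company"]

missing = find_missing_keys(example_inputs, example_expected)
print(missing)
```

Logging the result of such a check at the top of preprocessing_fn would show 
whether 'company' is the only absent feature or one of several.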



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
