sean teeling created BEAM-12803:
-----------------------------------
Summary: SqlTransform doesn't work on python 3.9
Key: BEAM-12803
URL: https://issues.apache.org/jira/browse/BEAM-12803
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: sean teeling
Working example below. You don't even need to have the csv file created.
```python
import itertools
import csv
import io

import apache_beam as beam
from apache_beam.dataframe.io import read_csv
from apache_beam.transforms.sql import SqlTransform


def parse_csv(val):
    # Lowercase only the header line so the DictReader keys are lowercase.
    def lower_headers(iterator):
        return itertools.chain([next(iterator).lower()], iterator)

    return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))


class BeamTransformBuilder():
    def build(self, pipeline):
        practices = (
            pipeline
            | beam.io.fileio.MatchFiles("data.csv")
            | beam.io.fileio.ReadMatches()
            | beam.Reshuffle()
            | beam.FlatMap(parse_csv)
            # Every element becomes the same schema'd Row, whatever the CSV holds.
            | beam.Map(lambda x: beam.Row(id="test-id"))
            | SqlTransform("""
                SELECT
                id
                FROM PCOLLECTION""")
        )
        practices | beam.Map(print)


def main():
    builder = BeamTransformBuilder()
    with beam.Pipeline('DirectRunner') as p:
        builder.build(p)


if __name__ == '__main__':
    main()
```
Results in the error:
  File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
    element_type = typing_to_runner_api(_get_args(type_)[0])
IndexError: tuple index out of range
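For anyone reading the traceback: the failing expression indexes the first type argument of a type hint. The sketch below shows one way that pattern can raise this exact IndexError. It is an assumption, not a confirmed diagnosis. It uses the standard library's typing.get_args as a stand-in for Beam's internal _get_args helper, and first_type_arg is a hypothetical name used only for illustration.

```python
import typing


def first_type_arg(type_):
    """Return the first type argument, mirroring the indexing in the traceback."""
    # Stand-in for Beam's internal helper; typing.get_args is an assumption here.
    return typing.get_args(type_)[0]


# A parameterized generic carries its argument, so indexing works.
assert first_type_arg(typing.Sequence[int]) is int

# A bare (unparameterized) generic has no arguments, so indexing fails.
try:
    first_type_arg(typing.Sequence)
except IndexError as exc:
    message = str(exc)
    print(f"IndexError: {message}")
```

If the type inferred for the Row field reaches the schema conversion without type arguments, the same failure would follow, but that has not been verified against the Beam source.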
Tested on Python 3.9.6.
Annoyingly, it is difficult to test this on other Python versions. There is no
documentation on how to set up a Docker container that runs a pipeline locally
with DirectRunner, and barely any documentation on which Python versions are
supported. Using pyenv and pip install apache-beam pulls in many other downloads
that conflict when other versions are already installed.
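For anyone reproducing this on another interpreter, a small sketch like the following can record which Python and which apache-beam build were actually in use. The describe_environment helper is a hypothetical name, not part of Beam. It relies only on the standard library's importlib.metadata and assumes the package was installed under the distribution name apache-beam.

```python
import sys
from importlib import metadata


def describe_environment(package="apache-beam"):
    """Return the interpreter version and the installed version of `package`."""
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        package_version = None
    return {"python": python_version, package: package_version}


if __name__ == "__main__":
    print(describe_environment())
```

Attaching this output to a report makes it easier to tell whether a failure is tied to the interpreter version or to the SDK release.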