[ 
https://issues.apache.org/jira/browse/BEAM-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sean teeling updated BEAM-12803:
--------------------------------
    Description: 
 
Working example below (is there no way to paste pre-formatted code into Jira?!):

import itertools
import csv
import io

import apache_beam as beam
from apache_beam.dataframe.io import read_csv
from apache_beam.transforms.sql import SqlTransform


def parse_csv(val):
    def lower_headers(iterator):
        # Lowercase only the header line so DictReader keys are lowercase.
        return itertools.chain([next(iterator).lower()], iterator)
    return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))


class BeamTransformBuilder():

    def build(self, pipeline):
        practices = (
            pipeline
            | beam.io.fileio.MatchFiles("data.csv")
            | beam.io.fileio.ReadMatches()
            | beam.Reshuffle()
            | beam.FlatMap(parse_csv)
            # Each parsed row is replaced by a fixed schema'd Row before the SQL step.
            | beam.Map(lambda x: beam.Row(id="test-id"))
            | SqlTransform("""
                SELECT
                    id
                FROM PCOLLECTION""")
        )
        practices | beam.Map(print)


def main():
    builder = BeamTransformBuilder()
    with beam.Pipeline('DirectRunner') as p:
        builder.build(p)


if __name__ == '__main__':
    main()
 
Results in the error:

  File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
    element_type = typing_to_runner_api(_get_args(type_)[0])
IndexError: tuple index out of range

Tested on Python 3.9.6.
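
For illustration only, here is a minimal sketch of how an IndexError of this
shape can arise when code takes the first type argument of a type hint. It
uses only the standard typing module; first_type_arg is a hypothetical helper,
not Beam's _get_args, and this is not a confirmed diagnosis of the failure
above.

```python
import typing


def first_type_arg(type_hint):
    """Return the first type argument of a generic hint (hypothetical helper)."""
    # typing.get_args returns an empty tuple for a hint with no type
    # arguments, so indexing [0] raises IndexError for a bare generic.
    return typing.get_args(type_hint)[0]


# A parameterized hint has an element type to return.
parameterized = first_type_arg(typing.Sequence[int])

try:
    # An unparameterized hint has no arguments to index.
    first_type_arg(typing.Sequence)
    bare_error = None
except IndexError as exc:
    bare_error = str(exc)

print(parameterized, bare_error)
```

If something similar happens inside schemas.py, the interesting question would
be which field type reaches that line without type arguments.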

 

Annoyingly, it is difficult to test this on other Python versions. There's no
documentation on how to set up a Docker container that runs a pipeline locally
with DirectRunner, and barely any documentation on which Python versions are
supported. Using pyenv and pip install apache-beam pulls in a lot of other
downloads that conflict when other versions are already installed.
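
The header-lowercasing step used by parse_csv in the example can at least be
exercised without Beam installed. This is a minimal sketch with the standard
library only; the in-memory sample data is made up and stands in for the
opened data.csv.

```python
import csv
import io
import itertools


def lower_headers(lines):
    """Return the lines with only the first (header) line lowercased."""
    return itertools.chain([next(lines).lower()], lines)


# In-memory stand-in for the opened file; the contents are hypothetical.
sample = io.StringIO("ID,Name\n1,Alice\n2,Bob\n")
rows = list(csv.DictReader(lower_headers(sample)))

for row in rows:
    print(row)
```

This only checks the CSV parsing; it does not touch SqlTransform, which is
where the error is raised.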

  was:
Working example below. You don't even need to create the CSV file.

 

```python
import itertools
import csv
import io

import apache_beam as beam
from apache_beam.dataframe.io import read_csv
from apache_beam.transforms.sql import SqlTransform


def parse_csv(val):
    def lower_headers(iterator):
        # Lowercase only the header line so DictReader keys are lowercase.
        return itertools.chain([next(iterator).lower()], iterator)
    return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))


class BeamTransformBuilder():

    def build(self, pipeline):
        practices = (
            pipeline
            | beam.io.fileio.MatchFiles("data.csv")
            | beam.io.fileio.ReadMatches()
            | beam.Reshuffle()
            | beam.FlatMap(parse_csv)
            # Each parsed row is replaced by a fixed schema'd Row before the SQL step.
            | beam.Map(lambda x: beam.Row(id="test-id"))
            | SqlTransform("""
                SELECT
                    id
                FROM PCOLLECTION""")
        )
        practices | beam.Map(print)


def main():
    builder = BeamTransformBuilder()
    with beam.Pipeline('DirectRunner') as p:
        builder.build(p)


if __name__ == '__main__':
    main()
```

 

Results in the error:

  File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
    element_type = typing_to_runner_api(_get_args(type_)[0])
IndexError: tuple index out of range

Tested on Python 3.9.6.

 

Annoyingly, it is difficult to test this on other Python versions. There's no
documentation on how to set up a Docker container that runs a pipeline locally
with DirectRunner, and barely any documentation on which Python versions are
supported. Using pyenv and pip install apache-beam pulls in a lot of other
downloads that conflict when other versions are already installed.


> SqlTransform doesn't work on python 3.9
> ---------------------------------------
>
>                 Key: BEAM-12803
>                 URL: https://issues.apache.org/jira/browse/BEAM-12803
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: sean teeling
>            Priority: P1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
