[
https://issues.apache.org/jira/browse/BEAM-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416322#comment-17416322
]
Jonathan Hourany edited comment on BEAM-12803 at 9/16/21, 8:39 PM:
-------------------------------------------------------------------
A quick update: my tired eyes missed that the `IndexError` was being raised
when accessing the return value of `_get_args`. I found out why `_get_args` was
returning an empty tuple and fixed the problem there, then swapped out any
other mentions of `_field_types` for `__annotations__`. Almost all tests are
passing. The failing tests I've looked at so far don't seem to be related to
this change, or at least the connection isn't obvious, but I'll keep digging
into them.
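
For anyone following along, here is a minimal standard-library sketch of the
two pitfalls mentioned in this comment. It is not the Beam patch: `Point`,
`field_types`, and `first_type_arg` are hypothetical names made up for
illustration, and `typing.get_args` stands in for Beam's internal `_get_args`,
which may behave differently.

```python
import typing


class Point(typing.NamedTuple):
    """Hypothetical NamedTuple used only for this illustration."""
    x: int
    y: str


def field_types(named_tuple_cls):
    """Return a {field name: type} mapping for a typing.NamedTuple class.

    `_field_types` was deprecated in Python 3.8 and removed in 3.9;
    `__annotations__` carries the same mapping on both versions.
    """
    return dict(named_tuple_cls.__annotations__)


def first_type_arg(type_):
    """Return the first type argument of a parameterized generic.

    A bare generic such as `typing.Sequence` has no arguments, so
    `typing.get_args` returns an empty tuple. Indexing that tuple with [0]
    is what produces "IndexError: tuple index out of range"; this helper
    raises a descriptive TypeError instead.
    """
    args = typing.get_args(type_)
    if not args:
        raise TypeError(f"{type_!r} has no type arguments")
    return args[0]
```

Calling `first_type_arg(typing.Sequence[int])` yields `int`, while
`first_type_arg(typing.Sequence)` fails with the explicit `TypeError` rather
than an `IndexError` deep inside a conversion routine.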
> SqlTransform doesn't work on python 3.9
> ---------------------------------------
>
> Key: BEAM-12803
> URL: https://issues.apache.org/jira/browse/BEAM-12803
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: sean teeling
> Assignee: Brian Hulette
> Priority: P2
>
> Working example below -(Is there no way to paste pre-formatted code into
> jira?!)- (EDIT: I added the appropriate "code" block)
> {code:python}
> import itertools
> import csv
> import io
> import apache_beam as beam
> from apache_beam.dataframe.io import read_csv
> from apache_beam.transforms.sql import SqlTransform
>
> def parse_csv(val):
>     def lower_headers(iterator):
>         return itertools.chain([next(iterator).lower()], iterator)
>     return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))
>
> class BeamTransformBuilder():
>     def build(self, pipeline):
>         practices = (
>             pipeline
>             | beam.io.fileio.MatchFiles("data.csv")
>             | beam.io.fileio.ReadMatches()
>             | beam.Reshuffle()
>             | beam.FlatMap(parse_csv)
>             | beam.Map(lambda x: beam.Row(id="test-id"))
>             | SqlTransform("""
>                 SELECT
>                     id
>                 FROM PCOLLECTION""")
>         )
>         practices | beam.Map(print)
>
> def main():
>     builder = BeamTransformBuilder()
>     with beam.Pipeline('DirectRunner') as p:
>         builder.build(p)
>
> if __name__ == '__main__':
>     main()
> {code}
>
> Results in the error:
>
> {code:java}
> File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
>     element_type = typing_to_runner_api(_get_args(type_)[0])
> IndexError: tuple index out of range
> {code}
>
>
> Tested on Python 3.9.6.
>
> Annoyingly, it is difficult to test this on other Python versions.
> There's no documentation on how to set up a Docker container that uses
> DirectRunner and run it locally. There's barely any documentation on which
> Python versions are supported. And installing apache-beam with pyenv and pip
> requires a lot of other downloads that conflict when other versions are
> already installed.