[ https://issues.apache.org/jira/browse/BEAM-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anonymous updated BEAM-14235:
-----------------------------
    Status: Triage Needed  (was: Resolved)

> parquetio module does not parse PEP-440 compliant Pyarrow version
> -----------------------------------------------------------------
>
>                 Key: BEAM-14235
>                 URL: https://issues.apache.org/jira/browse/BEAM-14235
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-parquet
>    Affects Versions: 2.27.0
>            Reporter: Arwin S Tio
>            Priority: P3
>             Fix For: 2.39.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In version 2.27, introduced by this PR:
> [https://github.com/apache/beam/pull/13302/files#diff-33b0b6b112036df96f341aa83b88efba9215ec14dfabc9db9e9ffe66a23154a2R55]
> the parquetio module parses the pyarrow version like this:
> {code:java}
> ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
> {code}
> (see [https://github.com/apache/beam/blob/v2.27.0/sdks/python/apache_beam/io/parquetio.py#L55])
>
> This does not support all PEP-440 compliant versions:
> [https://peps.python.org/pep-0440/]
>
> For example, if pyarrow were to have a version like *1.0.0+abc.7*, then
> importing this module would fail:
> {code:java}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
>     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>   File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
>     __import__(pkg_name)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
>     from apache_beam import io
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
>     from apache_beam.io.parquetio import *
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
>     ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
> ValueError: invalid literal for int() with base 10: '0+abc.7'
> {code}
>
> In practice, this fails when somebody forks pyarrow and publishes it with
> a local version label, like yours truly.
>
> We can fix this by using *pkg_resources.parse_version*, which is PEP-440
> compliant starting with setuptools 6.0.
>
> If the maintainers agree with this change, I would be willing to submit a PR.
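
To illustrate the failure mode and one tolerant alternative, here is a minimal sketch. It is not the fix proposed in the issue (which is *pkg_resources.parse_version*) and not necessarily what was merged into Beam. It uses only the standard library so that it runs without setuptools, and it extracts just the major release number instead of implementing the full PEP 440 grammar. The helper name `parse_major_version` and the regular expression are assumptions made for this example.

```python
import re

# Simplified subset of PEP 440 (an assumption of this sketch): an optional
# "v" prefix, an optional epoch such as "1!", then the first release number.
_MAJOR_RE = re.compile(r"^\s*v?(?:\d+!)?(\d+)", re.IGNORECASE)


def parse_major_version(version):
    """Return the leading (major) release number of a version string.

    Hypothetical helper for illustration; it ignores everything after the
    first release component, including local labels such as "+abc.7".
    """
    match = _MAJOR_RE.match(version)
    if match is None:
        raise ValueError("cannot determine major version from %r" % (version,))
    return int(match.group(1))


if __name__ == "__main__":
    # The parsing approach quoted in the issue raises ValueError as soon as
    # the version carries a local version label.
    try:
        major, _, _ = map(int, "1.0.0+abc.7".split("."))
    except ValueError as exc:
        print("naive parse failed:", exc)

    # The tolerant helper only looks at the leading release number.
    for candidate in ("1.0.0+abc.7", "0.17.1", "3.0.0.dev123"):
        print(candidate, "->", parse_major_version(candidate))
```

If setuptools or the `packaging` distribution is guaranteed to be installed, a full PEP 440 parser from one of those is the more robust choice. The regular expression above is only adequate when the major version is all that is needed.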