[ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647969#comment-16647969 ]

Simon commented on BEAM-5628:
-----------------------------

This error can be traced back to the _create_generator function (io/vcfio.py, line 318), where it is noted that PyVCF makes explicit str() calls when parsing INFO fields, which fail on UTF-8 decoded strings. For this reason, the line is encoded back to UTF-8 in the Python 2 version. Removing the encoding step causes some tests to hang, so this may be related to BEAM-5623. Does anyone have additional insights?

> Several VcfIO tests fail in Python 3 with TypeError: cannot use a string
> pattern on a bytes-like object
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5628
>                 URL: https://issues.apache.org/jira/browse/BEAM-5628
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Simon
>            Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py", line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py", line 101, in read_from_source
>     for value in reader:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 264, in read_records
>     for line in record_iterator:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 330, in __next__
>     record = next(self._vcf_reader)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py", line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object