[ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647969#comment-16647969 ]

Simon commented on BEAM-5628:
-----------------------------

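This error can be traced back to the _create_generator function (io/vcfio.py, 
line 318). A comment there notes that PyVCF makes explicit str() calls when 
parsing INFO fields, and these calls fail on UTF-8-decoded (unicode) strings. 
For this reason, the Python 2 version of the code encodes each line back to 
UTF-8. Under Python 3, that encoding produces bytes, which PyVCF's str regex 
pattern cannot split; this matches the TypeError in the traceback below.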
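A minimal, self-contained sketch of the failure mode, assuming PyVCF splits each data line with a tab regex compiled from a str pattern (the exact pattern used by PyVCF is an assumption). Applying a str pattern to UTF-8-encoded bytes raises the same TypeError as in the traceback, while a line of the native str type splits fine. `to_native_str` is a hypothetical helper, not part of vcfio.py: it encodes only on Python 2 and decodes on Python 3.

```python
import re
import sys

# Assumption: a str (text) pattern standing in for PyVCF's row-splitting
# regex (vcf/parser.py: self._row_pattern.split(line.rstrip())).
ROW_PATTERN = re.compile('\t')


def split_row(line):
    """Split a VCF data line on tabs using a str pattern."""
    return ROW_PATTERN.split(line.rstrip())


def to_native_str(line):
    """Hypothetical helper: return the line as the interpreter's native str.

    On Python 2, unicode is encoded to UTF-8 bytes (native str there),
    mirroring the encoding step in _create_generator. On Python 3, str is
    already text, so bytes are decoded and str is returned unchanged.
    """
    if sys.version_info[0] == 2:
        return line.encode('utf-8') if isinstance(line, unicode) else line  # noqa: F821
    return line.decode('utf-8') if isinstance(line, bytes) else line


record = u'20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\n'
try:
    split_row(record.encode('utf-8'))  # bytes on Python 3
except TypeError as e:
    print(e)  # cannot use a string pattern on a bytes-like object
print(split_row(to_native_str(record.encode('utf-8'))))
```

On Python 3 the last line prints the eight tab-separated fields as str values. Whether a version guard like this also avoids the hanging tests mentioned below is not established by this sketch.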

Removing the encoding step causes some tests to hang, so this may be related 
to issue 5623.

Does anyone have additional insights?

> Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5628
>                 URL: https://issues.apache.org/jira/browse/BEAM-5628
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Simon
>            Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py", line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py", line 101, in read_from_source
>     for value in reader:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 264, in read_records
>     for line in record_iterator:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 330, in __next__
>     record = next(self._vcf_reader)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py", line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
