[
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650355#comment-16650355
]
Asha Rostamianfar commented on BEAM-5628:
-----------------------------------------
Would it be ok if we just delete vcfio.py? I don't think anyone is actually
using it, and we haven't provided any documentation about how to use it anyway.
If anyone is using it, they can either pin to an older version of Beam until
we release an updated version, or use our implementation inside [Variant
Transforms|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py].
Context: our original goal was to move [vcfio.py from Variant
Transforms|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py]
to the Beam SDK so that the wider community can use it as well (we'd delete
that code on our end). This is still our goal, but we are planning to make
significant changes to vcfio (including switching the parser from PyVCF to
[Nucleus|https://github.com/google/nucleus] as it's a better-supported parser
recently developed by Google Brain). Given the new issue, it may be easier to
just delete this transform and add it back once our transition to Nucleus has
been completed.
I can send a PR to delete the transform and its PyVCF dependency.
> Several VcfIO tests fail in Python 3 with TypeError: cannot use a string
> pattern on a bytes-like object
> --------------------------------------------------------------------------------------------------------
>
> Key: BEAM-5628
> URL: https://issues.apache.org/jira/browse/BEAM-5628
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Simon
> Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py", line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py", line 101, in read_from_source
>     for value in reader:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 264, in read_records
>     for line in record_iterator:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 330, in __next__
>     record = next(self._vcf_reader)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py", line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object
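For reference, the TypeError in the quoted traceback can be reproduced outside Beam. The following is only a minimal sketch: `row_pattern`, `split_row`, and the sample record line are hypothetical stand-ins, and the tab-only pattern is an assumption, not the parser's actual pattern. It shows a text (str) regex being applied to a bytes line, as in the last frame of the traceback, and one possible workaround of decoding the line first. It is not the fix adopted for this issue.

```python
import re

# Hypothetical stand-in for the parser's row pattern: compiled from a str,
# so under Python 3 it can only be applied to str input.
row_pattern = re.compile('\t')


def split_row(line):
    """Split one record line into fields, accepting either bytes or str."""
    if isinstance(line, bytes):
        # Assumption: the file is UTF-8 encoded text.
        line = line.decode('utf-8')
    return row_pattern.split(line.rstrip())


# A made-up record line, read as bytes (e.g. from a file opened in binary mode).
raw_line = b'20\t14370\trs6054257\tG\tA\n'

# Applying the str pattern directly to bytes raises the TypeError.
try:
    row_pattern.split(raw_line.rstrip())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding first avoids the error.
fields = split_row(raw_line)
```

Under Python 2 the same call succeeds because `str` is a byte string there, which would be consistent with the tests failing only in the py3 environment.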