[https://issues.apache.org/jira/browse/BEAM-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662478#comment-16662478]
Valentyn Tymofieiev commented on BEAM-5844:
-------------------------------------------
Thanks for filing the issue; glad to hear about the upcoming switch to Nucleus.
Once the migration is finished, let's make sure the VCF IO unit tests pass on
Python 3 before closing this issue. Thank you!
> Transition VCF IO to use Nucleus
> --------------------------------
>
> Key: BEAM-5844
> URL: https://issues.apache.org/jira/browse/BEAM-5844
> Project: Beam
> Issue Type: Task
> Components: sdk-py-core
> Reporter: Asha Rostamianfar
> Assignee: Asha Rostamianfar
> Priority: Minor
>
> Currently, vcfio.py uses [PyVCF|https://github.com/jamescasbon/PyVCF] as its
> parser. Although it is one of the more popular VCF parsers, it is not actively
> maintained, and it has Python 3 compatibility issues (see BEAM-5628).
> There is a new FOSS parser from the Google Brain team, called
> [Nucleus|https://github.com/google/nucleus], that we can use instead. It has
> other useful features, such as built-in protocol buffer support, so we would no
> longer need to transform the internal structures into Variant objects (we can
> deprecate the existing Variant/VariantCall classes in favor of using the
> protos).
> The Google Cloud Healthcare & Life Sciences team is planning to switch to
> using Nucleus as its parser for the [Variant
> Transforms|https://github.com/googlegenomics/gcp-variant-transforms] tool.
> Once that is done, we'll sync the [vcfio.py
> code|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py]
> back to the Beam SDK so that the wider community can use it as well
> (potentially with additional features, like ReadAllFromVCF and VCF sink).