Asha Rostamianfar created BEAM-5844:
---------------------------------------

             Summary: Transition VCF IO to use Nucleus
                 Key: BEAM-5844
                 URL: https://issues.apache.org/jira/browse/BEAM-5844
             Project: Beam
          Issue Type: Task
          Components: sdk-py-core
            Reporter: Asha Rostamianfar
            Assignee: Asha Rostamianfar


Currently, vcfio.py uses [PyVCF|https://github.com/jamescasbon/PyVCF] as its 
parser. Although it is one of the more popular VCF parsers, it is not actively 
maintained. There are also Python 3 compatibility issues (see BEAM-5628). There 
is a new FOSS parser from the Google Brain team, called 
[Nucleus|https://github.com/google/nucleus], that we can use instead. It also 
has built-in protocol buffer support, so we would no longer need to transform 
the parser's internal structures into Variant objects (we can deprecate the 
existing Variant/VariantCall classes in favor of using the protos).
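
To illustrate the kind of conversion step that parser-native protos would make unnecessary, here is a minimal sketch using only the standard library. It is not the vcfio.py implementation and does not use PyVCF or Nucleus; SimpleVariant and parse_vcf_line are hypothetical names made up for this example, and the field layout is only an assumption loosely modeled on a Variant-style record.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a parser-independent variant record.
# This is NOT Beam's Variant class; the field names are assumptions.
SimpleVariant = namedtuple(
    'SimpleVariant',
    ['reference_name', 'start', 'end', 'reference_bases',
     'alternate_bases', 'names', 'quality', 'filters', 'info'])


def parse_vcf_line(line):
    """Converts one tab-separated VCF data line into a SimpleVariant."""
    fields = line.rstrip('\n').split('\t')
    chrom, pos, ids, ref, alt, qual, filt, info = fields[:8]
    start = int(pos) - 1  # VCF POS is 1-based; store a 0-based start.
    info_map = {}
    if info != '.':
        for item in info.split(';'):
            key, _, value = item.partition('=')
            # Entries without a value (flags) are stored as True.
            info_map[key] = value.split(',') if value else True
    return SimpleVariant(
        reference_name=chrom,
        start=start,
        # Simplification: end is derived from REF only (ignores INFO END).
        end=start + len(ref),
        reference_bases=ref,
        alternate_bases=[] if alt == '.' else alt.split(','),
        names=[] if ids == '.' else ids.split(';'),
        quality=None if qual == '.' else float(qual),
        filters=[] if filt == '.' else filt.split(';'),
        info=info_map)


variant = parse_vcf_line('20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DB')
```

Every field here is copied by hand from the parsed line into a separate record type; with a parser that emits protos directly, that intermediate mapping layer could presumably go away.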

The Google Cloud Healthcare & Life Sciences team is planning to switch to 
Nucleus as the parser for the 
[Variant Transforms|https://github.com/googlegenomics/gcp-variant-transforms] 
tool. Once that is done, we'll sync the 
[vcfio.py code|https://github.com/googlegenomics/gcp-variant-transforms/blob/master/gcp_variant_transforms/beam_io/vcfio.py] 
back to the Beam SDK so that the wider community can use it as well 
(potentially with additional features, like ReadAllFromVCF and a VCF sink).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
