[ https://issues.apache.org/jira/browse/BEAM-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200976#comment-16200976 ]
ASF GitHub Bot commented on BEAM-2774:
--------------------------------------
GitHub user mhsaul opened a pull request:
https://github.com/apache/beam/pull/3979
[BEAM-2774] Add I/O source to read VCF files
Added an I/O transform, `ReadFromVcf`, that reads VCF files into a `PCollection`
of `Variant` objects. Modified `TextSource` so that it can optionally process
file headers, which VCF files require.
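In a VCF file, all metadata lives in `#`-prefixed header lines (`##` meta-information lines followed by a single `#CHROM` column line) that precede the tab-separated data records. A reader therefore has to recognize and keep the header before it can interpret records, which is presumably why header processing was added to `TextSource`. The following is a minimal plain-Python sketch of that split, plus parsing of the eight fixed VCF columns. It is not the PR's code: `split_header`, `parse_record`, and this `Variant` dataclass (with its field names) are hypothetical and may differ from the classes the PR introduces.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple, Union


# Hypothetical record type for illustration; the PR's `Variant` may differ.
@dataclass
class Variant:
    chrom: str
    pos: int                      # 1-based position, as stored in the VCF POS column
    ids: List[str]
    ref: str
    alts: List[str]
    qual: Optional[float]         # None when QUAL is "." (missing)
    filters: List[str]
    info: Dict[str, Union[str, bool]] = field(default_factory=dict)


def split_header(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate leading '#' header lines from data lines (hypothetical helper)."""
    header: List[str] = []
    records: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Header lines are only recognized before the first data record.
        if line.startswith("#") and not records:
            header.append(line)
        else:
            records.append(line)
    return header, records


def parse_record(line: str) -> Variant:
    """Parse the eight fixed, tab-separated VCF columns of one data line."""
    cols = line.split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    info_map: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            # Entries without "=" are flags (e.g. DB) and are stored as True.
            info_map[key] = value if sep else True
    return Variant(
        chrom=chrom,
        pos=int(pos),
        ids=[] if vid == "." else vid.split(";"),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=info_map,
    )


SAMPLE = (
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DB\n"
    "20\t17330\t.\tT\tA\t3\tq10\tNS=3\n"
)

header, records = split_header(SAMPLE.splitlines())
variants = [parse_record(r) for r in records]
print(len(header), len(variants))  # 2 2
print(variants[0].info)            # {'NS': '3', 'DB': True}
```

In a distributed read, a file may be split across several workers, so each worker still needs the header even if its split starts mid-file; a header-aware text source would handle that in a way this single-process sketch does not attempt.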
Design Doc:
https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
CC: @arostamianfar @chamikaramj @aaltay
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mhsaul/beam miles_saul--vsf-io-source
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3979.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3979
----
commit 3f699fcd286c8509cfc404d7c2bec35fd6342347
Author: Miles Saul
Date: 2017-10-11T19:00:03Z
Added vcf file io source and modified _TextSource to optionally handle
headers
----
> Add I/O source for VCF files (python)
> -------------------------------------
>
> Key: BEAM-2774
> URL: https://issues.apache.org/jira/browse/BEAM-2774
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Asha Rostamianfar
> Assignee: Miles Saul
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> A new I/O source for reading (and eventually writing) VCF files [1] for
> Python. The design doc is available at
> https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> [1] http://samtools.github.io/hts-specs/VCFv4.3.pdf
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)