A joint project between the SBGrid Consortium at Harvard Medical School and
the Dataverse Team at the Institute for Quantitative Social Science at
Harvard University has an immediate opening for a developer to help us
build a next generation data publication system for large biomedical
datasets.

We aim to make biomedical datasets publicly available through a federated
data grid to facilitate access, citation, and data analysis by scientists.
Our pilot collection includes datasets generated using X-ray
crystallography, computer modeling, lattice light sheet microscopy, and
microED diffraction. This collection is currently replicated to computing
centers in the US, Europe, Asia, and South America. The project is
supported by the Helmsley Charitable Trust and was recently selected as a
pilot of the U.S. National Data Service. To learn more about the
environment, please visit our current implementation at data.sbgrid.org and
our group websites at sbgrid.org, slizlab.org, and dataverse.org.

The data science engineer will be embedded within the Dataverse development
team and will primarily be focused on implementing the features necessary
for the successful completion of this project. Examples of features that
must be added to Dataverse include APIs for interoperating with components
that handle large (~100 GB) datasets, automatic data validation pipelines,
custom publishing workflows, and other features relevant to specific
biomedical data types. All new functionality developed
under this project will be merged into the Dataverse open source project
and shared with the community.

As a member of our team, this person can expect to collaborate with
researchers and collection specialists and to present project outcomes at
meetings and conferences.

An advanced degree (computer science, bioinformatics, or engineering
preferred) and 3-5 years of strong programming experience, preferably in
Java and Python and ideally in the context of web applications, are
strongly preferred.

Our team welcomes candidates with diverse technical backgrounds, but the
successful candidate will have experience handling large datasets and
working as part of an agile software development team. A working knowledge
of Linux, shell scripting, databases, and distributed version control
systems (Git, Mercurial, etc.) is also necessary. The ideal candidate will
also be familiar with data management software and the analysis of large
datasets.

This is a term appointment ending on September 30, 2018. To apply, email
[email protected].