This is an automated email from the git hooks/post-receive script.

daube-guest pushed a commit to branch master
in repository debian-med-benchmarking-spec.git.
commit 2460dd3f52d8850a90fbd2adca1312badddc65c7
Author: Kevin Murray <[email protected]>
Date:   Fri Feb 5 15:22:29 2016 +0100

    Updated spec with architecture
---
 benchmarking.md | 184 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 125 insertions(+), 59 deletions(-)

diff --git a/benchmarking.md b/benchmarking.md
index b49396e..94a3677 100644
--- a/benchmarking.md
+++ b/benchmarking.md
@@ -1,24 +1,118 @@
 Benchmarking CI Service
 =======================
- Brainstorm of Debian Med/SEQwiki/biotools benchmarking service
-This thing needs a name, ASAP!
-
-Possible datsets:
- - https://sites.stanford.edu/abms/giab: Genome In A Bottle is a NIST human
-   NGS resequencing dataset (Paper:
-http://biorxiv.org/content/early/2015/09/15/026468)
- - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
- https://public.etherpad-mozilla.org/p/debian-med-benchmarking
-
-Similar to ReproducibleBuilds project
-Machine that automatically defines a debian benchmark packages based on
-metadata repository whenever:
- - There is a new version of the underlying package in debian med unstable
+**This thing needs a name, ASAP!**
+
+
+Similar to ReproducibleBuilds project, but for scientific accuracy
+
+Architecture:
+-------------
+
+ - Pre-package and publish to separate local archive (benchmarking-specific
+   code ONLY):
+   - Metric script .debs
+   - Real/published dataset .debs
+     - From public databases e.g. SRA or RefSeq
+     - *Not* for simulated datasets; their parameters are kept within CWL
+       files
+ - Benchmarking workflows are CWL workflows:
+   - Tests run in docker containers:
+     - Poll YAML or DB for build-deps and datasets
+     - Install tool and tool deps from ftp.debian.org/debian
+     - Install required evaluation/metric tools and dataset .debs from our own
+       archive
+   - Need a way to code these build-deps in a CWL file
+     - Create a Dockerfile for the workflow?
+     - Or is a CWL workflow with dockerised tools enough?
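[Editor's aside, not part of the commit: one answer to the "need a way to code these build-deps in a CWL file" question above is CWL's own `SoftwareRequirement` hint alongside `DockerRequirement`, with EDAM format IRIs on the inputs. The sketch below is purely illustrative; the tool choice, image name, and version are hypothetical.]

```yaml
#!/usr/bin/env cwl-runner
# Illustrative sketch only: encoding build-deps and the container
# directly in a CWL tool description, with EDAM format annotations.
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [samtools, sort]

$namespaces:
  edam: http://edamontology.org/

hints:
  DockerRequirement:
    dockerPull: debian:unstable      # or an image from our own archive
  SoftwareRequirement:               # the benchmark's "build-deps"
    packages:
      samtools:
        version: ["1.3"]

inputs:
  alignments:
    type: File
    format: edam:format_2572         # BAM
    inputBinding:
      position: 1

stdout: sorted.bam
outputs:
  sorted:
    type: stdout
    format: edam:format_2572         # BAM
```

Whether such annotations suffice, or a generated Dockerfile is still needed, is exactly the open question raised above.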
+   - Run container of above image, producing data file results
+ - Workflow steps:
+   - Obtain dataset:
+     - Either: a) run simulator steps (log seeds), or b) install data
+       package from local repo per above
+   - Do any pre-conversion to tool input format (auto-detected from EDAM
+     formats)
+   - Run tool on data
+   - Run any post-conversion tools to evaluation code input format (also
+     auto-detected)
+   - Run evaluation code
+   - Report full result
+     - Probably as YAML file, or similar.
+   - Report "benchmark status" to UDD
+     - Simple state-based update (fail, got worse, all ok)
+     - Report biggest change in metric (potentially biggest improvement and
+       regression) for all builds
+       - There may be many tools per package; report best/worst across all
+         tools
+     - Path to and checksum of tarball of all benchmark results
+     - Will need a new UDD table
+ - Docker images
+   - Could use CWL tool Docker containers through CWL workflow
+     - Don't always use Debian, unfortunately
+   - Could run whole workflow within Debian Unstable docker container
+     - But still run CWL workflow within the container
+   - Debugging/reproducible containers
+     - Auto-generated `Dockerfile` for image containing all datasets, metrics,
+       conversion tools, and `RUN` steps for obtaining data and running
+       pre-conversion (but stopping before tool execution).
+     - If we use docker for actual workflow execution, then this is what would
+       be used for the test execution
+ - Could all this run on a new instance of debci? or Jenkins?
+
+Requirements
+------------
+
+ - Have EDAM-compatible DebTags
+ - Have a CWL tool description for each tool in the package
+   - Should contain EDAM tags per operation
+   - Potentially one CWL tool file per subtool/operation (e.g. `samtools view`
+     vs `samtools sort`)
+ - Be in `main`
+
+Operation
+---------
+
+ - New service that runs benchmarks when:
+   - There is a new version of the underlying package in Debian med unstable
+     - Including any conversion utilities
   - There is a change in an applicable script for the calculation of metrics
   - There is a change in an applicable benchmark dataset
+   - There is an applicable transition in progress??
+     - Could catch subtle bugs e.g. py3.4 -> py3.5 issues
+
+Schema of ideas
+---------------
+
+ - There may be many tools per package
+ - Each tool may have many benchmarkable operations
+ - Each operation of the tool should be tested by many datasets
+ - Each test should (or may) report more than one metric
+
+
+Possible datasets:
+------------------
+
+ - https://sites.stanford.edu/abms/giab: Genome In A Bottle is a NIST human
+   NGS resequencing dataset (Paper:
+http://biorxiv.org/content/early/2015/09/15/026468)
+ - http://gmisatest.referata.com/wiki/Dataset_1408MLGX6-3WGS
+https://public.etherpad-mozilla.org/p/debian-med-benchmarking
+
+
+`cat brain | less`
+------------------

 The EDAM classification of a tool lives in the DebTags, and the CWL
 description of a tool lives within the debian med package

@@ -50,56 +144,28 @@ file format to the metric calculation script's input file format

 Autopkgtest?

- - It may be worth investigating using autopkgtest infrastruture (perhaps
- run the service as a debci instance) that runs autopakcage tests:
-
- - each benchmark package contains a test (or tests) in autopkgtest format,
- that we parse & use on our debci
+ - It may be worth investigating using autopkgtest infrastructure (perhaps
+   run the service as a debci instance) that runs autopkgtests:
+   - each benchmark package contains a test (or tests) in autopkgtest format,
+     that we parse & use on our debci

 Metadata storage:

- - Repository of YAML-style markup parsed into SQL?
- - Debtags?
- - Just write a script?
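[Editor's aside, not part of the commit: for the "repository of YAML-style markup parsed into SQL" option, a metadata entry might look like the following. Every key and name here is invented for illustration, mirroring the package → tool → operation → dataset → metric hierarchy of the "Schema of ideas" section.]

```yaml
# Hypothetical per-package benchmark metadata (all keys invented).
package: samtools
tools:
  - name: samtools-sort
    cwl: cwl/samtools-sort.cwl
    operations:
      - edam_operation: operation_XXXX    # placeholder for the EDAM operation ID
        datasets:
          - kind: packaged                # real data, installed as a .deb
            deb: benchmark-data-giab
          - kind: simulated               # parameters stay in the CWL workflow
            workflow: cwl/simulate-reads.cwl
        metrics:
          - deb: benchmark-metric-mapeval
            reports: [best, worst]
```

Each such file would map naturally onto rows of the new UDD table mentioned above.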
-
-
-UDD:
-
- - https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo
-
- - Or, more sanely:
- https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo
-
- - New table required for debian benchmarks status data
-
- - Errors in building a test or major changes in a metric are reported to
- UDD in a fashion similar to how it is done by the ReproducibleBuilds
-system: We report whether the test could be computed at all and the largest
-positive and negative deviations of scores (in any dataset on any metric), plus
-a description of in which dataset and on which metric this deviation has
-occurred
-
-
-
-Architechture:
- - Pre-package and publish to separate local archive (benchmarking specific
- code ONLY):
- - Metric script .debs
- - dataset .debs
-
-Tests run in docker containers:
- - Poll YAML or DB for build-deps and datasets
-
- - Install tool and tool deps from ftp.debian.org/debian
-
- - Install required evaluation/metric tools and dataset .debs from our own
- archive
- - Create dockerfile for image from above (saved and published for every
- benchmark)
- - Run container of above image, producing data file results
- - Run evaluation code and report result
- - Delete container and image (keeping Dockerfile)
- - Publish result (either text file [TSV, CSV or YAML], or cgi script to
- pull from DB)
- - Buider pushes status to UDD
+ - [debmed's page](https://udd.debian.org/dmd/?email1=debian-med-packaging%40lists.alioth.debian.org&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - Or, a saner example:
+   [KDM](https://udd.debian.org/dmd/?email1=spam%40kdmurray.id.au&email2=&email3=&packages=&ignpackages=&format=html#todo)
+ - New table required for debian benchmarks status data
+ - Errors in building a test or major changes in a metric are reported to
+   UDD in a fashion similar to how it is done by the ReproducibleBuilds system:
+   we report whether the test could be computed at all and the largest positive
+   and negative deviations of scores (in any dataset on any metric), plus a
+   description of which dataset and which metric the deviation occurred in

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/debian-med-benchmarking-spec.git.git

_______________________________________________
debian-med-commit mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-med-commit
