Hi folks,
I'm back with one more feeler to gauge interest in the approach we are
trying out for a Galaxy quality control tool that can be inserted into
existing bioinformatics pipelines.  With some nudging (thanks Bob) I've
implemented basic infix math expressions.  We're also trying out the
inclusion of ontology metadata within report data to encourage data
import/export/comparison.  The goal is to make it easy to see and change
quality control metrics without having to recompile code or modify Galaxy
workflow mechanics.

Packaged as a Galaxy tool, the QC scripting language/interpreter lets us
read in text file(s), some assembly contig data say, and then run a
program (a set of rules) like:

   store( 200 report/contigs/contig_count_QC_threshold )
   store( 200000 report/contigs/contig_N50_QC_threshold )
   store( 2000 report/contigs/contig_N99_QC_threshold )

   if( ( genome_size_ratio > genome_size_ratio_QC_threshold ) fail(qc "Failed genome size ratio threshold") )
   store( statisticN( contig_lengths 50 ) report/contigs/contig_N50 )
   store( statisticN( contig_lengths 99 ) report/contigs/contig_N99 )
   if( contig_N50 < contig_N50_QC_threshold fail(qc "Failed minimum N50 contig length threshold") )
   if( contig_N99 < contig_N99_QC_threshold fail(qc "Failed minimum N99 contig length threshold") )
   if( report/contigs/contig_count > contig_count_QC_threshold fail(job "Failed minimum contig count threshold" ) )
 

The syntax is a generic, basic function(parameter1 parameter2 ...) style
of language.
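
For readers unfamiliar with the N50/N99 statistics used in the rules above,
here is a rough Python sketch of the usual definition: take contigs longest
first and report the length of the contig at which the running total first
reaches the given percentage of the total assembly length.  The name
statistic_n and its signature are made up for illustration; this is not
the tool's actual statisticN implementation, which may differ in detail.

```python
def statistic_n(lengths, percent):
    """Return the N-statistic (e.g. N50 for percent=50) of contig lengths.

    Contigs are visited longest first; the result is the length of the
    contig at which the running total first reaches `percent` percent of
    the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Cross-multiplied comparison avoids dividing by 100.
        if running * 100 >= total * percent:
            return length


# Toy data (not from a real assembly): total length is 54.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(statistic_n(toy_lengths, 50))  # 8, since 10 + 9 + 8 = 27 is half of 54
print(statistic_n(toy_lengths, 99))  # 2, the shortest contig is needed to reach 99%
```

A rule such as "contig_N50 < contig_N50_QC_threshold" then just compares
this number against the stored threshold.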

On a good run this yields a JSON report like:

{
   "title": "RCQC Quality Control Report",
   "tool_version": "0.0.7",
   "job": {
      "status": "ok"
   },
   "quality_control": {
      "status": "ok"
   },
   "date": "2016-02-09 09:21",
   "contigs": {
      "contig_lengths": [ 128, 172, 221, 224, 238, 230, 240, 246, 407, ..., 242, 2284, 1506],
      "genome_size_ratio_QC_threshold": 0.10000000000000001,
      "contig_N99_QC_threshold": 2000,
      "assembly_genome_size": 4615592,
      "genome_size_ratio": 0.04970310891496809,
      "contig_N50": 427122,
      "contig_N99": 8542,
      "contig_count_QC_threshold": 200,
      "contig_count": 44,
      "reference_genome_identifier": "serovar Typhimurium LT2",
      "reference_genome_size": 4857000,
      "contig_N50_QC_threshold": 200000
   },
   "@context": {
      "contigs": "http://purl.obolibrary.org/obo/SO_0001462",
      "genome_size_ratio_QC_threshold": "http://purl.obolibrary.org/obo/GenEpiO_0001564",
      "contig_N99_QC_threshold": "http://purl.obolibrary.org/obo/GenEpiO_0001566",
      "assembly_genome_size": "http://purl.obolibrary.org/obo/GenEpiO_0001561",
      "genome_size_ratio": "http://purl.obolibrary.org/obo/GenEpiO_0001563",
      "contig_N50": "http://purl.obolibrary.org/obo/OBI_0001941",
      "contig_N99": "http://purl.obolibrary.org/obo/GenEpiO_0001570",
      "contig_count_QC_threshold": "http://purl.obolibrary.org/obo/GenEpiO_0001571",
      "contig_count": "http://purl.obolibrary.org/obo/GenEpiO_0000093",
      "reference_genome_identifier": "http://purl.obolibrary.org/obo/GenEpiO_0001562",
      "date": "http://purl.obolibrary.org/obo/IAO_0000416",
      "reference_genome_size": "http://purl.obolibrary.org/obo/GenEpiO_0001560",
      "contig_N50_QC_threshold": "http://purl.obolibrary.org/obo/GenEpiO_0001565"
   }
}
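
To give a feel for how the "@context" block might help with
import/export/comparison, here is a small Python sketch that re-keys the
reported values by their ontology IRI, so two reports that use different
local field names could still be lined up.  The function name
values_by_iri is hypothetical, and the sample is a cut-down copy of the
report above; treat it as an illustration rather than part of the tool.

```python
import json

# Cut-down copy of the report shown above (only a few fields kept).
SAMPLE_REPORT = """
{
   "title": "RCQC Quality Control Report",
   "contigs": {
      "contig_N50": 427122,
      "contig_count": 44
   },
   "@context": {
      "contigs": "http://purl.obolibrary.org/obo/SO_0001462",
      "contig_N50": "http://purl.obolibrary.org/obo/OBI_0001941",
      "contig_count": "http://purl.obolibrary.org/obo/GenEpiO_0000093"
   }
}
"""


def values_by_iri(report):
    """Return {ontology IRI: (local key, value)} for every leaf value
    whose key has an entry in the report's "@context" block."""
    context = report.get("@context", {})
    found = {}

    def walk(node):
        for key, value in node.items():
            if key == "@context":
                continue  # the mapping itself is not report data
            if isinstance(value, dict):
                walk(value)  # descend into sections such as "contigs"
            elif key in context:
                found[context[key]] = (key, value)

    walk(report)
    return found


report = json.loads(SAMPLE_REPORT)
for iri, (key, value) in sorted(values_by_iri(report).items()):
    print(iri, key, value)
```

Fields without a "@context" entry (such as "title" here) are simply left
out of the result.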


So we'd appreciate any advice on roadblocks you foresee or features you'd
like to see in this.

- Damion
