Hi Jianguang,
I agree - these are already Sanger Phred +33 offset quality scores,
meaning you want datatype .fastqsanger (with near certainty). To double
check, run "FastQC" on a sample to be sure, or run it on the entire
dataset if you plan on doing quality checks anyway (potential trimming,
etc).
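As a rough illustration of why the offset can be judged from the characters alone, here is a minimal Python sketch. It is not what FastQC or the groomer does internally; the function name guess_phred_offset and the cutoffs are assumptions based on the commonly documented ASCII ranges (any character below ';' cannot occur with a +64 offset, and anything above 'J' is unusual for Illumina 1.8+ data at +33). The sample strings are the two quality lines quoted further down in this thread.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the range of quality characters.

    Heuristic only: returns None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' (ASCII 59) cannot occur with a +64 offset.
        return 33
    if highest > 74:
        # Characters above 'J' (ASCII 74) are unusual for +33 Illumina data.
        return 64
    return None


# Quality lines of "read 1" and "read 2" from this thread.
sample_quality_lines = [
    "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ",
    "#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI",
]

print(guess_phred_offset(sample_quality_lines))  # prints 33
```

The '#' at the start of each read (ASCII 35) is what settles it here. A guess like this on a handful of reads is only a sanity check, so running FastQC on the data is still the safer confirmation.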
You also don't need to run the groomer - just assign the datatype by
clicking on the pencil icon. Help is here and the screencast FASTQ Prep
walks through a how-to (using SRA data as an example):
http://wiki.galaxyproject.org/Support#Dataset_special_cases
Hope this helps - but you are really already on the right track, I'm
just agreeing!
Jen
Galaxy
On 8/29/13 12:53 PM, Du, Jianguang wrote:
Hi All,
I downloaded some RNA-seq datasets from NCBI. The datasets were
generated on an Illumina HiSeq 2000. I am not sure which "Input FASTQ
quality scores type" I should choose when I run FASTQ Groomer. Below
are the scores for 2 reads from one dataset; I renamed them "read 1"
and "read 2".
1) Sequence and quality score displayed in Galaxy
@read 1 length=51
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
+read 1 length=51
#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ
@read 2 length=51
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
+read 2 length=51
#1=DDDEDHHFHHJJJJJIJJHIIIJJJIJJJJJJJIJIJJJJJJIJJJJI
2) Sequence and one-channel quality scores shown in the NCBI SRA when I
downloaded the dataset
>gnl|SRA|read 1
NTGAGATTCTTGACTAGTTATTTCTGCTTTCAGGGAAGAAATCAGCTGGGC
One channel quality score
2 16 28 32 35 32 35 36 39 39 39 39 39 40 40 38 40 39 41 38 41 41 41 39
41 40 40 41 41 41 39 31 39 36 38 33 37 39 26 37 39 36 39 29 31 39 40
41 41 41 41
>gnl|SRA|read 2
NGAAGAGTCAGTTTTTTGTTTCCCTCATAACTTGCTAGATTCCGGATTGCT
One channel quality score
2 16 28 35 35 35 36 35 39 39 37 39 39 41 41 41 41 41 40 41 41 39 40 40
40 41 41 41 40 41 41 41 41 41 41 41 40 41 40 41 41 41 41 41 41 40 41
41 41 41 40
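The link between the two displays above can be checked directly: subtracting 33 from the ASCII code of each FASTQ quality character should give the numeric scores listed by SRA. A minimal Python sketch for read 1, where the helper name decode_phred33 is only illustrative and the two values are copied from the listings above:

```python
def decode_phred33(quality):
    """Convert a FASTQ quality string to numeric scores, assuming a +33 offset."""
    return [ord(ch) - 33 for ch in quality]


# Quality string of read 1 as displayed in Galaxy.
read1_quality = "#1=ADADEHHHHHIIGIHJGJJJHJIIJJJH@HEGBFH;FHEH>@HIJJJJ"

# One-channel quality scores of read 1 as shown by SRA.
read1_sra_scores = [
    2, 16, 28, 32, 35, 32, 35, 36, 39, 39, 39, 39, 39, 40, 40, 38, 40, 39,
    41, 38, 41, 41, 41, 39, 41, 40, 40, 41, 41, 41, 39, 31, 39, 36, 38, 33,
    37, 39, 26, 37, 39, 36, 39, 29, 31, 39, 40, 41, 41, 41, 41,
]

decoded = decode_phred33(read1_quality)
print(decoded == read1_sra_scores)  # prints True
print(max(decoded))                 # prints 41
```

With a +64 offset the leading '#' would decode to a negative score, so only the +33 interpretation reproduces the SRA numbers.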
It looks like the dataset was generated by Illumina pipeline version
1.8 or later, because some of the bases have a quality score of 41.
Can I choose "sanger" as the "Input FASTQ quality scores type" when I
run FASTQ Groomer?
Thanks.
Jianguang Du
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
--
Jennifer Hillman-Jackson
http://galaxyproject.org