Hello Anto,
There is no specific tool that I know of that does this based on read
content, but you could use the very low quality score (2) assigned to
ambiguous bases and the tool 'Filter by quality' to filter by
percentage. Be aware that other bases may also have been assigned this
low score, but these would very likely not be of practical use anyway.
You could clip these ends first, then do the filter, discarding any reads
that have very short usable sequence left. If the data is Illumina, this
is likely a sign of a sequence that failed vendor quality checks, and
these are no longer removed by default as of Casava 1.8+.
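
For anyone who wants to check the idea outside Galaxy, here is a minimal
Python sketch of the same logic: drop a read when the share of bases with
a quality score of 2 or lower reaches a cutoff. This is not what 'Filter
by quality' runs internally. The function names, the Phred+33 offset, and
the 10% cutoff (taken from the question below) are all assumptions to
adjust for your data.

```python
def low_quality_fraction(quality, threshold=2, offset=33):
    """Fraction of bases whose Phred score is at or below `threshold`.

    `offset` is the assumed quality encoding (33 for Sanger / Illumina 1.8+).
    """
    if not quality:
        return 1.0  # treat an empty read as entirely unusable
    low = sum(1 for ch in quality if ord(ch) - offset <= threshold)
    return low / len(quality)


def filter_fastq(lines, max_fraction=0.10, threshold=2, offset=33):
    """Yield (header, sequence, plus, quality) records that pass the filter.

    Assumes strict 4-line FASTQ records; an incomplete trailing record is
    silently dropped. A read is kept only when its low-quality fraction is
    strictly below `max_fraction`.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        record = tuple(s.rstrip("\n") for s in (header, seq, plus, qual))
        if low_quality_fraction(record[3], threshold, offset) < max_fraction:
            yield record


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python filter_n.py < in.fastq > out.fastq
    for rec in filter_fastq(sys.stdin):
        print("\n".join(rec))
```

With 50 bp reads and the 10% cutoff, a read with five or more score-2
bases is dropped and a read with four or fewer is kept.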
Creating a regular expression with the Select tool is another option, but
this is probably more effort than it is worth to construct. But, your
choice. A Google search will bring up syntax advice.
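
As a rough illustration of what such a pattern could look like, here is a
sketch using Python's re module. The pattern matches a sequence that
contains five or more N bases anywhere in it. Whether the Select tool
accepts this exact syntax is an assumption I have not checked, and the
pattern should only be applied to the sequence lines, since quality lines
can also contain the character 'N'.

```python
import re

# An N followed by any run of non-N characters, repeated five times:
# this matches any string containing at least five N characters.
FIVE_OR_MORE_N = re.compile(r"(?:N[^N]*){5}")


def has_five_or_more_n(sequence):
    """Return True when the sequence contains five or more N bases."""
    return FIVE_OR_MORE_N.search(sequence) is not None


if __name__ == "__main__":
    for seq in ("ACGTNNNNACGT", "NACGNTTNAANCCNG"):
        print(seq, has_five_or_more_n(seq))
```

The Ns do not need to be adjacent for the pattern to match, which is what
makes it a count rather than a search for one long run.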
Ideally the first will do the job,
Jen
Galaxy team
On 7/29/13 3:17 AM, Anto Praveen Rajkumar Rajamani wrote:
Hello,
I would like to filter my fastq files (50 bp single end Illumina RNA seq
reads) by a maximum threshold (10%) of ambiguous (N) bases.
I can see that the "CLIP" tool removes all reads with one or more N
bases.
Is there a way to remove only the reads with five or more N bases
using Galaxy?
Thank you.
Best wishes,
Anto
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org