Hello Sam,
When running Megablast, filtering by identity or e-value can help reduce
the number of hits. The default values are all fairly permissive for a
case where the query is run against a target genome of the same species
and the query has been filtered for base-calling quality. Filtering out
low-complexity sequence would probably also help a great deal, judging
from the number of hits generated from your initial data.
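If you prefer to filter the hits after the run, outside of Galaxy, the
idea can be sketched in a few lines of Python. This is only a rough
illustration: it assumes tab-separated hits in the standard 12-column
BLAST tabular layout (percent identity in column 3, e-value in column
11). The Galaxy Megablast output may use a different column order, so
the column indices would need adjusting, and the function name and
thresholds are made up for the example.

```python
import csv


def filter_hits(lines, min_identity=95.0, max_evalue=1e-10,
                identity_col=2, evalue_col=10):
    """Yield hit rows that pass both thresholds.

    Column indices are 0-based and ASSUME the standard 12-column BLAST
    tabular layout; adjust them to match your own file.
    """
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        identity = float(row[identity_col])
        evalue = float(row[evalue_col])
        if identity >= min_identity and evalue <= max_evalue:
            yield row
```

For a file on disk, something like `with open("hits.tabular") as fh:
kept = list(filter_hits(fh))` would apply it (the file name is just a
placeholder).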
There is also the "Parse blast XML output" tool. Converting the data
into interval format would allow the use of "Operate on Genomic
Intervals -> Cluster the intervals of a dataset". This clustering is
based on coverage, if that is one of your criteria (it could be, if the
identity threshold is a range you consider to contain the candidates for
"best"). Identity and coverage are commonly combined to identify the
"best" hit, but this is just a suggestion. The same logic could be
applied to the top-scoring e-value matches combined with coverage (the
result would likely be similar to using e-value alone, if the identity
threshold is set high).
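As a rough sketch of the "identity plus coverage" logic, the Python
below keeps one hit per query, preferring higher percent identity and
breaking ties by longer alignment length. Several things here are
assumptions for illustration only: the input is tab-separated in the
standard 12-column BLAST tabular layout, alignment length stands in for
coverage (true coverage would also need the query length), and the
function name is invented. It is not how any Galaxy tool works
internally.

```python
import csv


def best_hit_per_query(lines, query_col=0, identity_col=2, length_col=3):
    """Return one row per query: highest identity, then longest alignment.

    Column indices are 0-based and ASSUME the standard 12-column BLAST
    tabular layout. Alignment length is used as a stand-in for coverage.
    """
    best = {}  # query id -> (ranking key, row); keeps first-seen query order
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        key = (float(row[identity_col]), int(row[length_col]))
        query = row[query_col]
        # Replace the stored hit only when the new one ranks strictly higher,
        # so the earliest hit wins on an exact tie.
        if query not in best or key > best[query][0]:
            best[query] = (key, row)
    return [row for _, row in best.values()]
```

Ranking by e-value instead would only require changing the key, for
example to the negated e-value column so that smaller e-values rank
higher.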
The idea to add a filter for "single best" is a good one, but has some
complexity associated with it. I will pass it along to the team as an
enhancement request to consider.
Hopefully this helps!
Jen
Galaxy team
On 4/11/11 1:43 PM, Hsin-l (Sam) Chiang wrote:
Hi,
I used the Megablast function (in the NGS: Mapping\ROCHE-454\) to
analyze my FASTA sequences against the nt database and it worked fine
for me. However, it generated 56,804 hits although my query has only
1000 sequences. Is there any way to limit the number of reported
alignments to just the one best hit per sequence? (In the local BLAST
there are parameters such as -K1 -v 1 -b 1 to do so, but I can't find
similar options in Galaxy.)
Many thanks!
Sam
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org