Hello Henrik

I'm not sure about the blastall from the blast+ package, but blastall from
the original NCBI blast package only outputs the gi/accession number for each
blast hit in -m 8 (table) format.

We have two ways to deal with this:
(1) Run blastall with -m 0 (the default), which outputs all the long names,
then use a script to parse the file into a table format. The parsing can be
really slow; we're using a BioPerl script. This is how most of the people in
my lab do it.
(2) Run blastall with -m 8, which outputs the table format with short
names. Then, using a script, collect the gi/accession numbers of all the
significant blast hits in a text file, use Batch Entrez to get the long
names, then use another script to replace the short names in the blast
output file with the long names. This is how I do it. Again, my scripting is
really bad, so I apologize for not sharing the script.
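
The last step of option (2) could look roughly like the Python sketch below.
It is not the script mentioned above, only an illustration. It assumes the
long names have already been saved to a tab-separated file with the short
id in the first column and the long name in the second (a hypothetical
layout, not something Batch Entrez produces directly), and that the hit id
is the second column of the -m 8 table. The function names are made up.

```python
SUBJECT_COLUMN = 1  # second column of blastall -m 8 output holds the hit id


def load_names(handle):
    """Read 'short id <TAB> long name' lines into a dict (assumed layout)."""
    names = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2:
            names[parts[0]] = parts[1]
    return names


def add_long_names(blast_handle, names, out_handle):
    """Copy -m 8 rows, swapping the hit id for its long name when known."""
    for line in blast_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > SUBJECT_COLUMN:
            short = fields[SUBJECT_COLUMN]
            # Hits without a known long name keep their short id.
            fields[SUBJECT_COLUMN] = names.get(short, short)
        out_handle.write("\t".join(fields) + "\n")
```

Both functions take open file handles, so they can be tried on a few lines
of a result file before running them on the whole thing.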

All the best,
-- yealing --


On Tue, May 11, 2010 at 3:16 PM, Henrik Lantz <henrik.la...@mikrob.slu.se> wrote:

> Hi
>
> A big thank you to Tim, Yealing and Alex for all the help!
>
> Here is what I have done so far: I got help from a friend to write a perl
> script that transforms the coordinates in the blast result file to
> cumulative coordinates. This file can then be read as a feature in Artemis
> just fine and maps perfectly onto the contigs.
>
> I also made some progress with the alternative discussed with Tim, i.e.,
> loading the contig file in Artemis and choosing "write all bases in FASTA
> format" and then blasting this file. I managed to avoid the memory related
> problems I had with blastall by using the blast+ package and the
> legacy script included there. The blast results can then be read as a feature
> in Artemis without any problems.
>
> Of these two approaches I prefer the first one since this allows me to keep
> the original contigs.
>
> The "problem" I am having now is getting the names of the proteins into
> Artemis. The blast results only include the systematic name of the blast
> hits, not the full names in understandable English. I am sure I am just
> missing something very simple here, and will soon have a solution, but if
> anyone has a suggestion I would also be very interested in hearing about it.
> Still in the learning phase...
>
> Cheers,
> Henrik
> ________________________________________
> From: artemis-users-boun...@sanger.ac.uk [
> artemis-users-boun...@sanger.ac.uk] on behalf of Yealing [
> yealingt+arte...@gmail.com]
> Sent: 29 April 2010 11:59
> To: Tim Carver
> Cc: artemis-users@sanger.ac.uk; Henrik Lantz
> Subject: Re: [Artemis-users] Multiple contigs and Blastall-results
>
> Hi
>
> I've had the same problem in the past. There are sometimes reasons why we
> do not wish to join the contigs before we run a blast. For example, once
> we join the contigs, the information about the individual contigs is lost.
>
> I've been able to map the blast results correctly in Artemis, but because
> I'm not a very good scripter, things might work or break using the bash-awk
> script that I have. So I usually end up doing it sort of manually for a lot
> of different people with different data sets. Here's how I do it:
>
> 1, we need a file containing <contig name> in the first column and <contig
> length> in the second column. The make-or-break problem is that in this
> file, the contig names must be arranged in exactly the same order as Artemis
> puts them.
> 2, from there, I use a script to generate a file with <contig name> in the
> first column and <cumulative contig length at start of contig> in the
> second column. This can be done with Excel too, if you're not familiar with
> scripting.
> 3, I then process the blastall result file, modifying the start and stop
> coordinates (adding the cumulative contig length at the start of the contig
> to the start and stop coordinates).
>
> After that, (usually if nothing bad happens) you will be able to just load
> the blastall result file in Artemis. Best thing is we get to keep all the
> details that we might or might not need.
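
The three steps above can be sketched in Python as follows. This is only
an illustration under stated assumptions, not the bash-awk script
mentioned: it assumes the contigs were the blast queries (so the contig
name is column 1 and the start/stop coordinates are columns 7 and 8 of the
-m 8 table), that coordinates are 1-based, and that Artemis joins the
contigs end to end with nothing in between. The function names are
made up.

```python
def cumulative_offsets(contig_lengths):
    """Map each contig name to the total length of the contigs before it.

    contig_lengths: list of (name, length) pairs, in the same order as
    Artemis places the contigs.
    """
    offsets = {}
    total = 0
    for name, length in contig_lengths:
        offsets[name] = total
        total += length
    return offsets


def shift_hit(line, offsets):
    """Add the contig's offset to the start and stop of one -m 8 row."""
    fields = line.rstrip("\n").split("\t")
    offset = offsets[fields[0]]  # raises KeyError for an unknown contig
    fields[6] = str(int(fields[6]) + offset)  # query start (assumed column)
    fields[7] = str(int(fields[7]) + offset)  # query end (assumed column)
    return "\t".join(fields)
```

With contigs of length 100 and 50 in that order, a hit at 5..34 on the
second contig ends up at 105..134 in the joined coordinates.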
>
> Cheers,
> -- yealing --
>
>
> On Thu, Apr 29, 2010 at 5:02 PM, Tim Carver <t...@sanger.ac.uk> wrote:
> Hi Henrik
>
> You have found the correct solution. It is just that the blast reports
> coordinates from the start of the sequence and so they need joining up
> before you do the blast. I am not sure why you get that memory error. If you
> have access to another machine you may want to try the blast there.
> Alternatively you could possibly try a smaller number of contigs and then
> use the EMBOSS application 'union' to join them back up.
>
> Regards
> Tim
>
>
> On 4/29/10 8:09 AM, "Henrik Lantz" <henrik.la...@mikrob.slu.se> wrote:
>
> > I was hoping I could get some help with a newbie Artemis question. Very new
> > to all this.
> >
> > I have made a de novo assembly of a fungus using MIRA and 454 data only. The
> > resulting fasta file with around 4000 contigs loads into Artemis fine. I can
> > check all the contigs, find ORFs etc. The problem appears when I want to
> > import the results of a blastall search on the contig data file. All
> > annotations from the blastall results are lumped into the first five
> > contigs, with the overwhelming majority in the first contig. Obviously not
> > correct. I am using the -m 8 flag for the blastall search. Looking through
> > the result file from blastall in a text editor I can see that the blastall
> > search has worked, and there are many interesting hits, but I would like to
> > visualize the results on the contigs.
> >
> > I read through the mail archive and found a user with a similar problem
> > (http://www.mail-archive.com/artemis-users%40sanger.ac.uk/msg00463.html)
> > and it seems one solution might be to save the contigs as a long continuous
> > file in Artemis, and then use that in the blastall search. But when I try
> > that I get an error message from blastall:
> >
> > blastall(33748) malloc: *** mmap(size=1048576) failed (error code=12)
> > *** error: can't allocate region
> > *** set a breakpoint in malloc_error_break to debug
> > Bus error
> >
> > I am running on Mac OS X Snow Leopard with 20 GB of memory.
> > Any help to get an inexperienced user started would be very much
> > appreciated!
> > /Henrik
> > _______________________________________________
> > Artemis-users mailing list
> > Artemis-users@sanger.ac.uk
> > http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
>
>
>
> _______________________________________________
> Artemis-users mailing list
> Artemis-users@sanger.ac.uk<mailto:Artemis-users@sanger.ac.uk>
> http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
>
>
