Henrik,
Yealing's second option is also how I do it. If you have a local BLAST 
installation it's not that bad.
Alex
 

________________________________

From: artemis-users-boun...@sanger.ac.uk on behalf of Yealing
Sent: Tue 11 May 2010 9:57
To: Henrik Lantz
CC: artemis-users@sanger.ac.uk
Subject: Re: [Artemis-users] Multiple contigs and Blastall-results


Hello Henrik

I'm not sure about the blastall from the blast+ package, but the blastall from 
the NCBI blast package outputs only the gi/accession number for each blast hit 
in -m 8 (table) format.

We have two ways to deal with this:
(1) Run blastall with -m 0, which is the default and outputs all the long 
names, then use a script to parse the file into a table format. The parsing can 
be really slow; we're using a BioPerl script. This is how most of the people in 
my lab do it.
(2) Run blastall with -m 8, which outputs the table format with short names. 
Then, using a script, collect the gi/accession numbers of all the significant 
blast hits in a text file, use Batch Entrez to get the long names, then use 
another script to replace the short names in the blast output file with the 
long names. This is how I do it. Again, my scripting is really bad, so I 
apologize for not sharing the script.
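
For anyone who wants a starting point, option (2) could be sketched roughly 
like this in Python. This is not the script mentioned above, only a minimal 
sketch under some assumptions: the input is tab-separated -m 8 output (subject 
ID in column 2, e-value in column 11), the e-value cutoff of 1e-5 is an 
arbitrary choice, and the long names fetched from Batch Entrez have already 
been turned into a lookup from ID to name. The function names, IDs, and 
protein name below are all made up for illustration.

```python
def parse_table(lines):
    """Split tab-separated -m 8 lines into lists of fields, skipping blanks."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


def significant_ids(rows, max_evalue=1e-5):
    """Return unique subject IDs (column 2) with e-value (column 11) <= cutoff."""
    ids = []
    seen = set()
    for row in rows:
        subject, evalue = row[1], float(row[10])
        if evalue <= max_evalue and subject not in seen:
            seen.add(subject)
            ids.append(subject)
    return ids


def replace_names(rows, long_names):
    """Swap each subject ID for its long name; unknown IDs are left unchanged."""
    return [row[:1] + [long_names.get(row[1], row[1])] + row[2:] for row in rows]


# Hypothetical example data: two hits, only the first one is significant.
hits = [
    "contig_1\tgi|111|ref|XP_000001.1|\t85.00\t40\t6\t0\t10\t129\t1\t40\t1e-15\t80.1",
    "contig_1\tgi|222|ref|XP_000002.1|\t30.00\t20\t14\t0\t200\t259\t5\t24\t2.5\t25.0",
]
# Hypothetical lookup built from whatever Batch Entrez returned.
long_names = {"gi|111|ref|XP_000001.1|": "example long protein name"}

rows = parse_table(hits)
print("\n".join(significant_ids(rows)))  # IDs to submit to Batch Entrez
for row in replace_names(rows, long_names):
    print("\t".join(row))
```

Hits without an entry in the lookup keep their short name, so the table stays 
loadable even if some IDs were filtered out or not found.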

All the best,
-- yealing --



On Tue, May 11, 2010 at 3:16 PM, Henrik Lantz <henrik.la...@mikrob.slu.se> wrote:


        Hi
        
        A big thank you to Tim, Yealing and Alex for all the help!
        
        Here is what I have done so far: I got help from a friend to write a
        Perl script that transforms the coordinates in the blast result file
        to cumulative coordinates. This file can then be read as a feature in
        Artemis just fine and maps perfectly onto the contigs.
        
        I also made some progress with the alternative discussed with Tim,
        i.e., loading the contig file in Artemis, choosing "write all bases in
        FASTA format", and then blasting this file. I managed to avoid the
        memory-related problems I had with blastall by using the blast+
        package and the legacy script included there. The blast results can
        then be read as a feature in Artemis without any problems.
        
        Of these two approaches I prefer the first one, since it allows me to
        keep the original contigs.
        
        The "problem" I am having now is getting the names of the proteins
        into Artemis. The blast results only include the systematic name of
        the blast hits, not the full names in understandable English. I am
        sure I am just missing something very simple here, and will soon have
        a solution, but if anyone has a suggestion I would be very interested
        in hearing about it. Still in the learning phase...
        
        Cheers,
        Henrik
        ________________________________________
        From: artemis-users-boun...@sanger.ac.uk [artemis-users-boun...@sanger.ac.uk] on behalf of Yealing [yealingt+arte...@gmail.com]
        Sent: 29 April 2010 11:59
        To: Tim Carver
        Cc: artemis-users@sanger.ac.uk; Henrik Lantz
        Subject: Re: [Artemis-users] Multiple contigs and Blastall-results
        

        Hi
        
        I've had the same problem in the past. There are sometimes reasons why
        we do not wish to join the contigs before running a blast; for
        example, once we join the contigs, the information about the
        individual contigs is lost.
        
        I've been able to map the blast results correctly in Artemis, but
        because I'm not a very good scripter, the bash-awk script that I have
        might work or might break. So I usually end up doing it more or less
        manually for a lot of different people with different data sets.
        Here's how I do it:
        
        1. We need a file containing <contig name> in the first column and
        <contig length> in the second column. The make-or-break issue is that
        in this file the contig names must be arranged in exactly the order in
        which Artemis puts them.
        2. From there, I use a script to generate a file with <contig name> in
        the first column and <cumulative contig length at start of contig> in
        the second column. This can be done in Excel too, if you're not
        familiar with scripting.
        3. I then process the blastall result file by modifying the start and
        stop coordinates (adding the cumulative contig length at start of
        contig to the start and stop coordinates).
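
Steps 1 to 3 could look roughly like the following in Python. This is only a 
minimal sketch, not the bash-awk script mentioned above. It assumes the blast 
result is tab-separated -m 8 output in which the contig is the query, so the 
contig name is in column 1 and the query start and end are in columns 7 and 8. 
It also assumes the list of contig lengths is already in the same order that 
Artemis uses. The function names and the example data are made up for 
illustration.

```python
import csv


def cumulative_offsets(contig_lengths):
    """Map each contig name to the summed length of all contigs before it.

    contig_lengths: list of (name, length) pairs, already in Artemis order.
    """
    offsets = {}
    running_total = 0
    for name, length in contig_lengths:
        offsets[name] = running_total
        running_total += length
    return offsets


def shift_blast_table(lines, offsets):
    """Yield -m 8 rows with query start/end moved to cumulative coordinates."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        offset = offsets[row[0]]            # column 1: query (contig) name
        row[6] = str(int(row[6]) + offset)  # column 7: query start
        row[7] = str(int(row[7]) + offset)  # column 8: query end
        yield "\t".join(row)


# Hypothetical example data: two contigs and one hit on the second contig.
contigs = [("contig_1", 1000), ("contig_2", 500)]
hits = [
    "contig_2\tgi|123|ref|XP_000001.1|\t85.00\t40\t6\t0\t10\t129\t1\t40\t1e-15\t80.1",
]
shifted = list(shift_blast_table(hits, cumulative_offsets(contigs)))
print(shifted[0])
```

The first contig gets an offset of 0, so its hits are unchanged; a hit at 
10..129 on the second contig above moves to 1010..1129. A contig name missing 
from the length file raises a KeyError, which is deliberate: a silent mismatch 
would put features in the wrong place.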
        
        After that (usually, if nothing bad happens), you will be able to just
        load the blastall result file in Artemis. The best thing is that we
        get to keep all the details that we might or might not need.
        
        Cheers,
        -- yealing --
        
        
        
        On Thu, Apr 29, 2010 at 5:02 PM, Tim Carver <t...@sanger.ac.uk> wrote:
        Hi Henrik
        
        You have found the correct solution. It is just that the blast reports
        coordinates from the start of the sequence and so they need joining up
        before you do the blast. I am not sure why you get that memory error.
        If you have access to another machine you may want to try the blast
        there. Alternatively you could possibly try a smaller number of
        contigs and then use the EMBOSS application 'union' to join them back
        up.
        
        Regards
        Tim
        
        
        
        On 4/29/10 8:09 AM, "Henrik Lantz" <henrik.la...@mikrob.slu.se> wrote:
        
        > I was hoping I could get some help with a newbie Artemis question.
        > Very new to all this.
        >
        > I have made a de novo assembly of a fungus using MIRA and 454 data
        > only. The resulting fasta file with around 4000 contigs loads into
        > Artemis fine. I can check all the contigs, find ORFs etc. The problem
        > appears when I want to import the results of a blastall search on the
        > contig data file. All annotations from the blastall results are lumped
        > into the first five contigs, with the overwhelming majority in the
        > first contig. Obviously not correct. I am using the -m 8 flag for the
        > blastall search. Looking through the result file from blastall in a
        > text editor I can see that the blastall search has worked, and there
        > are many interesting hits, but I would like to visualize the results
        > on the contigs.
        >
        > I read through the mail archive and found a user with a similar
        > problem
        > (http://www.mail-archive.com/artemis-users%40sanger.ac.uk/msg00463.html)
        > and it seems one solution might be to save the contigs as a long
        > continuous file in Artemis, and then use that in the blastall search.
        > But when I try that I get an error message from blastall:
        >
        > blastall(33748) malloc: *** mmap(size=1048576) failed (error code=12)
        > *** error: can't allocate region
        > *** set a breakpoint in malloc_error_break to debug
        > Bus error
        >
        > I am running on Mac OS X Snow Leopard with 20 GB of memory.
        > Any help to get an inexperienced user started would be very much
        > appreciated!
        > /Henrik
        > _______________________________________________
        > Artemis-users mailing list
        > Artemis-users@sanger.ac.uk
        > http://lists.sanger.ac.uk/mailman/listinfo/artemis-users
        
        
        
