As you state in your email:
     [formatdb] ERROR: Failed to create index.  Possibly a gi included more than once in the database.


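While untested, a command such as

grep '^>' fastafile.txt | cut -d'|' -f2 | sort | uniq -c | sort -n | grep -v " 1 "

may allow you to determine which GI is occurring more than once. It pulls the 
GI out of each header line, counts how often each one appears, and hides the 
GIs that appear only once. Remove the duplicate record(s) for that GI from your 
fasta file.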
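
The same check can be tried on a toy file first. This is only a sketch: the sample records and GI numbers are made up, and it uses `uniq -d` (print only repeated lines) in place of the `uniq -c ... grep -v` stage, which is a substitution for brevity rather than the command above.

```shell
# Build a tiny FASTA file with one duplicated GI (sample data, invented for illustration).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.fa" <<'EOF'
>gi|111|gb|AAA.1| first
ACGT
>gi|222|gb|BBB.1| second
GGCC
>gi|111|gb|AAA.1| first again
ACGT
EOF

# Take the header lines, cut out the second '|'-delimited field (the GI),
# and print every GI that occurs in more than one header.
dupes=$(grep '^>' "$tmpdir/sample.fa" | cut -d'|' -f2 | sort | uniq -d)
echo "$dupes"
```

On the toy file this prints 111, the only GI that appears twice. It assumes every header starts with gi|<number>|, as the nt headers in the log do.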





I've previously used this code to skip duplicate GIs. A small amount of 
modification will be necessary.

#!/usr/bin/perl
# Copy a FASTA file, skipping any record whose GI has already been seen.
use strict;
use warnings;
use Getopt::Long;
use FileHandle;

my ($quick_run, $help);
my %optControl = (
          'quick'            => \$quick_run,
          'help'             => \$help,
          );
GetOptions(%optControl) or die usage();

if ($help) {
    print usage();
    exit;
}

# Both file names default to '-' (STDIN / STDOUT).
my $infn = $ARGV[0] || '-';
my $outfn = $ARGV[1] || '-';

my $inh = FileHandle->new($infn) or die ("$!:$infn");
my $outh = FileHandle->new('>'.$outfn) or die ("$!:$outfn");

# Bit vector indexed by GI: bit N is set once GI N has been written.
my $seenvec = '';

my $line;
my $show = 1;
while ($line = <$inh>) {
    if ($line =~ /^>/) {
        my ($gi) = $line =~ /gi\|(\d+)/;

        if (!defined $gi) {
            # No GI in this header: pass the record through unchanged.
            $show = 1;
        } elsif (0==vec($seenvec, $gi, 1)) {
            $show = 1;
            vec($seenvec, $gi, 1) = 1;
        } else {
            $show = 0;
            print STDERR "skip $line";
        }
    }
    if ($show) {
        print $outh $line;
    }
}

sub usage {
    return "usage: $0 [--help] [infile [outfile]]\n";
}
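
For a quick one-off, the same first-occurrence-wins filtering can also be sketched with awk. This is an alternative, not the Perl script above: it keeps seen GIs in an awk array rather than a bit vector, it assumes every header carries the GI as its second `|`-delimited field, and the file names and sample records are made up for illustration.

```shell
# Toy input with one duplicated GI (sample data, invented for illustration).
tmpdir=$(mktemp -d)
cat > "$tmpdir/in.fa" <<'EOF'
>gi|111|gb|AAA.1| first
ACGT
>gi|222|gb|BBB.1| second
GGCC
>gi|111|gb|AAA.1| first again
ACGT
EOF

# On each header line, split on '|' so f[2] holds the GI, and keep the record
# only the first time that GI is seen. Sequence lines follow the decision made
# for their header.
awk 'BEGIN { keep = 1 }
     /^>/  { split($0, f, "|"); keep = !seen[f[2]]++ }
     keep' "$tmpdir/in.fa" > "$tmpdir/out.fa"

cat "$tmpdir/out.fa"
```

The output keeps the first gi|111 record and the gi|222 record and drops the second gi|111 record along with its sequence line.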






Since nt is so large, you may also want to consider adding the switch 
--skip-reorder to your mpiformatdb call. 
Does anyone else feel this should become a default?





 
--
Mike Cariaso * Bioinformatics Software * http://www.cariaso.com

----- Original Message ----
From: Daniel Xavier de Sousa <[EMAIL PROTECTED]>
To: mpiBlast <[email protected]>
Sent: Monday, May 21, 2007 8:34:10 AM
Subject: [Mpiblast-users] MPIFORMATDB

Hi,

I have a problem with MPIFORMATDB.

I have put this command: ./mpiformatdb -N 24 -i nt -p F

And the program returned this out:

/////////////////////////////////////////////////////////////////////////////// 
OUT OF MPIBLAST
nt: Value too large for defined data type
Reading input file
0%Done, read 263843064 lines
Temp name base: /tmp/reorderXXXXXX
Got temp name: /tmp/reorderb0538C
Reordering 5214551 sequence entries
2.58%

...

[formatdb] WARNING: Sequence number 5051632 (gi|4704323|dbj|AB013452.1|), 15 illegal characters were removed:
2 Es, 1 F, 4 Is, 3 Ls, 3 Os, 2 Ps

[formatdb] WARNING: Sequence number 5052135 (gi|640099|pdb|172D|D), 32 illegal characters were removed:
4 Es, 3 Fs, 9 Is, 3 Ls, 7 Os, 3 Ps, 3 -s

[formatdb] WARNING: Sequence number 5052365 (gi|999772), 31 illegal characters were removed:
5 Es, 1 F, 9 Is, 6 Ls, 5 Os, 5 Ps

[formatdb] ERROR: Failed to create index.  Possibly a gi included more than once in the database.

Removed /tmp/reorder9UjHhX
There was an error executing formatdb.  Check formatdb.log
///////////////////////////////////////////////////////////////////////////////

To test, I split the nt file into 24 parts and ran each one, and I did not find 
any errors. Is this error happening because the size of NT is 20 GB? 
What do I have to do to resolve this error?

Thanks
Daniel Sousa

*****************************************************************
*        Daniel Xavier de Sousa                    *
*        Mestrando em Informática - PUC-Rio        *
*        E-MAIL : dsousaARROBAinf.puc-rio.br       *
*        Fone   : +55 21 35271500 - 4543           *
****************************************************************



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users



