As you state in your email:
[formatdb] ERROR: Failed to create index. Possibly a gi included more
than once in the database.
While untested, a command such as
grep '^>' fastafile.txt | cut -d'|' -f2 | sort | uniq -c | sort -n | grep -v " 1 "
may allow you to determine which GI is occurring more than once. Remove the
duplicate record for that GI from your FASTA file.
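For a single pass without the intermediate sorts, the same check can be sketched in awk. This is only an illustration: the function name dup_gis is made up here, and it assumes every header of interest starts with ">gi|<number>|".

```shell
# dup_gis (hypothetical helper): list GIs that occur in more than one header.
# Assumes FASTA headers of the form ">gi|<number>|...".
# Output: one line per duplicated GI, "<count> <gi>", sorted by GI.
dup_gis() {
    awk -F'|' '
        /^>gi\|/ { n[$2]++ }              # count each GI seen in a header line
        END {
            for (g in n)
                if (n[g] > 1) print n[g], g
        }
    ' "$1" | sort -k2,2n
}
```

Calling it as "dup_gis fastafile.txt" prints nothing when every GI is unique.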
I've previously used this code to skip duplicate GIs. A small amount of
modification will be necessary.
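#!/usr/bin/perl
# Copy a FASTA file, skipping every record whose GI has already been seen.
use strict;
use warnings;
use Getopt::Long;
use FileHandle;

my ($quick_run, $help);
my %optControl = (
    'quick' => \$quick_run,   # accepted but not used in this excerpt
    'help'  => \$help,
);
GetOptions(%optControl) or die usage();
if ($help) {
    print usage();
    exit;
}

# Read from the first argument and write to the second; '-' means STDIN/STDOUT.
my $infn  = $ARGV[0] || '-';
my $outfn = $ARGV[1] || '-';
my $inh  = FileHandle->new($infn)      or die "$!: $infn";
my $outh = FileHandle->new('>'.$outfn) or die "$!: $outfn";

my $seenvec = '';   # bit vector with one bit per GI number
my $show    = 1;    # whether lines of the current record are printed
while (my $line = <$inh>) {
    if ($line =~ /^>/) {
        my ($gi) = $line =~ /gi\|(\d+)/;
        if (!defined $gi) {
            $show = 1;                  # no GI in the header: keep the record
        } elsif (0 == vec($seenvec, $gi, 1)) {
            $show = 1;                  # first occurrence: keep and remember it
            vec($seenvec, $gi, 1) = 1;
        } else {
            $show = 0;                  # duplicate GI: drop the whole record
            print STDERR "skip $line";
        }
    }
    print $outh $line if $show;
}

sub usage {
    return "usage: $0 [--quick] [--help] [infile [outfile]]\n";
}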
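If Perl is not handy, roughly the same filtering can be sketched with awk. This is an untested-on-nt illustration and not a drop-in replacement: the function name dedup_gi is made up here, and it keeps seen GIs in an awk array, which will need more memory than a bit vector on a file the size of nt.

```shell
# dedup_gi (hypothetical helper): print a FASTA file to stdout, keeping only
# the first record for each GI. Skipped headers are reported on stderr.
dedup_gi() {
    awk '
        BEGIN { show = 1 }
        /^>/ {
            show = 1
            # Pull the digits out of "gi|<number>" if the header has one.
            if (match($0, /gi\|[0-9]+/)) {
                gi = substr($0, RSTART + 3, RLENGTH - 3)
                if (gi in seen) {
                    show = 0
                    print "skip " $0 > "/dev/stderr"
                }
                seen[gi] = 1
            }
        }
        show    # print header and sequence lines of records being kept
    ' "$1"
}
```

For example, "dedup_gi nt > nt.dedup" would write the filtered copy to nt.dedup (an output name chosen only for illustration).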
Since nt is so large, you may also want to consider adding the switch
--skip-reorder to your mpiformatdb call.
Does anyone else feel this should become a default?
--
Mike Cariaso * Bioinformatics Software * http://www.cariaso.com
----- Original Message ----
From: Daniel Xavier de Sousa <[EMAIL PROTECTED]>
To: mpiBlast <[email protected]>
Sent: Monday, May 21, 2007 8:34:10 AM
Subject: [Mpiblast-users] MPIFORMATDB
Hi,
I have a problem with MPIFORMATDB.
I ran this command: ./mpiformatdb -N 24 -i nt -p F
And the program returned this output:
///////////////////////////////////////////////////////////////////////////////
OUT OF MPIBLAST
nt: Value too large for defined data type
Reading input file
0%Done, read 263843064 lines
Temp name base: /tmp/reorderXXXXXX
Got temp name: /tmp/reorderb0538C
Reordering 5214551 sequence entries
2.58%
...
[formatdb] WARNING: Sequence number 5051632 (gi|4704323|dbj|AB013452.1|), 15
illegal characters were removed:
2 Es, 1 F, 4 Is, 3 Ls, 3 Os, 2 Ps
[formatdb] WARNING: Sequence number 5052135 (gi|640099|pdb|172D|D), 32 illegal
characters were removed:
4 Es, 3 Fs, 9 Is, 3 Ls, 7 Os, 3 Ps, 3 -s
[formatdb] WARNING: Sequence number 5052365 (gi|999772), 31 illegal characters
were removed:
5 Es, 1 F, 9 Is, 6 Ls, 5 Os, 5 Ps
[formatdb] ERROR: Failed to create index. Possibly a gi included more than
once in the database.
Removed /tmp/reorder9UjHhX
There was an error executing formatdb. Check formatdb.log
///////////////////////////////////////////////////////////////////////////////
To test, I split the nt file into 24 parts and ran each one, and I didn't find
any errors. Is this error happening because the nt file is 20 GB in size?
What do I have to do to resolve this error?
Thanks
Daniel Sousa
*****************************************************************
* Daniel Xavier de Sousa                                        *
* Master's student in Informatics - PUC-Rio                     *
* E-mail: dsousaARROBAinf.puc-rio.br                            *
* Phone: +55 21 35271500 - 4543                                 *
*****************************************************************
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users