Hi list,
I've looked at maxdb, and it looks pretty sexy ;) I am however not
getting the performance I expected.
I'm researching different database-engines for a project involving a
database with several large tables. Some of the tables will contain
billions of records. As a start, I've exported a table containing 135
million records from a MS SQL server. I used the script below to import
the data to maxdb with loadercli:
---- start script
create table Testtable (
Id FIXED(18) NOT NULL,
CountryId INT DEFAULT 47 NOT NULL,
SiteId INT NOT NULL,
Dato timestamp NOT NULL,
Value INT NOT NULL,
Kvalitet VARCHAR(10) ASCII NULL
)
//
FASTLOAD TABLE Testtable
Id 1
SiteId 2
Dato 3
Value 4
Kvalitet 5
INSTREAM '/home/jgaa/data/Testtable' COMPRESSED
TIMESTAMP ISO
SEPARATOR '\t'
START 2 1000000000
//
CREATE INDEX ix_1 on Testtable (CountryId, SiteId, Dato)
//
CREATE INDEX ix_2 on Testtable (Dato, CountryId, SiteId)
//
CREATE INDEX ix_3 on Testtable (CountryId, SiteId,Value)
---- end script
I timed the loadercli command, and got this result:
real 1392m55.301s
user 27m10.356s
sys 0m43.983s
With 135 million records, that's just above 1600 records/sec. I noticed
that the actual load was very fast, while the indexes took forever to
build.
The test-machine is a P4 3GHz machine with 1G RAM and a 320GB ATA disk
running Debian GNU/Linux "testing" and Linux kernel 2.6.8 with support
for hyperthreading. The database has 4 x 500 GB raw devices + 1 60 GB
raw device for the log (maxdb refused to accept one 200 GB raw device).
The Maxdb kernel is version 7.5.0 (Debian build in "testing").
A few other strange things I've noticed;
1) "SELECT COUNT(*) from Testtable" seems to cause a tablescan (which
is _slow_), and when it runs, the sql studio application hangs until the
connection times out.
2) I tried to add a key-column with a "DEFAULT SERIAL" to get a unique
ID in a visible column (the original data contains duplicate "Id"
values), but FASTLOAD complained about null value.
3) I started and aborted the import a few times. It seems like DROP
TABLE does not release the used space in the volumes.
My main concern at the moment is however I can tune the server for
better performance. I don't plan to use a desktop PC for the real
database, but I'm not comfortable with just 1600 rows/sec on a P4
machine.
Jarle
--
Jarle Aase email: [EMAIL PROTECTED]
Author of freeware. http://www.jgaa.com
news:alt.comp.jgaa
War FTP Daemon: http://www.warftp.org
War FTP Daemon FAQ: http://www.warftp.org/faq/warfaq.htm
Jgaa's PGP key: http://war.jgaa.com/pgp
NB: If you reply to this message, please include all relevant
information from the conversation in your reply. Thanks.
<<< no need to argue - just kill'em all! >>>
--
MaxDB Discussion Mailing List
For list archives: http://lists.mysql.com/maxdb
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]