We saw similar problems of inconsistent behaviour with external loaders to
databases. Most of our experience was with Oracle, but our findings should
hold for other databases as well.

1. It is best to load into a freshly created table and then join it to the
other tables (or the primary table), preferably through views.
2. If there are any indexes or primary keys on the table being loaded, drop
them prior to the load and recreate them afterwards.
3. If any logging is running, either stop it or lower the logging level to
the minimum.
4. If possible, sort the data on the primary key field before the load.
5. If possible, split the text file into smaller files if you are loading
selectively. 'sed' worked great for me.
6. Parallelize the load. This sounds simple - define a tablespace over
multiple physical disks and load each tablespace with a separate process
working on a cleanly split data source - but it requires a lot of tuning.
We initially experienced numerous load failures because of I/O conflicts.
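The drop-indexes/sort/load pattern from steps 2 and 4 can be sketched with
any DB-API driver. This is only an illustration, not MaxDB- or
Oracle-specific code: sqlite3 stands in for the target database, and the
table and column names ('staging', 'id', 'payload') are invented for the
example.

```python
import sqlite3

# In-memory SQLite stands in for the target database; the same pattern
# applies to any DB-API driver. All names here are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 1: load into a freshly created table, with no indexes yet.
cur.execute("CREATE TABLE staging (id INTEGER, payload TEXT)")

rows = [(3, "c"), (1, "a"), (2, "b")]
rows.sort(key=lambda r: r[0])  # step 4: pre-sort on the primary key field

# Load everything in one transaction, then build the primary-key index
# only after the load completes (step 2: no index maintenance per row).
cur.executemany("INSERT INTO staging VALUES (?, ?)", rows)
cur.execute("CREATE UNIQUE INDEX staging_pk ON staging (id)")
conn.commit()

loaded = cur.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(loaded)
```

The payoff of building the index after the load grows with row count:
maintaining a B-tree per inserted row is far more expensive than one bulk
index build at the end.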

We were loading our DW with about 120 million records per run, and we
managed to cut our load times from 3 days to 4 hours!

Could you please let me know what you try and what the results are? We're
looking into using MaxDB ourselves.

Thanks.

Samir.


-----Original Message-----
From: Arsen Pereymer [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 20, 2004 19:16
To: [EMAIL PROTECTED]
Subject: MaxDB performance versus Microsoft SQL performance


Hello,


Currently, we have a large Microsoft SQL Server 2000 database containing over
100 tables with more than 100 million rows. The database houses a data
warehouse, and it currently takes about 20 hours to run the ETL that
populates it.

We have been doing research and found that MaxDB could greatly improve this
performance, so we started a benchmark test. We moved 5 tables to MaxDB,
each containing approximately 800k rows, and rewrote the SQL script below
for MaxDB.

INSERT INTO "DBA"."tbBenchmarkData"
SELECT
    D."DistrictID",
    0,
    0,
    BM."gradeID",
    BM."subjectID",
    SD."StudentID",
    T."BMSEQNUM",
    SD."ObjectiveID",
    1 AS ObjectiveTypeID,
    AVG(SD."GotRight") * 100,
    COUNT(SD."QW_ID")
FROM
    "DBA"."tbTests" T,
    "DBA"."tbStudentData" SD,
    "DBA"."tbUserOrg" UO,
    "DBA"."tbDistricts" D,
    "DBA"."tbBMTestinfo" BM
WHERE
    SD."TaskID" = T."TaskID"
    AND UO."UserID" = SD."StudentID"
    AND D."DistrictID" = UO."OrgID"
    AND BM."SEQNUM" = T."BMSEQNUM"
    AND T."BMSEQNUM" > 0
GROUP BY
    D."DistrictID",
    SD."StudentID",
    BM."subjectID",
    BM."gradeID",
    T."BMSEQNUM",
    SD."ObjectiveID";


When we run the above script on Microsoft SQL Server, it takes approximately
5 minutes to insert 1.9 million rows. Running the same script on MaxDB takes
over an hour to insert the same number of rows.
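One common culprit for this kind of gap is committing (or logging) per row
instead of per batch. A quick way to check is to time the load both ways.
This is only a sketch of the measurement technique, not MaxDB code: sqlite3
stands in for the database, and the table name 't' is invented.

```python
import sqlite3
import time

# In-memory SQLite stands in for the database under test; the table name
# and row counts are arbitrary, chosen only to illustrate the comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
data = [(i,) for i in range(50_000)]

# Slow path: one statement at a time in autocommit mode. On a disk-backed
# database each statement pays its own commit/log overhead.
conn.isolation_level = None  # autocommit mode
start = time.perf_counter()
for row in data:
    conn.execute("INSERT INTO t VALUES (?)", row)
per_row = time.perf_counter() - start

# Fast path: one batched insert inside a single explicit transaction.
conn.execute("DELETE FROM t")
conn.isolation_level = ""  # back to implicit transactions
start = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?)", data)
conn.commit()
batched = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(f"rows={count}  per-row: {per_row:.3f}s  batched: {batched:.3f}s")
```

If the batched run is dramatically faster, the next place to look is the
database's logging and commit settings during the INSERT ... SELECT.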

Why is the MaxDB performance so slow?

Thank you very much in advance for any help!





-- 
MaxDB Discussion Mailing List
For list archives: http://lists.mysql.com/maxdb
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]
