Hi,
I have been experimenting with SOCI and have made some observations about
performance.
I have a test query which is a simple equi-join on two tables returning 2.8
million rows.
In SQLPlus the performance as measured by /usr/bin/time -p is:
real 178.07
user 52.02
sys 35.62
Using DTL (which is doing full object relational mapping and you might think
should be quite slow) the result is:
real 97.58
user 74.55
sys 5.12
I think this just shows the quality of Oracle code ;).
Though I was very pleasantly surprised by DTLs efficiency here.
I think this also gives the lie to claims that ODBC is slow.
Using JDBC in java (using BufferedWriter rather than System.out) the result is:
real 159.97
user 27.07
sys 13.07
In SOCI (using the oracle backend) I am following the canonical idiom given in
http://soci.sourceforge.net/doc/statements.html#bulk
(but for a select rather than an insert)
// Example 3.
const int BATCH_SIZE = 25;
std::vector<int> valsIn;
for (int i = 0; i != BATCH_SIZE; ++i)
{
ids.push_back(i);
}
statement st = (sql.prepare <<
"insert into numbers(value) values(:val)",
use(valsIn));
for (int i = 0; i != 4; ++i)
{
st.execute(true);
}
Note: I am actually retrieving three separate columns in my query and using 3
separate vectors.
I have the folllowing results:
BATCH_SIZE = 1
real 610.07
user 86.87
sys 59.78
BATCH_SIZE = 2
real 584.22
user 68.11
sys 56.78
BATCH_SIZE = 4
real 339.86
user 44.08
sys 29.85
BATCH_SIZE = 8
real 191.16
user 27.34
sys 15.16
BATCH_SIZE=50
real 56.25
user 12.47
sys 2.67
BATCH_SIZE=500
real 33.01
user 11.92
sys 1.96
BATCH_SIZE=1000
real 30.11
user 11.69
sys 1.95
BATCH_SIZE=5000
real 27.74
user 11.42
sys 1.83
So clearly the bulk idiom recommended in the documentation is the best one in
soci.
However its not the nicest to code compared to the more basic idiom:
int i;
statement st = (sql.prepare <<
"select value from numbers order by value",
into(i));
st.execute();
while (st.fetch())
{
cout << i << '\n';
}
Unfortunately that idiom gives performance comparible to BATCH_SIZE=1
real 586.36
user 68.76
sys 56.17
All these results are highly repeatable.
Is there anything that can be done to achieve the performance of the builk
idiom
with the
basic but prettier code
(i.e. not having to use a vector)?
I am thinking perhaps, that the default should be for queries to be buffered
with unbuffered queries
being available in a separate class for those that really need them (if anyone).
Regards,
Bruce.
------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Soci-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/soci-users