Hi,
    I have been experimenting with SOCI and have made some observations about 
performance.
I have a test query which is a simple equi-join on two tables returning 2.8 
million rows.

In SQLPlus the performance as measured by /usr/bin/time -p is:
real 178.07
user 52.02
sys 35.62

Using DTL (which is doing full object relational mapping and you might think 
should be quite slow) the result is:
real 97.58
user 74.55
sys 5.12

I think this just shows the quality of Oracle code ;).
Though I was very pleasantly surprised by DTLs efficiency here.
I think this also gives the lie to claims that ODBC is slow.

Using JDBC in java (using BufferedWriter rather than System.out) the result is:
real 159.97
user 27.07
sys 13.07

In SOCI (using the oracle backend) I am following the canonical idiom given in 
http://soci.sourceforge.net/doc/statements.html#bulk
(but for a select rather than an insert)

// Example 3.
const int BATCH_SIZE = 25;
std::vector<int> valsIn;
for (int i = 0; i != BATCH_SIZE; ++i)
{
    ids.push_back(i);
}

statement st = (sql.prepare <<
                "insert into numbers(value) values(:val)",
                use(valsIn));
for (int i = 0; i != 4; ++i)
{
    st.execute(true);
}
Note: I am actually retrieving three separate columns in my query and using 3 
separate vectors.
I have the folllowing results:

BATCH_SIZE = 1
real 610.07
user 86.87
sys 59.78

BATCH_SIZE = 2
real 584.22
user 68.11
sys 56.78

BATCH_SIZE = 4
real 339.86
user 44.08
sys 29.85

BATCH_SIZE = 8
real 191.16
user 27.34
sys 15.16

BATCH_SIZE=50
real 56.25
user 12.47
sys 2.67

BATCH_SIZE=500
real 33.01
user 11.92
sys 1.96

BATCH_SIZE=1000
real 30.11
user 11.69
sys 1.95

BATCH_SIZE=5000
real 27.74
user 11.42
sys 1.83

So clearly the bulk idiom recommended in the documentation is the best one in 
soci.
However its not the nicest to code compared to the more basic idiom:

int i;
statement st = (sql.prepare <<
                "select value from numbers order by value",
                into(i));
st.execute();
while (st.fetch())
{
    cout << i << '\n';
}
Unfortunately that idiom gives performance comparible to BATCH_SIZE=1
real 586.36
user 68.76
sys 56.17

All these results are highly repeatable.
Is there anything that can be done to achieve the performance of the builk 
idiom 
with the 

basic but prettier code
(i.e. not having to use a vector)?
I am thinking perhaps, that the default should be for queries to be buffered 
with unbuffered queries 

being available in a separate class for those that really need them (if anyone).

Regards,

Bruce.

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
_______________________________________________
Soci-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/soci-users

Reply via email to