subject:"\[sqlalchemy\] insert performance vs executemany and max_allowed

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-30 Thread mike bayer




On 05/29/2017 11:34 PM, Михаил Доронин wrote:

Umm, what I've meant is how to use postgresql

on_conflict_do_update in such a way that sqlalchemy would use executemany 
behind the scenes.

In examples it usage looks like this.

stmt = insert(table, values)
stmt = stmt.on_conflict_do_update(set_=dict(a=stmt.excluded.a))

excluded is generated from values, right? If I don't pass values to the 
statement, how can I use excluded?



you would not use insert(table, values), which as we reviewed earlier is 
not "executemany" syntax.   the values are passed to execute() as the 
second argument, and is the list of values which are invoked for the 
statement one at a time.   .excluded is a server side collection 
generated by Postgresql and is based on the current row being operated 
upon.   These values are not returned to the client.  They are only used 
in context of the statement.


think of executemany like this:


stmt = 
table.insert().on_conflict_do_update(set_=dict(a=stmt.excluded.a)).values(x 
= bindparam('x'), y=bindparam('y'))


def executemany(stmt, values):
for value in values:
conn.execute(stmt, value)


executemany(stmt, [{"x": 1, "y": 2}, {"x": 3, "y": 4}, ...])


e.g. if your statement works for one execute() and one set of 
parameters, it will work for any number of individual sets of parameters.



--
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-29 Thread Михаил Доронин

Umm, what I've meant is how to use postgresql

on_conflict_do_update in such a way that sqlalchemy would use executemany 
behind the scenes.

In examples it usage looks like this.

stmt = insert(table, values)
stmt = stmt.on_conflict_do_update(set_=dict(a=stmt.excluded.a))

excluded is generated from values, right? If I don't pass values to the 
statement, how can I use excluded?

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-29 Thread mike bayer

On 05/29/2017 08:54 AM, Михаил Доронин wrote:
I have a questions. If the only way to issue executemany is to do a 
conn.execute(table.insert(), list_of_dict). How users supposed to use 
postgres on_conflict and mysql on duplicate key update with this? 

Seems
like there would be no way to access .updated in postgres and 
corresponding .values in mysql, because there are no values have been 
passed at this point.

I'm assuming you're asking how to pass values to things like the WHERE 
clause.   You use bindparam().  See the examples for UPDATE in the Core 
Tutorial at http://docs.sqlalchemy.org/en/latest/core/tutorial.html.

On Thursday, 25 May 2017 22:43:55 UTC+3, Mike Bayer wrote:

On 05/25/2017 01:44 PM, Михаил Доронин wrote:
 >> SQLAlchemy batches inserts in the ORM as is possible and with
Core you
 >> do this explicitly, both make use of cursor.executemany() which
is then
 >> determined by how the DBAPI handles it.
 > 

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and 
Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
description.

---
You received this message because you are subscribed to the Google 
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to sqlalchemy+unsubscr...@googlegroups.com 
.
To post to this group, send email to sqlalchemy@googlegroups.com 
.

Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-29 Thread Михаил Доронин

I have a questions. If the only way to issue executemany is to do a 
conn.execute(table.insert(), list_of_dict). How users supposed to use 
postgres on_conflict and mysql on duplicate key update with this? Seems 
like there would be no way to access .updated in postgres and corresponding 
.values in mysql, because there are no values have been passed at this 
point.

On Thursday, 25 May 2017 22:43:55 UTC+3, Mike Bayer wrote:
>
>
>
> On 05/25/2017 01:44 PM, Михаил Доронин wrote: 
> >> SQLAlchemy batches inserts in the ORM as is possible and with Core you 
> >> do this explicitly, both make use of cursor.executemany() which is then 
> >> determined by how the DBAPI handles it. 
> > 

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-29 Thread Михаил Доронин

I have a questions. If the only way to issue executemany is to do a 
conn.execute(table.insert(), list_of_dict). How users supposed to use 
postgres on_conflict and mysql on duplicate key update with this? Seems 
like there would be no way to provide access .updated in postgres and 
corresponding .values in mysql, because there is no values have been passed 
at this point.

On Thursday, 25 May 2017 22:43:55 UTC+3, Mike Bayer wrote:
>
>
>
> On 05/25/2017 01:44 PM, Михаил Доронин wrote: 
> >> SQLAlchemy batches inserts in the ORM as is possible and with Core you 
> >> do this explicitly, both make use of cursor.executemany() which is then 
> >> determined by how the DBAPI handles it. 
> > 
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread Михаил Доронин




Wow, I've never saw this thing about multiple statements. Thanks! RTFM 

On Thursday, 25 May 2017 22:43:55 UTC+3, Mike Bayer wrote:
>
>
>
> On 05/25/2017 01:44 PM, Михаил Доронин wrote:
> >> SQLAlchemy batches inserts in the ORM as is possible and with Core you 
> >> do this explicitly, both make use of cursor.executemany() which is then 
> >> determined by how the DBAPI handles it. 
> > 
> > 
> > Ummm. If that is true, why this line is used when I pass a list of 
> > values (its from visit_insert)? Clearly sqlalchemy creates big string 
> > `VALUES (%s, %s), (%s, %s)` by itself without executemany. 
> > 
> > 
> > 
> >  elif insert_stmt._has_multi_parameters: 
> >  text += " VALUES %s" % ( 
> >  ", ".join( 
> >  "(%s)" % ( 
> >  ', '.join(c[1] for c in crud_param_set) 
> >  ) 
> >  for crud_param_set in crud_params 
> >  ) 
> >  ) 
> > 
> > 
> > I've written benchmark again. 
> > 
> > https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7 
> > 
> > TLDR; alchemy is more than three times slower than executemany for 
> > inserting 1 values. 
>
> Hello - 
>
> I gave you links to detailed documentation that represents the many 
> years I have spent and developing documenting the performance 
> characteristics of SQLAlchemy. 
>
> The test you have posted here, along with your conclusions, does not 
> take this guidance into account nor does it indicate awareness of basic 
> API use as documented in the Core tutorial at: 
>
>
> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
>  
>
> Here are the links again, which refer to the performance characteristics 
> in detail as well as illustrate how to make use of executemany(), along 
> with benchmarks comparing different API options: 
>
>
> http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
>  
>
>
> http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html.
>  
>
>
> SQLAlchemy's use of executemany() is a top-level, well documented 
> feature.  It is not an obscure, hidden side effect.   Years of effort 
> have gone into SQLAlchemy's documentation to ensure that behaviors like 
> these will not be mysterious. 
>
> Let's apply it to your script at: 
>
>
> https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7#gistcomment-2106299
>  
>
> for 10K results: 
>
> --
>  
>
> benchmark: 3 tests 
> --
>  
>
> Name (time in ms)   Min Max 
>Mean StdDev  MedianIQR 
>  Outliers(*)  Rounds  Iterations 
> 
>  
>
> test_insert[raw]   141.2799 (1.0)  326.9598 (1.0) 
>182.5684 (1.0)  80.9658 (1.55) 144.1729 (1.0)  57.0499 
> (1.16) 1;1   5   1 
> test_insert[alcehmy_correct]   179.7159 (1.27) 340.2121 (1.04) 
>224.5844 (1.23) 68.2835 (1.30) 188.8490 (1.31) 79.4864 
> (1.61) 1;0   5   1 
> test_insert[alchemy_incorrect] 495.6210 (3.51) 636.7581 (1.95) 
>550.5236 (3.02) 52.3564 (1.0)  537.3681 (3.73) 49.3491 
> (1.0)  2;0   5   1 
> 
>  
>
>
>
> for 100K rows (no OperationalError is raised): 
>
> - 
>
> benchmark: 3 tests 
> - 
>
> Name (time in s)  Min   Max 
> MeanStdDevMedian   IQR 
> Outliers(*)  Rounds  Iterations 
> --
>  
>
> test_insert[raw]   1.3214 (1.0)  1.4007 (1.0) 
> 1.3660 (1.0)  0.0307 (1.94) 1.3709 (1.0)  0.0448 (1.95) 
> 2;0   5   1 
> test_insert[alcehmy_correct]   1.7432 (1.32) 1.7848 (1.27) 
> 1.7645 (1.29) 0.0159 (1.0)  1.7655 (1.29) 0.0229 (1.0) 
> 2;0   5   1 
> test_insert[alchemy_incorrect] 5.7412 (4.34) 5.9293 (4.23) 
> 5.8330 (4.27) 0.0687

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread mike bayer




On 05/25/2017 01:44 PM, Михаил Доронин wrote:
SQLAlchemy batches inserts in the ORM as is possible and with Core you 
do this explicitly, both make use of cursor.executemany() which is then 
determined by how the DBAPI handles it.



Ummm. If that is true, why this line is used when I pass a list of 
values (its from visit_insert)? Clearly sqlalchemy creates big string 
`VALUES (%s, %s), (%s, %s)` by itself without executemany.




 elif insert_stmt._has_multi_parameters:
 text += " VALUES %s" % (
 ", ".join(
 "(%s)" % (
 ', '.join(c[1] for c in crud_param_set)
 )
 for crud_param_set in crud_params
 )
 )


I've written benchmark again.

https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7

TLDR; alchemy is more than three times slower than executemany for 
inserting 1 values.


Hello -

I gave you links to detailed documentation that represents the many 
years I have spent and developing documenting the performance 
characteristics of SQLAlchemy.


The test you have posted here, along with your conclusions, does not 
take this guidance into account nor does it indicate awareness of basic 
API use as documented in the Core tutorial at:


http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements

Here are the links again, which refer to the performance characteristics 
in detail as well as illustrate how to make use of executemany(), along 
with benchmarks comparing different API options:


http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html.

SQLAlchemy's use of executemany() is a top-level, well documented 
feature.  It is not an obscure, hidden side effect.   Years of effort 
have gone into SQLAlchemy's documentation to ensure that behaviors like 
these will not be mysterious.


Let's apply it to your script at:

https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7#gistcomment-2106299

for 10K results:

-- 
benchmark: 3 tests 
--
Name (time in ms)   Min Max 
  Mean StdDev  MedianIQR 
Outliers(*)  Rounds  Iterations


test_insert[raw]   141.2799 (1.0)  326.9598 (1.0) 
  182.5684 (1.0)  80.9658 (1.55) 144.1729 (1.0)  57.0499 
(1.16) 1;1   5   1
test_insert[alcehmy_correct]   179.7159 (1.27) 340.2121 (1.04) 
  224.5844 (1.23) 68.2835 (1.30) 188.8490 (1.31) 79.4864 
(1.61) 1;0   5   1
test_insert[alchemy_incorrect] 495.6210 (3.51) 636.7581 (1.95) 
  550.5236 (3.02) 52.3564 (1.0)  537.3681 (3.73) 49.3491 
(1.0)  2;0   5   1




for 100K rows (no OperationalError is raised):

- 
benchmark: 3 tests 
-
Name (time in s)  Min   Max 
MeanStdDevMedian   IQR 
Outliers(*)  Rounds  Iterations

--
test_insert[raw]   1.3214 (1.0)  1.4007 (1.0) 
1.3660 (1.0)  0.0307 (1.94) 1.3709 (1.0)  0.0448 (1.95) 
   2;0   5   1
test_insert[alcehmy_correct]   1.7432 (1.32) 1.7848 (1.27) 
1.7645 (1.29) 0.0159 (1.0)  1.7655 (1.29) 0.0229 (1.0) 
   2;0   5   1
test_insert[alchemy_incorrect] 5.7412 (4.34) 5.9293 (4.23) 
5.8330 (4.27) 0.0687 (4.33) 5.8319 (4.25) 0.0831 (3.63) 
   2;0   5   1

--

SQLAlchemy Core is therefore about 1.2 times slower than pure C 
mysqlclient when used as documented.  Which is actually a little less 
than the difference against SQLite's driver vs. Core, that is described 
at

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread Михаил Доронин


For Christ sake how to post a new message here and not just reply? And how 
to edit the old post? Arg!

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread Михаил Доронин



> SQLAlchemy batches inserts in the ORM as is possible and with Core you 
> do this explicitly, both make use of cursor.executemany() which is then 
> determined by how the DBAPI handles it.


Ummm. If that is true, why this line is used when I pass a list of values (its 
from visit_insert)? Clearly sqlalchemy creates big string `VALUES (%s, %s), 
(%s, %s)` by itself without executemany.



elif insert_stmt._has_multi_parameters:text += " VALUES %s" 
% (", ".join("(%s)" % ( 
   ', '.join(c[1] for c in crud_param_set)) 
   for crud_param_set in crud_params))


I've written benchmark again.

https://gist.github.com/purpleP/35c4920020ad95585419eb1d250200b7

TLDR; alchemy is more than three times slower than executemany for inserting 
1 values.

for a million values sqlalchemy drops with OperationalError.


Here's a vmprof visualization for 1 values 
http://vmprof.com/#/c7f9a4fd-55f9-469d-b6e4-b7cac04473fa .

vmprof is much better than cProfile give it a try if you still coding in python.


If you will need more data then just a visualization - I'll try to provide it. 
But you have the benchmark code in the gist.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread mike bayer




On 05/25/2017 12:02 PM, Михаил Доронин wrote:
Sorry I can't post a benchmark here, because I'm not in the office. 
Maybe I will do that tomorrow. I use vmprof and can just share a link to 
the uploaded profile. Will that suite you?


I would much prefer if you can send a traditional cProfile file as I 
don't have the time to learn how to use vmprof.





But I can say right now that I've passed only integers, not strings. 
Also mysqlclient encoding values if needed as far as I can see.


https://github.com/PyMySQL/mysqlclient-python/blob/master/MySQLdb/cursors.py 
see _do_execute_many at line 286.


And the fact that sqlalchemy doesn't make batches automatically is much 
worse than performance problems.


SQLAlchemy batches inserts in the ORM as is possible and with Core you 
do this explicitly, both make use of cursor.executemany() which is then 
determined by how the DBAPI handles it.


 Heuristics will not work one day, which

means any inserts to the database should be surrounded with try catch etc.
sqlalchemy makes sql by ', '.join on values, which makes it impossible 
for it to see that the string size is to big until it's too late. 


I don't know what "the string size is too big" refers to, but also this 
sounds like a topic other than performance.   if the ",".join() refers 
to how an INSERT is composed, that has nothing to do with the data 
you're passing in the insert, it is just building bound parameter 
placeholders.



I
think it's better to use approach that mysqlclient is using, where they 
encode prefix and postfix query part into bytearray (which is mutable) 
and append arguments to it checking that the bytearray size is less then 
the limit.


Sorry, I'm not familiar with what this is referring towards.





--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and 
Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
description.

---
You received this message because you are subscribed to the Google 
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to sqlalchemy+unsubscr...@googlegroups.com 
.
To post to this group, send email to sqlalchemy@googlegroups.com 
.

Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread Михаил Доронин

Sorry I can't post a benchmark here, because I'm not in the office. Maybe I 
will do that tomorrow. I use vmprof and can just share a link to the 
uploaded profile. Will that suite you?

But I can say right now that I've passed only integers, not strings. Also 
mysqlclient encoding values if needed as far as I can see.

https://github.com/PyMySQL/mysqlclient-python/blob/master/MySQLdb/cursors.py 
see _do_execute_many at line 286.

And the fact that sqlalchemy doesn't make batches automatically is much 
worse than performance problems. Heuristics will not work one day, which 
means any inserts to the database should be surrounded with try catch etc.
sqlalchemy makes sql by ', '.join on values, which makes it impossible for 
it to see that the string size is to big until it's too late. I think it's 
better to use approach that mysqlclient is using, where they encode prefix 
and postfix query part into bytearray (which is mutable) and append 
arguments to it checking that the bytearray size is less then the limit.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread mike bayer

the topic of performance is one that many years of effort have gone 
into.  For background on this topic particularly inserts, see:


http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

for a detailed test suite that will illustrate the many varieties of 
INSERT, see 
http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html.


as far as bound parameter slowness, this is a known condition which can 
arise if you are for example passing a lot of unicode strings in and the 
database driver and/or SQLAlchemy is spending lots of time encoding them 
into a string encoding like utf-8 or similar.There's not really any 
other bound parameter process that is known to take much time, other 
than strings, pretty much all bound parameters are straight 
pass-throughs to the DBAPI.   The difference is often that the straight 
DBAPI application doesn't set up unicode encoding whereas the SQLAlchemy 
version does (and can of course be changed).



If you can share a Python profile as documented at 
http://docs.sqlalchemy.org/en/latest/faq/performance.html#code-profiling 
I can show you how to remove the bound parameter overhead.




On 05/25/2017 05:49 AM, Михаил Доронин wrote:

I've tried to benchmark alchemy performance when inserting a lot of data.
The results wasn't that good for sqlalchemy. The difference was up to 
three times in median values.


First of all the more elements inserted the more the difference between 
sqlalchemy and executemany (mysqlclient).
I've profiled the code - most of the time spent in visit_bind_param and 
BindParam initializer. I've skimmed over the code and no places for 
optimization are obvious, however it seems like the logic is too much 
compilcated. There's a lot of conditions etc. Maybe this can be 
simplified in some way or maybe there could be a parameter in the insert 
that user can use to say that he don't want any complex logic, he just 
inserting some data and he takes the responsibility that the data is 
correct.


Next thing is that in executemany they keep an eye on the size of the 
string to be executed and if it's more than max_allowed_packet limit 
they split it into batches (they hardcoded this limit though instead of 
taking it from database at runtime).
Not only sqlalchemy isn't doing that - it doesn't provide a way to know 
what the size of a string would be. And the only thing the user can do 
is too catch exception and use heuristics to split the data.


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and 
Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
description.

---
You received this message because you are subscribed to the Google 
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to sqlalchemy+unsubscr...@googlegroups.com 
.
To post to this group, send email to sqlalchemy@googlegroups.com 
.

Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] insert performance vs executemany and max_allowed_packet limit

2017-05-25 Thread Михаил Доронин

I've tried to benchmark alchemy performance when inserting a lot of data.
The results wasn't that good for sqlalchemy. The difference was up to three 
times in median values.

First of all the more elements inserted the more the difference between 
sqlalchemy and executemany (mysqlclient).
I've profiled the code - most of the time spent in visit_bind_param and 
BindParam initializer. I've skimmed over the code and no places for 
optimization are obvious, however it seems like the logic is too much 
compilcated. There's a lot of conditions etc. Maybe this can be simplified 
in some way or maybe there could be a parameter in the insert that user can 
use to say that he don't want any complex logic, he just inserting some 
data and he takes the responsibility that the data is correct.

Next thing is that in executemany they keep an eye on the size of the 
string to be executed and if it's more than max_allowed_packet limit they 
split it into batches (they hardcoded this limit though instead of taking 
it from database at runtime).
Not only sqlalchemy isn't doing that - it doesn't provide a way to know 
what the size of a string would be. And the only thing the user can do is 
too catch exception and use heuristics to split the data.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

Re: [sqlalchemy] insert performance vs executemany and max_allowed_packet limit

[sqlalchemy] insert performance vs executemany and max_allowed_packet limit

13 matches

Site Navigation

Mail list logo

Footer information