[ 
https://issues.apache.org/jira/browse/CASSANDRA-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043163#comment-14043163
 ] 

Sylvain Lebresne commented on CASSANDRA-7444:
---------------------------------------------

Not sure if that's the same as what Brandon mentioned, but currently 
{{DefsTable.mergeSchemaInternal}} will read the existing schema in its 
entirety twice (pre and post update), no matter what the actual update is. A 
pretty trivial improvement would consist of checking which keyspaces are actually 
updated (by gathering the keys of the mutations passed as parameters) and only reading 
the schema for those keyspaces.

This will obviously only help in the case of multiple keyspaces and won't 
magically make creating crap tons of tables a good idea, but I'm mentioning it as 
it's a very simple change we could start with. Note that we could get 
finer-grained than the keyspace to figure out what needs to be read, but that's 
slightly more involved. Doing it per keyspace is really trivial since all 
schema tables use the keyspace name as partition key.
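
To make the idea concrete, here is a minimal sketch of the keyspace-scoped read. It is not the actual {{DefsTable}} code: the {{Mutation}} class, the in-memory map standing in for the schema tables, and the method names {{affectedKeyspaces}}, {{readSchema}} and {{merge}} are all hypothetical and only illustrate gathering the mutation keys first and then reading the pre and post snapshots for those keyspaces alone.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SchemaMergeSketch {
    /** Hypothetical stand-in for a schema mutation; its partition key is the keyspace name. */
    static final class Mutation {
        final String keyspace;
        final String table;
        Mutation(String keyspace, String table) { this.keyspace = keyspace; this.table = table; }
    }

    /** Gather the partition keys (keyspace names) touched by the incoming mutations. */
    static Set<String> affectedKeyspaces(Collection<Mutation> mutations) {
        Set<String> keyspaces = new TreeSet<>();
        for (Mutation m : mutations) keyspaces.add(m.keyspace);
        return keyspaces;
    }

    /** Copy only the requested keyspaces out of the (hypothetical) in-memory schema store. */
    static Map<String, Set<String>> readSchema(Map<String, Set<String>> store, Set<String> keyspaces) {
        Map<String, Set<String>> snapshot = new HashMap<>();
        for (String ks : keyspaces) {
            Set<String> tables = store.get(ks);
            snapshot.put(ks, tables == null ? new TreeSet<String>() : new TreeSet<>(tables));
        }
        return snapshot;
    }

    /** Apply the mutations; return the pre and post snapshots, limited to affected keyspaces. */
    static List<Map<String, Set<String>>> merge(Map<String, Set<String>> store,
                                                Collection<Mutation> mutations) {
        Set<String> keyspaces = affectedKeyspaces(mutations);
        Map<String, Set<String>> before = readSchema(store, keyspaces);  // pre-update read
        for (Mutation m : mutations) {
            store.computeIfAbsent(m.keyspace, k -> new TreeSet<>()).add(m.table);
        }
        Map<String, Set<String>> after = readSchema(store, keyspaces);   // post-update read
        return Arrays.asList(before, after);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> store = new HashMap<>();
        store.put("ks1", new TreeSet<>(Arrays.asList("t1")));
        store.put("ks2", new TreeSet<>(Arrays.asList("t1", "t2")));
        List<Map<String, Set<String>>> snapshots =
            merge(store, Arrays.asList(new Mutation("ks1", "t2")));
        System.out.println("before: " + snapshots.get(0));
        System.out.println("after:  " + snapshots.get(1));
    }
}
```

In this sketch a mutation on ks1 never causes ks2 to be read, so the cost of each merge depends on the keyspaces being changed rather than on the total size of the schema.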

> Performance drops when creating large amount of tables 
> -------------------------------------------------------
>
>                 Key: CASSANDRA-7444
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7444
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: [cqlsh 3.1.8 | Cassandra 1.2.15.1 | CQL spec 3.0.0 | 
> Thrift protocol 19.36.2][cqlsh 4.1.1 | Cassandra 2.0.7.31 | CQL spec 3.1.1 | 
> Thrift protocol 19.39.0]
>            Reporter: Jose Martinez Poblete
>            Priority: Minor
>              Labels: cassandra
>
> We are creating 4000 tables from a script, using cqlsh to create the 
> tables. As the tables are created, the time taken grows exponentially 
> and creation becomes very slow.
> We read a file, get the keyspace name, append a random number, and then create 
> a keyspace with this new name (for example Airplane_12345678, Airplane_123575849...), 
> which is then fed into cqlsh via a script.
> Similarly, each table is created via a script: use Airplane_12345678; create 
> table1...table25, then use Airplane_123575849; create table1...create table25.
> Everything is done serially, one statement after the other in a loop.
> We tested using the following bash script:
> {noformat}
> #!/bin/bash
> SEED=0
> ITERATIONS=20
> while [ ${SEED} -lt ${ITERATIONS} ]; do
>    COUNT=0
>    KEYSPACE=t10789_${SEED}
>    echo "CREATE KEYSPACE ${KEYSPACE} WITH replication = { 'class': 
> 'NetworkTopologyStrategy', 'Cassandra': '1' };"  > ${KEYSPACE}.ddl
>    echo "USE ${KEYSPACE};" >> ${KEYSPACE}.ddl
>    while [ ${COUNT} -lt 25 ]; do
>       echo "CREATE TABLE user_colors${COUNT} (user_id int PRIMARY KEY, colors 
> list<ascii> );" >> ${KEYSPACE}.ddl
>       ((COUNT++))
>    done 
>    ((SEED++))
>    time cat ${KEYSPACE}.ddl | cqlsh
>    if [ "$?" -gt 0 ]; then
>       echo "[ERROR] Failure at ${KEYSPACE}"
>       exit 1
>    else
>       echo "[OK]    Created ${KEYSPACE}"
>    fi
>    echo "==============================="
>    sleep 3
> done
> #EOF
> {noformat}
> The timings we got on an otherwise idle system were inconsistent:
> {noformat}
> real    0m42.649s
> user    0m0.332s
> sys     0m0.092s
> [OK]    Created t10789_0
> ===============================
> real    1m22.211s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_1
> ===============================
> real    2m45.907s
> user    0m0.304s
> sys     0m0.124s
> [OK]    Created t10789_2
> ===============================
> real    3m24.098s
> user    0m0.340s
> sys     0m0.108s
> [OK]    Created t10789_3
> ===============================
> real    2m38.930s
> user    0m0.324s
> sys     0m0.116s
> [OK]    Created t10789_4
> ===============================
> real    3m4.186s
> user    0m0.336s
> sys     0m0.104s
> [OK]    Created t10789_5
> ===============================
> real    2m55.391s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_6
> ===============================
> real    2m14.290s
> user    0m0.328s
> sys     0m0.108s
> [OK]    Created t10789_7
> ===============================
> real    2m44.880s
> user    0m0.344s
> sys     0m0.092s
> [OK]    Created t10789_8
> ===============================
> real    1m52.785s
> user    0m0.336s
> sys     0m0.128s
> [OK]    Created t10789_9
> ===============================
> real    1m18.404s
> user    0m0.344s
> sys     0m0.108s
> [OK]    Created t10789_10
> ===============================
> real    2m20.681s
> user    0m0.348s
> sys     0m0.104s
> [OK]    Created t10789_11
> ===============================
> real    1m11.860s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_12
> ===============================
> real    1m37.887s
> user    0m0.324s
> sys     0m0.100s
> [OK]    Created t10789_13
> ===============================
> real    1m31.616s
> user    0m0.316s
> sys     0m0.132s
> [OK]    Created t10789_14
> ===============================
> real    1m12.103s
> user    0m0.360s
> sys     0m0.088s
> [OK]    Created t10789_15
> ===============================
> real    0m36.378s
> user    0m0.340s
> sys     0m0.092s
> [OK]    Created t10789_16
> ===============================
> real    0m40.883s
> user    0m0.352s
> sys     0m0.096s
> [OK]    Created t10789_17
> ===============================
> real    0m40.661s
> user    0m0.332s
> sys     0m0.096s
> [OK]    Created t10789_18
> ===============================
> real    0m44.943s
> user    0m0.324s
> sys     0m0.104s
> [OK]    Created t10789_19
> ===============================
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
