[jira] [Commented] (CASSANDRA-13904) Performance improvement of Cassandra UDF/UDA

xin jin (JIRA) Tue, 26 Sep 2017 13:32:43 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181515#comment-16181515
 ]


xin jin commented on CASSANDRA-13904:
-------------------------------------

Simple experiments:

{code}
// Test function: insert 9999 rows, then sum column b with a Java UDA.
createTable("CREATE TABLE %s (a int primary key, b int)");
for (int i = 1, m = 10000; i < m; i++) {
    String queryString = "INSERT INTO %s (a, b) " + String.format("VALUES (%d, %d)", i, i);
    execute(queryString);
}
// State function: adds b to the running state, treating a null state as 0.
String fState = createFunction(KEYSPACE,
                               "int, int",
                               "CREATE FUNCTION %s(a int, b int) " +
                               "CALLED ON NULL INPUT " +
                               "RETURNS int " +
                               "LANGUAGE java " +
                               "AS 'return Integer.valueOf((a!=null?a.intValue():0) + b.intValue());'");
String a = createAggregate(KEYSPACE,
                           "int, int",
                           "CREATE AGGREGATE %s(int) " +
                           "SFUNC " + shortFunctionName(fState) + " " +
                           "STYPE int");
// 1 + 2 + ... + 9999 = 49995000
assertRows(execute("SELECT " + a + "(b) FROM %s"), row(49995000));
{code}

Results:

1. enable_user_defined_functions_threads: false

TRACE: UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 37259μs, 17297μs, 26131μs

2. enable_user_defined_functions_threads: true

UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 555004μs, 457931μs, 475664μs
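
To put the two sets of timings side by side, here is a small sketch that averages them and divides by the 9999 state function calls. It assumes the three values on each trace line are totals for three separate runs of the same query; the class and method names are made up for illustration and are not part of Cassandra. Under that assumption the numbers work out to roughly 2.7μs per call for direct execution and roughly 49.6μs per call for async execution, a ratio of about 18x.

```java
import java.util.Arrays;

// Illustrative helper, not part of Cassandra: summarizes the timings quoted above.
class UdfTimingSummary {
    // State function calls per run, taken from the trace lines.
    static final int CALLS = 9999;

    // Mean of the per-run totals, in microseconds.
    static double meanMicros(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    // Mean cost of a single state function call, in microseconds.
    static double perCallMicros(long[] runs) {
        return meanMicros(runs) / CALLS;
    }

    public static void main(String[] args) {
        // Assumption: each value is the total time of one run of the query.
        long[] direct = {37259, 17297, 26131};    // enable_user_defined_functions_threads: false
        long[] async  = {555004, 457931, 475664}; // enable_user_defined_functions_threads: true

        System.out.printf("direct: %.2f us/call%n", perCallMicros(direct));
        System.out.printf("async:  %.2f us/call%n", perCallMicros(async));
        System.out.printf("ratio:  %.1fx%n", meanMicros(async) / meanMicros(direct));
    }
}
```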


> Performance improvement of Cassandra UDF/UDA
> --------------------------------------------
>
>                 Key: CASSANDRA-13904
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13904
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: xin jin
>            Priority: Critical
>              Labels: performance
>             Fix For: 3.11.x
>
>
> Hi All,
> We ran a few experiments and found that running a query with direct UDF 
> execution is about ten times faster than with async UDF execution. The 
> in-line comment "Using async UDF execution is expensive (adds about 100us 
> overhead per invocation on a Core-i7 MBPr)" at 
> https://insight.io/github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java?line=293
> shows that this is known behavior. My questions are as follows:
> 1. What are the main pros and cons of these two methods? Are there any 
> documents that discuss this?
> 2. Are there any plans to improve the performance of async UDF execution? A 
> simple approach that comes to mind is batching, e.g., replacing the current 
> row-by-row execution with execution over groups of rows. Are there any 
> concerns with this?
> 3. How do people solve this performance issue in general? It does not seem 
> to be considered urgent or important, since it is known and still present, 
> so people must have some good way of working around it.
> Thank you in advance for your comments.
> Best regards,
> Xin
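
The batching idea in question 2 can be sketched in plain Java. This is a hypothetical illustration only, not Cassandra's UDF executor: the class, the method names, and the use of a single-thread `ExecutorService` are all assumptions made for the example. It contrasts handing one task per row to another thread with handing one task per group of rows, which pays the hand-off cost once per batch rather than once per call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Cassandra code: one executor task per batch of rows
// instead of one task per row.
class BatchedStateFunction {
    // Stand-in for the state function used in the test: state + value.
    static int stateFunction(int state, int value) {
        return state + value;
    }

    // Waits for a task and rethrows any failure as an unchecked exception.
    static int await(Future<Integer> future) {
        try {
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    // Row by row: every call pays one submit/get round trip to the pool.
    static int perRow(ExecutorService pool, int[] rows) {
        int state = 0;
        for (int value : rows) {
            final int current = state;
            state = await(pool.submit(() -> stateFunction(current, value)));
        }
        return state;
    }

    // Batched: one submit/get round trip per batchSize rows.
    static int batched(ExecutorService pool, int[] rows, int batchSize) {
        int state = 0;
        for (int start = 0; start < rows.length; start += batchSize) {
            final int current = state;
            final int from = start;
            final int to = Math.min(start + batchSize, rows.length);
            state = await(pool.submit(() -> {
                int acc = current;
                for (int i = from; i < to; i++) {
                    acc = stateFunction(acc, rows[i]);
                }
                return acc;
            }));
        }
        return state;
    }

    public static void main(String[] args) {
        int[] rows = new int[9999];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i + 1; // same values as column b in the test
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(perRow(pool, rows));
            System.out.println(batched(pool, rows, 100)); // batch size chosen arbitrarily
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods compute the same aggregate; only the number of thread hand-offs differs. Whether something like this is acceptable inside Cassandra depends on what the async path is there to guarantee per invocation, which is exactly the concern the question raises.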



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

