[
https://issues.apache.org/jira/browse/CASSANDRA-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181515#comment-16181515
]
xin jin commented on CASSANDRA-13904:
-------------------------------------
Simple experiments:
{code}
// Test function:
createTable("CREATE TABLE %s (a int primary key, b int)");
// Insert 9999 rows with a = b = i, for i in 1..9999.
for (int i = 1, m = 10000; i < m; i++) {
    String queryString = "INSERT INTO %s (a, b) " +
                         String.format("VALUES (%d, %d)", i, i);
    execute(queryString);
}
String fState = createFunction(KEYSPACE,
"int, int",
"CREATE FUNCTION %s(a int, b int) " +
"CALLED ON NULL INPUT " +
"RETURNS int " +
"LANGUAGE java " +
"AS 'return Integer.valueOf((a!=null?a.intValue():0) + b.intValue());'");
String a = createAggregate(KEYSPACE,
"int",
"CREATE AGGREGATE %s(int) " +
"SFUNC " + shortFunctionName(fState) + " " +
"STYPE int");
// 1 + 2 + ... + 9999 = 49995000
assertRows(execute("SELECT " + a + "(b) FROM %s"), row(49995000));
{code}
Results:
1. enable_user_defined_functions_threads: false
TRACE: UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 37259μs, 17297μs, 26131μs
2. enable_user_defined_functions_threads: true
UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 555004μs, 457931μs, 475664μs
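To put the two sets of timings side by side, the arithmetic can be sketched as below. This is only an illustration: it assumes the three figures in each trace line are three runs of the same 9999-call aggregate, and the class and method names (UdfOverhead, mean, perCallOverheadMicros) are hypothetical, not part of Cassandra.

```java
public class UdfOverhead {
    // Arithmetic mean of a set of timings, in microseconds.
    static double mean(long[] micros) {
        long sum = 0;
        for (long v : micros) {
            sum += v;
        }
        return (double) sum / micros.length;
    }

    // Extra time per state-function call when async execution is enabled,
    // assuming the whole difference is spread evenly over the calls.
    static double perCallOverheadMicros(long[] direct, long[] async, int calls) {
        return (mean(async) - mean(direct)) / calls;
    }

    public static void main(String[] args) {
        // Timings copied from the trace output above (microseconds).
        long[] direct = {37259, 17297, 26131};    // enable_user_defined_functions_threads: false
        long[] async = {555004, 457931, 475664};  // enable_user_defined_functions_threads: true
        int calls = 9999;

        System.out.printf("mean direct: %.0f us%n", mean(direct));
        System.out.printf("mean async:  %.0f us%n", mean(async));
        System.out.printf("slowdown:    %.1fx%n", mean(async) / mean(direct));
        System.out.printf("overhead:    %.1f us per call%n",
                          perCallOverheadMicros(direct, async, calls));
    }
}
```

Under that assumption the async runs are roughly 18 times slower on average, which works out to about 47μs of extra time per state-function call.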
> Performance improvement of Cassandra UDF/UDA
> --------------------------------------------
>
> Key: CASSANDRA-13904
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13904
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: xin jin
> Priority: Critical
> Labels: performance
> Fix For: 3.11.x
>
>
> Hi All,
> We ran a few experiments and found that queries using direct UDF execution
> are about ten times faster than those using async UDF execution. The in-line
> comment "Using async UDF execution is expensive (adds about 100us overhead
> per invocation on a Core-i7 MBPr)" at
> https://insight.io/github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java?line=293
> shows that this is known behavior. My questions are:
> 1. What are the main pros and cons of these two methods? Are there any
> documents that discuss this?
> 2. Are there any plans to improve the performance of async UDF execution? One
> simple approach that comes to mind is batching, e.g. replacing the current
> row-by-row invocation with one that handles several rows at a time. Are there
> any concerns with this?
> 3. How do people generally work around this performance issue? Since it is
> known and still present, it does not seem to be treated as urgent or
> important, so I assume people have some good way of dealing with it.
> Thank you in advance for your comments.
> Best regards,
> Xin
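The batching idea in question 2 of the issue description can be illustrated outside Cassandra with a plain executor. This is a minimal sketch, not Cassandra's implementation: the class BatchedAsyncSketch, its methods and the batch size of 100 are all hypothetical. It only contrasts one thread hand-off per row with one hand-off per group of rows, using the same sum state function as the test above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntBinaryOperator;

public class BatchedAsyncSketch {
    // Row by row: one task submission (and one blocking wait) per input value.
    static int perRow(ExecutorService pool, IntBinaryOperator stateFn, int[] values)
            throws Exception {
        int state = 0;
        for (int v : values) {
            final int current = state;
            Callable<Integer> task = () -> stateFn.applyAsInt(current, v);
            state = pool.submit(task).get();
        }
        return state;
    }

    // Batched: one task submission per group of batchSize values.
    static int batched(ExecutorService pool, IntBinaryOperator stateFn, int[] values,
                       int batchSize) throws Exception {
        int state = 0;
        for (int start = 0; start < values.length; start += batchSize) {
            final int from = start;
            final int to = Math.min(start + batchSize, values.length);
            final int initial = state;
            Callable<Integer> task = () -> {
                int s = initial;
                for (int i = from; i < to; i++) {
                    s = stateFn.applyAsInt(s, values[i]);
                }
                return s;
            };
            state = pool.submit(task).get();
        }
        return state;
    }

    public static void main(String[] args) throws Exception {
        int[] values = new int[9999];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1; // 1..9999, as in the test table
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(perRow(pool, Integer::sum, values));
            System.out.println(batched(pool, Integer::sum, values, 100)); // hypothetical batch size
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods compute the same aggregate; the batched variant just crosses the thread boundary about 100 times instead of 9999. Whether this is acceptable in Cassandra would depend on what the per-call hand-off is used for (for example, any per-invocation checks applied to each call), which this sketch does not model.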