[jira] [Commented] (CASSANDRA-9954) Improve Java-UDF timeout detection

Ariel Weisberg (JIRA) Tue, 08 Dec 2015 09:11:19 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047080#comment-15047080
 ]


Ariel Weisberg commented on CASSANDRA-9954:
-------------------------------------------

Thanks Robert. Are we going to do this after CASSANDRA-10395?

I know this isn't part of this issue, but the whitelist and blacklist as 
constants seem a little problematic. Just from a deployment and maintenance 
perspective allowing people to manipulate them (mechanism not policy) as well 
as warning for some things rather than straight up blocking them seems 
appropriate. If one thing we want to let people do is leverage existing code 
inside UDFs then we don't want to be too inflexible. Definitely not something 
to do as part of this, but I am broaching the subject.
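
To make the "mechanism not policy" idea concrete, here is a rough sketch of 
what a configurable rule table could look like, with a warn action alongside 
allow and block. Everything in it (CallPolicySketch, Action, the prefix 
matching) is hypothetical and made up for illustration; it is not how the 
current whitelist/blacklist constants are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Cassandra.
final class CallPolicySketch {
    enum Action { ALLOW, WARN, BLOCK }

    // Rules keyed by class-name prefix; these could be loaded from
    // configuration instead of being compiled in as constants.
    private final Map<String, Action> rules = new LinkedHashMap<>();
    private final Action defaultAction;

    CallPolicySketch(Action defaultAction) {
        this.defaultAction = defaultAction;
    }

    CallPolicySketch rule(String prefix, Action action) {
        rules.put(prefix, action);
        return this;
    }

    // The longest matching prefix wins; anything unmatched gets the default.
    Action actionFor(String className) {
        Action best = defaultAction;
        int bestLen = -1;
        for (Map.Entry<String, Action> e : rules.entrySet()) {
            String prefix = e.getKey();
            if (className.startsWith(prefix) && prefix.length() > bestLen) {
                best = e.getValue();
                bestLen = prefix.length();
            }
        }
        return best;
    }
}
```

An operator could then block java.net.Socket outright while only warning on 
the rest of java.net, without needing a new build.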

Do we allow UDFs in writes? I read the blog post and it seems like you can mark 
the UDFs as deterministic/non-deterministic. Part of paving the path for 
determinism is disallowing currentTimeMillis() and nanoTime(). If they want 
time they should pass it to the UDF as a parameter when they invoke the query. 
The same could be said for random number generation. For deterministic UDFs you 
might be much more strict or have different warning/error policies for calling 
different functions. Doing DNS resolution from a UDF isn't technically wrong if 
they have good caching and timeouts in place (or we provide that for them).
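
As a small illustration of the time point, compare a UDF body that reads the 
clock itself with one that takes the time as an argument. The class and method 
names are invented for this sketch and are not taken from the patch or from 
any existing UDF.

```java
// Hypothetical sketch of two UDF bodies; names are illustrative only.
final class DeterministicUdfSketch {
    // Non-deterministic: the result depends on when and where it runs, so
    // different replicas (or a replay) can produce different values.
    static long ageMillisNonDeterministic(long createdAtMillis) {
        return System.currentTimeMillis() - createdAtMillis;
    }

    // Deterministic: the caller supplies the time as a query parameter, so
    // the same inputs always give the same output.
    static long ageMillis(long createdAtMillis, long nowMillis) {
        return nowMillis - createdAtMillis;
    }
}
```

With the second form, a stricter policy for deterministic UDFs could reject 
currentTimeMillis() and nanoTime() without taking anything away from users.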

For reads do UDFs only run at the coordinator or remotely at replicas before 
results are returned? I suppose it doesn't really matter since the pain when 
versions or configurations have different whitelist/blacklist settings is the 
same.

Checking metrics on every 16th pass is a little too frequent for most loop 
iterations. Maybe make that a property? The check is not cheap and represents 
at least a hundred nanoseconds of work, possibly more. How often will people 
actually have loops to iterate through in UDFs? I imagine if they tear apart a 
collection or a JSON doc it will be pretty heavyweight stuff.
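
Roughly what I have in mind for making the interval a property is sketched 
below. The property name, the class, and the use of System.nanoTime() as the 
expensive check are all assumptions for illustration; the patch may check 
different metrics and structure this differently.

```java
// Hypothetical sketch; the class and property name are not from the patch.
final class AmortizedTimeoutCheck {
    // Assumed property name, falling back to the current interval of 16.
    static int configuredInterval() {
        return Integer.getInteger("udf.timeout_check_interval", 16);
    }

    private final int mask;
    private final long deadlineNanos;
    private int counter;
    int expensiveChecks; // exposed only so the cost can be observed

    AmortizedTimeoutCheck(int interval, long timeoutNanos) {
        // A power-of-two interval lets the cheap path be a single mask test.
        if (Integer.bitCount(interval) != 1)
            throw new IllegalArgumentException("interval must be a power of two");
        this.mask = interval - 1;
        this.deadlineNanos = System.nanoTime() + timeoutNanos;
    }

    // Called at every instrumented point; returns true once the deadline
    // has passed. Only every interval-th call pays for the clock read.
    boolean checkpoint() {
        if ((++counter & mask) != 0)
            return false;
        expensiveChecks++;
        return System.nanoTime() - deadlineNanos > 0;
    }
}
```

With an interval of 16, 160 calls to checkpoint() perform 10 expensive checks; 
raising the interval trades detection latency for lower overhead per 
iteration.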

[This isn't just verifying anymore, it's 
verifyAndInstrument.|https://github.com/apache/cassandra/compare/trunk...snazy:9954-amok-udf-trunk?expand=1#diff-f78d972e8cfe0d78f1c51210dbe681ceR85]

I am not completely familiar with what the compiler does when emitting the 
labels for bytecode. Does it have a convention to insert in a bunch of places? 
Inserting a check at all the labels seems a bit excessive, but it's just 
performance, so rather than guess how it works, let's just measure the 
performance in a meaningful way. Do we have a benchmark workload we could run 
in cstar that would test UDF performance? Maybe one for a lightweight UDF and 
another for the heaviest weight UDF we think we will come across? For the 
lightweight UDF we may want to test an expression that invokes several UDFs per 
query so that it magnifies the transaction cost of starting a UDF.

This is just my first pass reaction. I need to read up on the libraries you are 
using to do byte code manipulation and how labels work.

> Improve Java-UDF timeout detection
> ----------------------------------
>
>                 Key: CASSANDRA-9954
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9954
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.x
>
>
> CASSANDRA-9402 introduced a sandbox using a thread-pool to enforce security 
> constraints and to detect "amok UDFs" - i.e. UDFs that essentially never 
> return (e.g. {{while (true)}}).
> Currently the safest way to react on such an "amok UDF" is to _fail-fast_ - 
> to stop the C* daemon since stopping a thread (in Java) is just no solution.
> CASSANDRA-9890 introduced further protection by inspecting the byte-code. The 
> same mechanism can also be used to manipulate the Java-UDF byte-code.
> By manipulating the byte-code I mean to add regular "is-amok-UDF" checks in 
> the compiled code.
> EDIT: These "is-amok-UDF" checks would also work for _UNFENCED_ Java-UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
