[
https://issues.apache.org/jira/browse/CASSANDRA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047080#comment-15047080
]
Ariel Weisberg commented on CASSANDRA-9954:
-------------------------------------------
Thanks Robert. Are we going to do this after CASSANDRA-10395?
I know this isn't part of this issue, but the whitelist and blacklist as
constants seem a little problematic. Just from a deployment and maintenance
perspective, allowing people to manipulate them (mechanism, not policy) as well
as warning for some things rather than straight up blocking them seems
appropriate. If one thing we want to let people do is leverage existing code
inside UDFs, then we don't want to be too inflexible. Definitely not something
to do as part of this, but I am broaching the subject.
Do we allow UDFs in writes? I read the blog post and it seems like you can mark
the UDFs as deterministic/non-deterministic. Part of paving the path for
determinism is disallowing currentTimeMillis() and nanoTime(). If they want
the time, they should pass it to the UDF as a parameter when they invoke the
query.
The same could be said for random number generation. For deterministic UDFs you
might be much more strict or have different warning/error policies for calling
different functions. Doing DNS resolution from a UDF isn't technically wrong if
they have good caching and timeouts in place (or we provide that for them).
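To make the "pass the time in as a parameter" point concrete, here is a minimal
sketch in plain Java. The class and method names are made up for illustration
and are not taken from Cassandra or from the patch. It contrasts a function
body that reads the clock itself with one that receives "now" from the caller.

```java
// Hypothetical illustration only; not Cassandra code.
final class DeterministicUdfSketch {

    // Non-deterministic: the result depends on when (and on which node) it runs,
    // so two replicas evaluating it for the same row can disagree.
    static long ageMillisNonDeterministic(long eventMillis) {
        return System.currentTimeMillis() - eventMillis;
    }

    // Deterministic: the caller supplies "now", so the same inputs always
    // produce the same output wherever the function is evaluated.
    static long ageMillis(long eventMillis, long nowMillis) {
        return nowMillis - eventMillis;
    }

    public static void main(String[] args) {
        // The client reads the clock once and passes the value with the query.
        long now = System.currentTimeMillis();
        System.out.println(ageMillis(now - 5_000L, now));
    }
}
```

The same shape would apply to random numbers: the caller generates the value
and hands it to the function as an argument.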
For reads do UDFs only run at the coordinator or remotely at replicas before
results are returned? I suppose it doesn't really matter since the pain when
versions or configurations have different whitelist/blacklist settings is the
same.
Checking metrics every 16 iterations is a little too often for most loops.
Maybe make the interval a property? The check is not cheap and represents
at least a hundred nanoseconds of work, possibly more. How often will people
actually have loops to iterate through in UDFs? I imagine if they tear apart a
collection or a JSON doc it will be pretty heavyweight stuff.
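A rough sketch of what a configurable interval could look like, assuming the
injected call is a cheap counter increment that only falls through to the
expensive check every N calls. The class name, the property name
example.udf.check_interval, and the default of 1024 are all hypothetical, and
System.nanoTime() is only a stand-in for whatever metric the patch actually
checks.

```java
// Hypothetical illustration only; not the code from the patch.
final class AmokCheckSketch {

    // Assumed property name and default; read once at class load.
    static final int CHECK_INTERVAL =
            Math.max(1, Integer.getInteger("example.udf.check_interval", 1024));

    private final long deadlineNanos;
    private int counter;
    int expensiveChecks; // counts slow-path executions, for illustration

    AmokCheckSketch(long timeoutNanos) {
        this.deadlineNanos = System.nanoTime() + timeoutNanos;
    }

    // Stands in for the call that would be injected into the UDF byte-code.
    void check() {
        // Fast path: just a counter increment and a compare.
        if (++counter < CHECK_INTERVAL) {
            return;
        }
        counter = 0;
        expensiveChecks++;
        // Slow path: stand-in for the real (more expensive) metrics check.
        // Subtracting before comparing keeps this safe against nanoTime wrap.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("UDF exceeded its time budget");
        }
    }

    public static void main(String[] args) {
        AmokCheckSketch guard = new AmokCheckSketch(50_000_000L); // 50 ms budget
        long iterations = 0;
        try {
            while (true) { // an "amok" loop that never returns on its own
                iterations++;
                guard.check();
            }
        } catch (IllegalStateException e) {
            System.out.println("aborted after " + iterations + " iterations and "
                    + guard.expensiveChecks + " deadline checks");
        }
    }
}
```

With a larger interval the slow path runs less often, at the price of noticing
a runaway loop up to N iterations late, which is the trade-off a benchmark
would need to quantify.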
[This isn't just verifying anymore, it's
verifyAndInstrument.|https://github.com/apache/cassandra/compare/trunk...snazy:9954-amok-udf-trunk?expand=1#diff-f78d972e8cfe0d78f1c51210dbe681ceR85]
I am not completely familiar with what the compiler does when emitting the
labels for bytecode. Does it have a convention to insert in a bunch of places?
Inserting a check at all the labels seems a bit excessive, but it's just
performance, so rather than guess how it works, let's just measure the
performance in a meaningful way. Do we have a benchmark workload we could run
in cstar that would test UDF performance? Maybe one for a lightweight UDF and
another for the heaviest weight UDF we think we will come across? For the
lightweight UDF we may want to test an expression that invokes several UDFs per
query so that it magnifies the transaction cost of starting a UDF.
This is just my first pass reaction. I need to read up on the libraries you are
using to do byte code manipulation and how labels work.
> Improve Java-UDF timeout detection
> ----------------------------------
>
> Key: CASSANDRA-9954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9954
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Robert Stupp
> Assignee: Robert Stupp
> Fix For: 3.x
>
>
> CASSANDRA-9402 introduced a sandbox using a thread-pool to enforce security
> constraints and to detect "amok UDFs" - i.e. UDFs that essentially never
> return (e.g. {{while (true)}}).
> Currently the safest way to react on such an "amok UDF" is to _fail-fast_ -
> to stop the C* daemon since stopping a thread (in Java) is just no solution.
> CASSANDRA-9890 introduced further protection by inspecting the byte-code. The
> same mechanism can also be used to manipulate the Java-UDF byte-code.
> By manipulating the byte-code I mean to add regular "is-amok-UDF" checks in
> the compiled code.
> EDIT: These "is-amok-UDF" checks would also work for _UNFENCED_ Java-UDFs.