[
https://issues.apache.org/jira/browse/CASSANDRA-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756796#comment-17756796
]
Andres de la Peña edited comment on CASSANDRA-18739 at 8/21/23 10:26 AM:
-------------------------------------------------------------------------
Regarding the CI results:
* {{testFailingMessage}} in 4.0 is CASSANDRA-18366
* {{test_decommission}} in 4.1 doesn't seem to have an associated ticket, but
it appears on [Butler for
4.0|https://ci-cassandra.apache.org/job/Cassandra-4.0/621/testReport/dtest-novnode.transient_replication_ring_test/TestTransientReplicationRing/test_decommission_2/]
* {{testTimeout}} in 4.1 and 5.0 is CASSANDRA-18641
* {{test_move_backwards_and_cleanup}} in 5.0 is CASSANDRA-18686
* {{ClearSnapshotTest}} in 5.0 looks like an env issue
* {{testLoadingIncompleteSSTable}} in 5.0 is CASSANDRA-18737
* {{test_multi_dc_replace_with_rf1}} in 5.0 has no associated ticket and doesn't
seem to appear on Butler. Could it be an env issue?
The bug seems to be in 3.11, and I guess it will also be in 3.0. Those branches
are still supported until we release 5.0, so I think we should add the fix
there too.
Other than that, the patch looks good to me.
> UDF functions fail to load on rolling restart
> ---------------------------------------------
>
> Key: CASSANDRA-18739
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18739
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/UDF
> Reporter: Claude Warren
> Assignee: Claude Warren
> Priority: High
> Fix For: 4.0.x, 4.1.x, 5.x
>
> Attachments: udf_error.cql, udf_error_data.cql
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> UDFs fail to reload properly after a rolling restart.
> h3. *Symptom:*
> An NPE is thrown when the UDF is used after a restart.
> h3. *Steps to recreate:*
> # Create a cluster as per the cql file.
> # Populate the cluster with data.cql.
> # Execute SELECT city_measurements(city, measurement, 16.5) AS m FROM current
> # Expect min and max values for cities.
> # Perform a rolling restart on one server.
> # When the server is back up, execute SELECT city_measurements(city, measurement, 16.5) AS m FROM current
> # Expect an error result with an NPE message.
> {*}Analysis{*}:
> During system restart, SchemaKeyspace.fetchNonSystemKeyspaces() is called.
> When a keyspace with a UDF is loaded, the SchemaKeyspace method
> createUDFFromRow() is called. This in turn calls UDFunction.create(), which
> eventually reaches the UDFunction constructor, where
> Schema.instance.getKeyspaceMetadata() is called with the keyspace of the UDF
> name as the argument. However, that keyspace is still being constructed and
> is not yet in the instance, so the method returns null for the
> KeyspaceMetadata. That null KeyspaceMetadata is then used in the udfContext.
> Later, when the UDF is called and needs to call a method on the
> keyspaceMetadata, such as udfContext.newUDTValue(), whose implementation uses
> keyspaceMetadata.types, a NullPointerException is thrown.
> I have verified that this affects versions 4.0, 4.1 and trunk. I have not
> verified 3.x, but I suspect it is the same there.
> I modified the UDFunction constructor to assert that the metadata was not null
> and received the following stack trace:
> ERROR [main] 2023-08-09 11:44:46,408 CassandraDaemon.java:911 - Exception encountered during startup
> java.lang.AssertionError: No metadata for temperatures.city_measurements_sfunc
> at org.apache.cassandra.cql3.functions.UDFunction.<init>(UDFunction.java:240)
> at org.apache.cassandra.cql3.functions.JavaBasedUDFunction.<init>(JavaBasedUDFunction.java:195)
> at org.apache.cassandra.cql3.functions.UDFunction.create(UDFunction.java:276)
> at org.apache.cassandra.schema.SchemaKeyspace.createUDFFromRow(SchemaKeyspace.java:1182)
> at org.apache.cassandra.schema.SchemaKeyspace.fetchUDFs(SchemaKeyspace.java:1131)
> at org.apache.cassandra.schema.SchemaKeyspace.fetchFunctions(SchemaKeyspace.java:1119)
> at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:859)
> at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:848)
> at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:836)
> at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:132)
> at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:121)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:287)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
>
> {{*Possible solution:*}}
> *Version 4.x*
> Create a KeyspaceMetadata.Builder class that accepts the types, tables and
> views but uses a builder for the functions.
> Add a KeyspaceMetadata constructor that accepts the KeyspaceMetadata.Builder,
> so that the keyspaceMetadata value of the function builder can be set
> correctly during construction of the KeyspaceMetadata.
> Modify SchemaKeyspace.fetchKeyspace(string) so that it uses the
> KeyspaceMetadata.Builder.
>
> *Version 5.x*
> Similar to 4.x, except that the KeyspaceMetadata.Builder will also need
> builders for views and tables, because the functions needed to construct
> those objects will not be available until the KeyspaceMetadata.Builder
> constructs them.
>
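The analysis and the possible solution quoted above can be illustrated with a small, self-contained Java sketch. It does not use Cassandra's classes: REGISTRY, Keyspace, Udf and KeyspaceBuilder are hypothetical stand-ins for Schema.instance, KeyspaceMetadata, UDFunction and the proposed KeyspaceMetadata.Builder, and the real signatures may differ. The first loader looks the keyspace up in the registry while that keyspace is still being built, so the function captures null. The second defers function creation until the keyspace object exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class UdfLoadSketch {
    // Hypothetical stand-in for Schema.instance: holds only fully built keyspaces.
    static final Map<String, Keyspace> REGISTRY = new HashMap<>();

    // Hypothetical stand-in for KeyspaceMetadata.
    static final class Keyspace {
        final String name;
        final List<Udf> functions = new ArrayList<>();
        Keyspace(String name) { this.name = name; }
    }

    // Hypothetical stand-in for UDFunction: it keeps whatever metadata it is given.
    static final class Udf {
        final String name;
        final Keyspace keyspace; // a null here only surfaces later, when it is dereferenced
        Udf(String name, Keyspace keyspace) {
            this.name = name;
            this.keyspace = keyspace;
        }
        String describe() { return keyspace.name + "." + name; }
    }

    // Problem shape: the function resolves its keyspace through the registry
    // before the keyspace being loaded has been registered.
    static Keyspace loadWithRegistryLookup(String ksName, String udfName) {
        Keyspace ks = new Keyspace(ksName);
        ks.functions.add(new Udf(udfName, REGISTRY.get(ksName))); // lookup yields null
        REGISTRY.put(ksName, ks); // registration happens too late
        return ks;
    }

    // Builder shape: functions are stored as factories and created only once
    // the keyspace object exists, so each one receives a non-null reference.
    static final class KeyspaceBuilder {
        private final String name;
        private final List<Function<Keyspace, Udf>> functionBuilders = new ArrayList<>();
        KeyspaceBuilder(String name) { this.name = name; }
        KeyspaceBuilder addFunction(String udfName) {
            functionBuilders.add(ks -> new Udf(udfName, ks));
            return this;
        }
        Keyspace build() {
            Keyspace ks = new Keyspace(name);
            for (Function<Keyspace, Udf> fb : functionBuilders) {
                ks.functions.add(fb.apply(ks));
            }
            return ks;
        }
    }

    public static void main(String[] args) {
        Keyspace broken = loadWithRegistryLookup("temperatures", "city_measurements_sfunc");
        System.out.println("registry lookup, metadata is null: "
                + (broken.functions.get(0).keyspace == null));

        Keyspace fixed = new KeyspaceBuilder("temperatures")
                .addFunction("city_measurements_sfunc")
                .build();
        System.out.println("builder: " + fixed.functions.get(0).describe());
    }
}
```

In this sketch, calling describe() on the function produced by loadWithRegistryLookup would throw a NullPointerException, which mirrors the reported symptom of the failure appearing only when the UDF is used rather than when it is loaded.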