Tim Armstrong created IMPALA-8486:
-------------------------------------
Summary: test_udf_update_via_drop and test_udf_update_via_create
fail on local catalog
Key: IMPALA-8486
URL: https://issues.apache.org/jira/browse/IMPALA-8486
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Todd Lipcon
{noformat}
TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option:
{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
'disable_codegen': False, 'abort_on_error': 1,
'exec_single_node_rows_threshold': 0} | table_format: text/none]
tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
tests/query_test/test_udfs.py:52: in _run_query_all_impalads
assert result.data == expected
E assert ['Old UDF'] == ['New UDF']
E At index 0 diff: 'Old UDF' != 'New UDF'
E Full diff:
E - ['Old UDF']
E + ['New UDF']
----------------------------
{noformat}
The tests check that the local UDF caches on each impalad are invalidated when
a function referencing the HDFS file that contains the UDF is dropped or
created. They fail because the local catalog, unlike the regular catalog,
doesn't invalidate LibCache entries when it receives a catalog update.
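A toy model may make the expected behavior concrete. The sketch below assumes a per-impalad cache keyed by the library's HDFS path and a catalog-update callback that evicts entries for dropped or re-created functions. `LibCacheModel`, `on_catalog_update`, the in-memory `fs` dict and the `/udfs/udf.so` path are all hypothetical stand-ins, not Impala's actual LibCache API. Setting `invalidate_on_update=False` models the local catalog behavior described above, and the stale 'Old UDF' is still served.

```python
class LibCacheModel:
    """Hypothetical per-impalad UDF library cache keyed by HDFS path.

    Not Impala's real LibCache API; a simplified model for illustration.
    """

    def __init__(self, fs, invalidate_on_update=True):
        self.fs = fs  # dict standing in for HDFS: path -> library contents
        self.invalidate_on_update = invalidate_on_update
        self.entries = {}  # path -> cached copy of the library contents

    def get(self, path):
        # Load from "HDFS" only on a cache miss; later calls reuse the copy.
        if path not in self.entries:
            self.entries[path] = self.fs[path]
        return self.entries[path]

    def on_catalog_update(self, changed_lib_paths):
        # Called when a catalog update reports a dropped/created function.
        # The local catalog case is modeled by skipping the eviction.
        if self.invalidate_on_update:
            for path in changed_lib_paths:
                self.entries.pop(path, None)


def run_scenario(invalidate_on_update):
    fs = {"/udfs/udf.so": "Old UDF"}
    cache = LibCacheModel(fs, invalidate_on_update)
    cache.get("/udfs/udf.so")            # first query caches the old version
    fs["/udfs/udf.so"] = "New UDF"       # file replaced, function dropped/re-created
    cache.on_catalog_update(["/udfs/udf.so"])
    return cache.get("/udfs/udf.so")     # what the next query sees


print(run_scenario(invalidate_on_update=True))   # New UDF
print(run_scenario(invalidate_on_update=False))  # Old UDF
```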
I looked at this long enough to realise that the invalidation mechanism is
fundamentally broken: it doesn't work with dedicated executors. It also
creates a race between statestore updates and queries that reference the UDFs;
if a query wins the race, it can incorrectly use the old version that should
have been invalidated.
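The race can be sketched as an ordering problem on a single impalad. This is a hypothetical model, assuming the impalad applies catalog updates and runs queries in timestamp order and that the library file has already been replaced in HDFS while a stale entry is still cached. The event format and the path are made up for illustration. A query that runs before the update is applied sees the old version; an impalad that never applies the update at all (modeled by omitting the update event) keeps serving it indefinitely.

```python
def simulate(events):
    """Replay (time, kind, path) events against one impalad's library cache.

    Hypothetical model: kind is "query" (read the library) or "update"
    (a catalog update that evicts the cached library). Returns the version
    observed by each query, in order.
    """
    fs = {"/udfs/udf.so": "New UDF"}     # file already replaced in HDFS
    cache = {"/udfs/udf.so": "Old UDF"}  # stale entry from an earlier query
    seen = []
    for _time, kind, path in sorted(events):
        if kind == "update":
            cache.pop(path, None)
        else:
            # Reuse the cached copy if present, otherwise load from "HDFS".
            seen.append(cache.setdefault(path, fs[path]))
    return seen


LIB = "/udfs/udf.so"
print(simulate([(1, "query", LIB), (2, "update", LIB), (3, "query", LIB)]))
# ['Old UDF', 'New UDF']  (the first query won the race)
print(simulate([(0, "update", LIB), (1, "query", LIB)]))
# ['New UDF']
print(simulate([(1, "query", LIB), (3, "query", LIB)]))
# ['Old UDF', 'Old UDF']  (update never applied)
```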
I think this could be a significant problem: if a JAR/SO file is overwritten
in place, the old version could persist in the cache indefinitely.
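The in-place overwrite concern can also be illustrated with a toy model. In the sketch below, a cache keyed only on the library path never notices that the file changed unless something explicitly invalidates it. For contrast, a hypothetical variant that also keys on the file's modification time reloads after the overwrite. This is one possible mitigation assumed for illustration, not something proposed in this issue. `FileStatus`, both cache classes and the `/udfs/udf.jar` path are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class FileStatus:
    contents: str  # stand-in for the JAR/.so bytes
    mtime: int     # last-modified time reported by the filesystem


class PathKeyedCache:
    """Hypothetical cache keyed by path only: an in-place overwrite goes unnoticed."""

    def __init__(self, fs):
        self.fs = fs  # dict: path -> FileStatus, standing in for HDFS
        self.entries = {}

    def get(self, path):
        if path not in self.entries:
            self.entries[path] = self.fs[path].contents
        return self.entries[path]


class MtimeKeyedCache(PathKeyedCache):
    """Hypothetical variant keyed on (path, mtime), so a rewrite forces a reload."""

    def get(self, path):
        key = (path, self.fs[path].mtime)  # needs a metadata lookup per call
        if key not in self.entries:
            self.entries[key] = self.fs[path].contents
        return self.entries[key]


LIB = "/udfs/udf.jar"
fs = {LIB: FileStatus("Old UDF", mtime=100)}
by_path, by_mtime = PathKeyedCache(fs), MtimeKeyedCache(fs)
by_path.get(LIB)
by_mtime.get(LIB)

fs[LIB] = FileStatus("New UDF", mtime=200)  # file overwritten in place
print(by_path.get(LIB))   # Old UDF (stale, and stays stale)
print(by_mtime.get(LIB))  # New UDF
```

The trade-off in this variant is a filesystem metadata lookup on every access. The superseded (path, mtime) entry also stays in the dictionary until something evicts it, and an overwrite that happens to leave the mtime unchanged would still go undetected.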
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)