[ https://issues.apache.org/jira/browse/IMPALA-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vuk Ercegovac resolved IMPALA-6670.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0, Impala 3.0
> Executor-only impalads do not refresh their lib-cache entries
> -------------------------------------------------------------
>
> Key: IMPALA-6670
> URL: https://issues.apache.org/jira/browse/IMPALA-6670
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0,
> Impala 2.12.0, Impala 2.13.0
> Reporter: Vuk Ercegovac
> Assignee: Vuk Ercegovac
> Priority: Blocker
> Fix For: Impala 3.0, Impala 2.12.0
>
>
> When impalads run as executors only, there is no way for their lib-cache
> entries to be refreshed. As far as I can tell, the version of the cached file
> will remain the same until the impalad is restarted and a query with a UDF/UDA
> that references that file is then evaluated on that node.
> In contrast, impalads that are both executors and coordinators receive
> metadata updates, which result in the cache entry being refreshed. Even
> in this mode, there is room for inconsistency (e.g., the jar is updated between
> coordination and evaluation), but all impalads can be made to converge.
> Basic steps to repro:
> * Make two jars (I used impala-hive-udfs.jar), one with TestUdf.class and
>   the other with TestUdf.class + ReplaceStringUdf.class
> * Clear the state
>     drop function scratch.identity(boolean);
>     drop function scratch.replace_string(string);
> * cp part1.jar to tmp.jar
>     hadoop fs -cp -f /test-warehouse/scratch.db/part1.jar /test-warehouse/scratch.db/tmp.jar
> * Create identity from tmp.jar
>     create function scratch.identity(boolean) returns boolean
>       location '/test-warehouse/scratch.db/tmp.jar'
>       symbol='org.apache.impala.TestUdf';
> * Run a query on all nodes
>     select count(*) from functional.alltypes
>       where scratch.identity(bool_col) = bool_col;
> * cp part2.jar to tmp.jar
>     hadoop fs -cp -f /test-warehouse/scratch.db/part2.jar /test-warehouse/scratch.db/tmp.jar
> * Create replace_string function
>     create function scratch.replace_string(string) returns string
>       location '/test-warehouse/scratch.db/tmp.jar'
>       symbol='org.apache.impala.ReplaceStringUdf';
> * Run a query
>     select count(*) from functional.alltypes
>       where scratch.replace_string(string_col) = string_col;
> When all impalads are both executors and coordinators, the second query works.
> With:
>     ./bin/start-impala-cluster.py --num_coordinators=1
> the second query always results in:
>     WARNINGS: ImpalaRuntimeException: Unable to find class.
>     CAUSED BY: ClassNotFoundException: org.apache.impala.ReplaceStringUdf
> (each backend still has the previous version of tmp.jar)
> Currently, executors do not need metadata other than what is supplied by
> coordinators in the plan. Libs are excluded from this scheme; each impalad
> tries to maintain consistency with the lib files stored in the FS as of the
> time of function creation (it is a little more complicated ...).
> One option here is that plans include lib version information so that
> impalads can know when a refresh is needed.
>
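
The option in the last paragraph of the description could look roughly like the
following. This is a minimal sketch under stated assumptions, not Impala's
implementation: LibRef, LibCache, the fetch callback, and the use of an integer
version (for instance the file's last-modified time in the FS) are all
hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class LibRef:
    """Hypothetical plan field: a lib path plus the version the coordinator saw."""
    path: str
    version: int  # assumed to grow over time, e.g. a last-modified timestamp


class LibCache:
    """Toy executor-side cache; an illustration only, not Impala's LibCache."""

    def __init__(self, fetch: Callable[[str], Tuple[int, bytes]]):
        # fetch(path) returns (current version, file contents) from the FS.
        self._fetch = fetch
        self._entries: Dict[str, Tuple[int, bytes]] = {}

    def get(self, ref: LibRef) -> bytes:
        entry = self._entries.get(ref.path)
        # Refresh when nothing is cached or the plan names a newer version.
        if entry is None or entry[0] < ref.version:
            entry = self._fetch(ref.path)
            self._entries[ref.path] = entry
        return entry[1]


# Simulated FS holding tmp.jar; the contents stand in for the classes in the jar.
path = "/test-warehouse/scratch.db/tmp.jar"
fs = {path: (1, b"TestUdf")}
cache = LibCache(lambda p: fs[p])

first = cache.get(LibRef(path, 1))   # executor caches version 1
fs[path] = (2, b"TestUdf+ReplaceStringUdf")  # jar is overwritten in the FS
stale = cache.get(LibRef(path, 1))   # plan without newer version: no refresh
fresh = cache.get(LibRef(path, 2))   # plan carries version 2: entry refreshed
```

With the version in the plan, an executor that never receives metadata updates
can still detect that its cached copy of tmp.jar predates the function being
evaluated, which is the case that fails in the repro above.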