[jira] [Resolved] (IMPALA-6670) Executor-only impalads do not refresh their lib-cache entries

Vuk Ercegovac (JIRA) Fri, 23 Mar 2018 14:48:34 -0700

     [ 
https://issues.apache.org/jira/browse/IMPALA-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vuk Ercegovac resolved IMPALA-6670.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

> Executor-only impalads do not refresh their lib-cache entries
> -------------------------------------------------------------
>
>                 Key: IMPALA-6670
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6670
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Frontend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0, Impala 2.13.0
>            Reporter: Vuk Ercegovac
>            Assignee: Vuk Ercegovac
>            Priority: Blocker
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> When impalads are only executors, there is no way for their lib-cache entries 
> to be refreshed. As far as I can tell, the version of the cached file will 
> remain the same until the impalad is restarted (and a query with a UDF/UDA 
> that references that file is evaluated on that node).
> In contrast, impalads that are both executors and coordinators receive 
> metadata updates, which result in the cache entry being refreshed. Even 
> in this mode, there is room for inconsistency (e.g., the jar is updated between 
> coordination and evaluation), but all impalads can be made to converge.
> Basic steps to repro:
>  * Make two jars (I used impala-hive-udfs.jar), one with TestUdf.class and 
> the other with TestUdf.class + ReplaceStringUdf.class
>  * Clear the state
>      drop function scratch.identity(boolean);
>      drop function scratch.replace_string(string);
>  * cp part1.jar to tmp.jar
>      hadoop fs -cp -f /test-warehouse/scratch.db/part1.jar \
>        /test-warehouse/scratch.db/tmp.jar
>  * create identity from tmp.jar
>      create function scratch.identity(boolean) returns boolean
>        location '/test-warehouse/scratch.db/tmp.jar'
>        symbol='org.apache.impala.TestUdf';
>  * Run a query on all nodes
>      select count(*) from functional.alltypes
>        where scratch.identity(bool_col) = bool_col;
>  * cp part2.jar to tmp.jar
>      hadoop fs -cp -f /test-warehouse/scratch.db/part2.jar \
>        /test-warehouse/scratch.db/tmp.jar
>  * create replace_string function
>      create function scratch.replace_string(string) returns string
>        location '/test-warehouse/scratch.db/tmp.jar'
>        symbol='org.apache.impala.ReplaceStringUdf';
>  * run a query
>      select count(*) from functional.alltypes
>        where scratch.replace_string(string_col) = string_col;
> When all impalads are both executors and coordinators, the second query works.
> With a cluster started as:
>      ./bin/start-impala-cluster.py --num_coordinators=1
> the second query always results in:
>      WARNINGS: ImpalaRuntimeException: Unable to find class.
>       CAUSED BY: ClassNotFoundException: org.apache.impala.ReplaceStringUdf
> (each backend still has the previous version of tmp.jar)
> Currently, executors do not need metadata other than what is supplied by 
> coordinators in the plan. Libs are excluded from this scheme; each impalad 
> tries to maintain consistency with the lib files stored in the FS as of the 
> time of function creation (it is a little more complicated ...). 
> One option here is for plans to include lib version information so that 
> impalads know when a refresh is needed.
>  
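
The option mentioned at the end of the description (plans carrying lib version
information) can be illustrated with a small sketch. This is not Impala code
and not necessarily how the issue was fixed. LibCache, CacheEntry, fetch and
expected_version are hypothetical names, and the "version" is assumed to be
something monotonically increasing, such as the file's modification time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    path: str
    version: int     # assumed: e.g. the lib file's mtime when it was cached
    contents: bytes  # stand-in for the locally cached copy of the lib


class LibCache:
    """Hypothetical executor-side cache keyed by lib path."""

    def __init__(self, fetch: Callable[[str], Tuple[int, bytes]]):
        # fetch(path) -> (version, contents); stands in for reading from the FS.
        self._fetch = fetch
        self._entries: Dict[str, CacheEntry] = {}
        self.refreshes = 0  # counts how often the FS was actually read

    def get(self, path: str, expected_version: int) -> CacheEntry:
        """Return the entry, refreshing it if the plan expects a newer version."""
        entry = self._entries.get(path)
        if entry is None or entry.version < expected_version:
            version, contents = self._fetch(path)
            entry = CacheEntry(path, version, contents)
            self._entries[path] = entry
            self.refreshes += 1
        return entry


# Simulated FS: path -> (version, contents).
fs = {"/tmp.jar": (1, b"TestUdf")}
cache = LibCache(lambda p: fs[p])

first = cache.get("/tmp.jar", expected_version=1)   # cold cache, fetches v1
fs["/tmp.jar"] = (2, b"TestUdf+ReplaceStringUdf")   # jar replaced in the FS
stale = cache.get("/tmp.jar", expected_version=1)   # old plan, no refresh
fresh = cache.get("/tmp.jar", expected_version=2)   # new plan, forces refresh
```

In this sketch the coordinator would put the lib version it saw at function
creation time into the plan, so an executor-only node that never receives
metadata updates can still detect that its cached copy is too old.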



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
