[ 
https://issues.apache.org/jira/browse/IMPALA-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-12269.
------------------------------------
    Resolution: Fixed

> Codegen cache false negative because of function names hash
> -----------------------------------------------------------
>
>                 Key: IMPALA-12269
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12269
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Daniel Becker
>            Assignee: Daniel Becker
>            Priority: Major
>
> Codegen cache entries (execution engines) are stored by keys derived from the 
> unoptimised llvm modules; the key is either the whole module (normal mode) or 
> its hash (optimal mode). Because hash collisions are possible (in optimal 
> mode), as an extra precaution we also compare the hashes of the function 
> names in the current and the cached module. However, when assembling the 
> function name list we do not filter out duplicate function names, which may 
> result in cases where the unoptimised llvm modules are identical but the 
> function name hashes do not match.
> *Example:*
> First query:
> {code:sql}
> select int_col, tinyint_col
> from alltypessmall
> order by int_col desc
> limit 20;{code}
> Second query:
> {code:sql}
> select tinyint_col
> from alltypessmall
> order by int_col desc
> limit 20;{code}
> In the first query, there are two {{SlotRef}} objects referencing 
> {{tinyint_col}} which want to codegen a {{GetSlotRef()}} function. The second 
> invocation of {{SlotRef::GetCodegendComputeFnImpl()}} checks the already 
> codegen'd functions, finds the function from its first invocation and returns 
> that (see 
> [https://github.com/apache/impala/blob/929b91ac644561ee68da7923cf5272eb300d79de/be/src/exprs/slot-ref.cc#L213]).
>  The two {{SlotRef}} objects will use the same llvm::Function and there will 
> be only one copy of it in the module, but both will call 
> {{LlvmCodeGen::AddFunctionToJit()}} with this function in order for their 
> respective function pointers to be set after compilation.
> {{LlvmCodeGen::GetAllFunctionNames()}} will return the names of all functions 
> with which {{LlvmCodeGen::AddFunctionToJit()}} has been called, including 
> duplicates.
> The second query generates the same unoptimised module as the first query 
> (for the corresponding fragment), but does not have a duplicated 
> {{GetSlotRef()}} function in its function name list, so the cached module is 
> rejected.
> Note that this also results in the cached module being evicted because the 
> new module will have the same key as the cached one (the modules are 
> identical).
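
The mismatch described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not Impala's actual code: HashNames, UniqueNames and HashUniqueNames are hypothetical helpers, the function names in the usage note are made up, and the hash combine is an arbitrary order-sensitive stand-in for whatever hash the codegen cache really uses. It shows that a name list containing a duplicate is a different input from the same list without it, and that deduplicating before hashing is one way to make the two agree.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for hashing a function name list.
// The combine step is order-sensitive and also depends on the list length,
// so a duplicated entry changes the input that gets hashed.
std::size_t HashNames(const std::vector<std::string>& names) {
  std::size_t seed = names.size();
  for (const std::string& name : names) {
    seed ^= std::hash<std::string>{}(name) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
  }
  return seed;
}

// Returns the names sorted and with duplicates removed.
std::vector<std::string> UniqueNames(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}

// Hashes the deduplicated list, so registering the same function twice
// does not affect the result.
std::size_t HashUniqueNames(const std::vector<std::string>& names) {
  return HashNames(UniqueNames(names));
}
```

For example, with a first list of {"GetSlotRef", "GetSlotRef", "Compare"} and a second list of {"GetSlotRef", "Compare"}, the raw lists differ, but UniqueNames() yields the same list for both, so HashUniqueNames() returns the same value for both.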



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
