[
https://issues.apache.org/jira/browse/IMPALA-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Becker resolved IMPALA-12269.
------------------------------------
Resolution: Fixed
> Codegen cache false negative because of function names hash
> -----------------------------------------------------------
>
> Key: IMPALA-12269
> URL: https://issues.apache.org/jira/browse/IMPALA-12269
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
>
> Codegen cache entries (execution engines) are stored under keys derived from
> the unoptimised LLVM modules: the key is either the whole module (normal
> mode) or its hash (optimal mode). Because hash collisions are possible in
> optimal mode, as an extra precaution we also compare the hashes of the
> function names in the current and the cached module. However, when
> assembling the function name list we do not filter out duplicate function
> names, so the function name hashes can differ even when the unoptimised
> LLVM modules are identical, and a valid cached entry is rejected.
> *Example:*
> First query:
> {code:java}
> select int_col, tinyint_col
> from alltypessmall
> order by int_col desc
> limit 20;{code}
> Second query:
> {code:java}
> select tinyint_col
> from alltypessmall
> order by int_col desc
> limit 20;{code}
> In the first query, there are two {{SlotRef}} objects referencing
> {{tinyint_col}} which want to codegen a {{GetSlotRef()}} function. The second
> invocation of {{SlotRef::GetCodegendComputeFnImpl()}} checks the already
> codegen'd functions, finds the function from its first invocation and returns
> that (see
> [https://github.com/apache/impala/blob/929b91ac644561ee68da7923cf5272eb300d79de/be/src/exprs/slot-ref.cc#L213]).
> The two {{SlotRef}} objects will use the same llvm::Function and there will
> be only one copy of it in the module, but both will call
> {{LlvmCodeGen::AddFunctionToJit()}} with this function in order for their
> respective function pointers to be set after compilation.
> {{LlvmCodeGen::GetAllFunctionNames()}} will return the names of all functions
> with which {{LlvmCodeGen::AddFunctionToJit()}} has been called, including
> duplicates.
> The second query generates the same unoptimised module as the first query
> (for the corresponding fragment), but does not have a duplicated
> {{GetSlotRef()}} function in its function name list, so the cached module is
> rejected.
> Note that this also results in the cached module being evicted because the
> new module will have the same key as the cached one (the modules are
> identical).
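
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not Impala's code: HashFunctionNames(), Deduplicate() and the name "Compare" are hypothetical stand-ins, FNV-1a is used only as a convenient deterministic hash, and sorting before removing duplicates is an assumption made for the illustration rather than the fix that was committed.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the function-name hash: 64-bit FNV-1a over every
// name in order, with a separator byte mixed in after each name.
std::uint64_t HashFunctionNames(const std::vector<std::string>& names) {
  std::uint64_t hash = 14695981039346656037ULL;   // FNV-1a offset basis
  const std::uint64_t kPrime = 1099511628211ULL;  // FNV-1a prime
  for (const std::string& name : names) {
    for (unsigned char c : name) {
      hash ^= c;
      hash *= kPrime;
    }
    hash ^= 0xFF;  // separator so that name boundaries affect the hash
    hash *= kPrime;
  }
  return hash;
}

// Returns the names sorted and with duplicates removed (assumed remedy).
std::vector<std::string> Deduplicate(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}

// First query: two SlotRef objects registered the same shared function.
std::vector<std::string> FirstQueryNames() {
  return {"GetSlotRef", "GetSlotRef", "Compare"};
}

// Second query: the same module, but every function was registered once.
std::vector<std::string> SecondQueryNames() {
  return {"GetSlotRef", "Compare"};
}
```

Hashing the two raw lists feeds different input to the hash, so the results will in general differ even though the modules would be identical. After Deduplicate() both lists are equal, so their hashes are guaranteed to match.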