[
https://issues.apache.org/jira/browse/SPARK-27278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809554#comment-16809554
]
Huon Wilson commented on SPARK-27278:
-------------------------------------
That does seem to be what added this optimisation, but it missed a case due to
interactions with constant folding (which is this issue). If the code required
to fill that hole has bad problems with code size (i.e. enough to revert it),
then it seems that's saying something about the entire {{map}}->{{CASE WHEN}}
optimisation, including the old form.
It seems to me that the whole optimisation needs to be considered here, not
just this bug fix, because I'm not sure I see a meaningful difference between
{{map1}} and {{map2}} in the following code. (That is, why should {{map1}} be
optimised but not {{map2}}?)
{code:none}
val map1 = map(lit(1), lit(1), lit(2), 'id)
val map2 = map(lit(1), lit(1), lit(2), lit(2))
{code}
Obviously this is entirely up to you, but the difference between these two very
similar pieces of code is a bit disconcerting.
> Optimize GetMapValue when the map is a foldable and the key is not
> ------------------------------------------------------------------
>
> Key: SPARK-27278
> URL: https://issues.apache.org/jira/browse/SPARK-27278
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Environment: Spark 2.4.0
> Reporter: Huon Wilson
> Priority: Minor
>
> With a map that isn't constant-foldable, spark will optimise an access to a
> series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}}, for instance
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as
> "x").explain
> == Physical Plan ==
> *(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L
> as int) = 2) THEN id#180L END AS x#182L]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> This results in an efficient series of ifs and elses, in the code generation:
> {code:java}
> /* 037 */ boolean project_isNull_3 = false;
> /* 038 */ int project_value_3 = -1;
> /* 039 */ if (!false) {
> /* 040 */ project_value_3 = (int) project_expr_0_0;
> /* 041 */ }
> /* 042 */
> /* 043 */ boolean project_value_2 = false;
> /* 044 */ project_value_2 = project_value_3 == 1;
> /* 045 */ if (!false && project_value_2) {
> /* 046 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 047 */ project_project_value_1_0 = 1L;
> /* 048 */ continue;
> /* 049 */ }
> /* 050 */
> /* 051 */ boolean project_isNull_8 = false;
> /* 052 */ int project_value_8 = -1;
> /* 053 */ if (!false) {
> /* 054 */ project_value_8 = (int) project_expr_0_0;
> /* 055 */ }
> /* 056 */
> /* 057 */ boolean project_value_7 = false;
> /* 058 */ project_value_7 = project_value_8 == 2;
> /* 059 */ if (!false && project_value_7) {
> /* 060 */ project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 061 */ project_project_value_1_0 = project_expr_0_0;
> /* 062 */ continue;
> /* 063 */ }
> {code}
> If the map can be constant folded, the constant folding happens first, and
> the {{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting doing
> a map traversal and more dynamic checks:
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as
> "x").explain
> == Physical Plan ==
> *(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> The {{keys: ..., values: ...}} is from the {{ArrayBasedMapData}} type, which
> is what is stored in the {{Literal}} form of the {{map(...)}} expression in
> that select. The code generated is less efficient, since it has to do a
> manual dynamic traversal of the map's array of keys, with type casts etc.:
> {code:java}
> /* 099 */ int project_index_0 = 0;
> /* 100 */ boolean project_found_0 = false;
> /* 101 */ while (project_index_0 < project_length_0 &&
> !project_found_0) {
> /* 102 */ final int project_key_0 =
> project_keys_0.getInt(project_index_0);
> /* 103 */ if (project_key_0 == project_value_2) {
> /* 104 */ project_found_0 = true;
> /* 105 */ } else {
> /* 106 */ project_index_0++;
> /* 107 */ }
> /* 108 */ }
> /* 109 */
> /* 110 */ if (!project_found_0) {
> /* 111 */ project_isNull_0 = true;
> /* 112 */ } else {
> /* 113 */ project_value_0 =
> project_values_0.getInt(project_index_0);
> /* 114 */ }
> {code}
> It looks like the problem is in {{SimplifyExtractValueOps}}, which doesn't
> handle {{GetMapValue(Literal(...), key)}}, only the {{CreateMap}} form:
> {code:scala}
> case GetMapValue(CreateMap(elems), key) => CaseKeyWhen(key, elems)
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]