[ https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861872#comment-15861872 ]
Chao Sun commented on HIVE-15489:
---------------------------------
Thanks for reviewing the patch, [~xuefuz]!
bq. 1. The new configuration might have a better name.
"hive.spark.use.ts.stats" seems a little too general. Please consider a more
specific name, something like "hive_on_spark.use.file.size.for.mapjoin". Very
minor though.
Sure, I can change that. HIVE-15796 is also going to add a new config,
{{hive.spark.use.op.stats}}. Since the two are similar, do you think we should
combine them?
bq. 2. For new property, we probably want to default it to the old behavior
when checking in. Maybe we can have some test cases run with this new
configuration on.
Yes. I plan to set the default to false. Setting it to true is just for testing.
bq. 3. If join op isn't coming directly from table scan, I saw we are still
using operator stats to decide mapjoin. This can still cause the issue of
inaccurate estimation, right? Should we just not convert it to map join in
such a case?
I've thought about this. The downside is that many cases where map join would
work well will also be turned into reduce joins. But this config is mainly for
stability, so that should be fine as long as we document it well. I'll add this
to the next patch.
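For reference, the behavior discussed in point 3 could be sketched roughly as
below. This is only an illustration under assumptions: the class, field, and
parameter names (including {{thresholdBytes}}) are made up for this comment and
do not correspond to the patch or to Hive's optimizer classes.

```java
// Illustrative sketch only: none of these names exist in Hive.
final class MapJoinSizeSketch {

    /** Hypothetical stand-in for one small-side input branch of a join. */
    static final class Branch {
        final boolean directlyFromTableScan; // true if the branch is fed straight by a table scan (TS)
        final long tableScanBytes;           // size according to the TS stats
        final long operatorStatsBytes;       // size estimate propagated through the operator tree

        Branch(boolean directlyFromTableScan, long tableScanBytes, long operatorStatsBytes) {
            this.directlyFromTableScan = directlyFromTableScan;
            this.tableScanBytes = tableScanBytes;
            this.operatorStatsBytes = operatorStatsBytes;
        }
    }

    /**
     * Decides whether the small sides are small enough for a map join.
     * With useTsStats on, only TS sizes are trusted, and any branch that does
     * not come directly from a table scan blocks the conversion (the
     * conservative choice from point 3). With it off, the propagated operator
     * stats are used, which is the old behavior.
     */
    static boolean canConvertToMapJoin(boolean useTsStats, long thresholdBytes, Branch... smallSides) {
        long totalBytes = 0L;
        for (Branch branch : smallSides) {
            if (useTsStats) {
                if (!branch.directlyFromTableScan) {
                    return false; // estimate not trusted: keep the reduce join
                }
                totalBytes += branch.tableScanBytes;
            } else {
                totalBytes += branch.operatorStatsBytes;
            }
        }
        return totalBytes <= thresholdBytes;
    }
}
```

With the flag on, a branch that sits behind other operators always falls back
to a reduce join, which is the cost mentioned above: some joins that would have
been fine as map joins are no longer converted.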
bq. 4. There seem to be some test failures in the above run. Are they related?
I ran these tests locally and didn't see any issue.
> Alternatively use table scan stats for HoS
> ------------------------------------------
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
> Issue Type: Improvement
> Components: Spark, Statistics
> Affects Versions: 2.2.0
> Reporter: Chao Sun
> Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch,
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS
> rather than the populated stats in each of the join branches. This could be
> pretty conservative but more reliable.