Hey Sungwoo!

On 11/13/20 6:47 PM, Sungwoo Park wrote:
> I have run another fresh TPC-DS test using the latest commit. Here is the
> summary:
Thank you very much!

> 1. With hive.optimize.shared.work.dppunion=true, queries 2 and 59 fail. Please
> see the attachment for stack traces.

Even though the exception seems to be a recurrence of the previous issue, the
existing checks plus HIVE-24360 should have ruled out all incorrect cases.
I built in some debugging aids while I made these patches, and it would help a lot to take a peek at their output; however, they currently have to be enabled by hand. While I polish that a bit more, could you please share the EXPLAIN FORMATTED output for one of the queries that fails because of that patch?


> 2. Query 14 fails in both cases, and it seems like another bug. Note that
> hive.cbo.enable is set to true when running query 14.

I think you will find a CBO exception in the Hive logs explaining why it
falls back to the non-CBO path.


> 3. For some queries, the number of rows is different between the two experiments. In most cases, it seems to be rounding errors, but the difference is rather large for some queries (e.g., query 29 and 58). Please see the attachment for the result.

That's very odd. I recently fixed a bug in SWO (the shared work optimizer) that may have caused issues like this (HIVE-24365). I would recommend comparing the results against a run with the whole feature turned off (hive.optimize.shared.work=false).
If you could isolate and reproduce this in a qtest I could also dig into it.


cheers,
Zoltan


Commits used:

1) Hive, master, e9f72e654750de208227d46a22e983413b080c6c (HIVE-24366, Thu Nov 12)
2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (CHANGES.txt updated with TEZ-4238, Thu Oct 8)

Scenario:

1) create a database consisting of external tables from a 100GB TPC-DS text 
dataset
2) create a database consisting of ORC tables
3) compute column statistics, set tez.runtime.compress=false
4) run TPC-DS queries and check the results

Configuration:

1) set hive.execution.engine=tez, hive.execution.mode=container
2) set hive.cbo.enable=true

Experiment #1: hive.optimize.shared.work.dppunion=true

Query 2 fails:

java.lang.IllegalArgumentException: Edge [Reducer 9 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 6 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!

Query 14 fails:

org.apache.hadoop.hive.ql.parse.SemanticException: EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!

Query 59 fails:

java.lang.IllegalArgumentException: Edge [Reducer 6 : org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 4 : org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST : org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >> org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager }) already defined!

Experiment #2: hive.optimize.shared.work.dppunion=false

Query 14 fails:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException EXCEPT and INTERSECT operations are only supported with Cost Based Optimizations enabled. Please set 'hive.cbo.enable' to true!

Summary:

1. With hive.optimize.shared.work.dppunion=true, queries 2 and 59 fail. Please
see the attachment for stack traces.

2. Query 14 fails in both cases, and it seems like another bug. Note that
hive.cbo.enable is set to true when running query 14.

3. For some queries, the number of rows is different between the two experiments. In most cases, it seems to be rounding errors, but the difference is rather large for some queries (e.g., query 29 and 58). Please see the attachment for the result.

I could open a new Jira for this issue or create a sub-task of HIVE-24384, or
perhaps HIVE-24384 is already enough. Please let me know which you would
prefer.

(I have automated the entire experiment, so if you would like to see the result 
of testing a new commit, I would be happy to rerun the experiment and get back 
to you.)

Cheers,

--- Sungwoo

On Thu, Nov 12, 2020 at 10:49 PM Zoltan Haindrich <k...@rxd.hu> wrote:

    Hey Sungwoo!

    On 11/12/20 10:23 AM, Sungwoo Park wrote:
     > Hi Zoltan,
     >
     > I used the same hive-site.xml for the previous test (which was okay) and
     > the new test (which failed), so my guess is that it is perhaps due to a
     > commit since the previous test. Let me try later to identify the commit
     > that fails query 14, with the hope that identifying such a commit might be
     > useful in debugging.

    That would definitely help. If you could share the two commit hashes, we
    might be able to guess the culprit from the commit messages.


     > Another question: is HIVE-24360 part of a solution to the problem of
     > hive.optimize.shared.work.dppunion?
     > I have tried the latest commit (which includes HIVE-24360) using the TPC-DS
     > benchmark, and it seems like the problem still exists.

    Yes, HIVE-24360 should have fixed that. Do you still see an exception
    coming from tez-api reporting edge errors?
    I will also pick these changes up for a smaller benchmark run soon, but I'm
    not running any right now. Could you also note which query produced the
    exception, so that I can check it too?
    Could you please open a Jira about this and attach the actual exception
    trace, if available?

    cheers,
    Zoltan

     >
     > Cheers,
     >
     > --- Sungwoo
     >
     > On Mon, Nov 9, 2020 at 6:18 PM Zoltan Haindrich <k...@rxd.hu> wrote:
     >
     >> Hey Sungwoo!
     >>
     >> Regarding Q14 / "java.lang.RuntimeException: equivalence mapping violation"
     >>
     >> From the stack trace you shared, it seems the mapper has already seen
     >> both the filter and the AST node earlier, and they are in separate
     >> mapping groups, which is unfortunate. I don't think it will be simple
     >> to track that down; it will definitely need some debugging.
     >> Ideally, we would have a repro query for it.
     >>
     >> Note: we already run q14 in TestTezPerf*Driver. Could it be that some
     >> features are disabled in the hive-site.xml used by these tests, and
     >> that's why we haven't seen it before?
     >>
     >> cheers,
     >> Zoltan
     >>
     >>
     >
