Hi Team Drill!
While performance testing Drill clusters on AWS EMR with TPC-H data at scale factor 100, I observed that the results for a 3-node cluster are similar to those for a 13-node cluster. I am therefore investigating how the query is carried out and which part of query handling (e.g. planning, reading the data, executing the query, transferring the record batches) dominates the run time.

Parth kindly suggested that I use the Query Profile from the Web UI, and it helped a lot. However, I could not find documentation explaining some of the items on the Query Profile page. Could you tell me what the following items mean?

*I) What do the operator types Project, Unordered Receiver, and Single Sender mean? I guess Hive Sub Scan is the time spent reading data from Hive; is that correct?*

*II) What are the units of the Processes columns in the Operator Profiles Overview table? Is it time in minutes:seconds format?*

It would also be really nice to know:

III) What metric does the blue chart at the top of the Overview section present?

IV) What are a fragment, a minor fragment, and a major fragment?

V) What are first start, last start, first end, and last end?

VI) Over which sets are max, min, and average calculated?

VII) Why is the peak memory so small? It is 4 MB, while the machine has 16 GB of RAM.

A print of the Web UI and the JSON profile are attached.

Thanks a lot for your time and help.

Thanks,
Alex
query profile.json
Description: application/json
