The JSON profile attachment made it to the list because it is plain ASCII, but the screenshot was blocked as a binary file. We can take a look at the profile by loading the JSON into an instance of Drill. Just a reminder for everyone about binary attachments: please upload them to a public host and share a link.
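For anyone who wants a quick look at the attached profile without a running Drill instance, here is a rough Python sketch that totals the per-operator numbers across minor fragments. The field names (fragmentProfile, minorFragmentProfile, operatorProfile, processNanos, waitNanos, peakLocalMemoryAllocated) are what I would expect to see in the profile JSON, so treat them as assumptions and adjust to whatever the file actually contains. The function name is made up for illustration.

```python
from collections import defaultdict


def summarize_operators(profile):
    """Aggregate operator metrics across the minor fragments of each major fragment.

    `profile` is the parsed profile JSON (a dict). All key names used below
    are assumptions about the profile layout; missing keys are treated as 0.
    """
    totals = defaultdict(
        lambda: {"process_ns": 0, "wait_ns": 0, "peak_mem": 0, "minor_fragments": 0}
    )
    for major in profile.get("fragmentProfile", []):
        major_id = major.get("majorFragmentId")
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                # One row per (major fragment, operator id, operator type).
                key = (major_id, op.get("operatorId"), op.get("operatorType"))
                row = totals[key]
                row["process_ns"] += op.get("processNanos", 0)
                row["wait_ns"] += op.get("waitNanos", 0)
                # Peak memory is a high-water mark, so take the max, not the sum.
                row["peak_mem"] = max(
                    row["peak_mem"], op.get("peakLocalMemoryAllocated", 0)
                )
                row["minor_fragments"] += 1
    return dict(totals)


# Usage (the file name is hypothetical):
#   import json
#   with open("profile.json") as f:
#       for key, row in sorted(summarize_operators(json.load(f)).items()):
#           print(key, row)
```

Sorting the resulting rows by process_ns versus wait_ns should give a first hint at whether the time goes to processing or to waiting on other fragments.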

On Tue, Apr 21, 2015 at 2:34 PM, Alexander Zarei <[email protected]> wrote:

> Hi Team Drill!
>
> While performing performance testing on Drill clusters on AWS EMR, with
> TPC-H data of scale factor 100, I observed the results for a cluster of 3
> nodes are similar to a cluster of 13 nodes. Hence, I am investigating how
> the query is being carried out and which part of the query handling (e.g.
> planning, reading the data, executing the query, transferring the record
> batches) is the dominant time consuming part.
>
> Parth kindly suggested I should use the Query Profile from the Web UI and
> it helped a lot. However, there are some items on the Query Profile page
> that I did not find documentation to interpret them. I was wondering if you
> know what the following item are:
>
> *I) What are the meaning of operator types: Project,
> Unordered receiver, Single Sender? I guess Hive sub Scan is the time spent
> reading data from Hive, is that correct?*
>
> *II) What are the units for the Processes columns in the
> Operator Profiles Overview table? Is it time in a minutes : seconds format?*
>
> Also it would be really nice to know:
>
> III) What metric does the blue chart on the top of the
> Overview section present?
>
> IV) What is fragment, a minor fragment and major fragment?
>
> V) What are the first start and last start and first end
> and last end?
>
> VI) What are the sets over which max, min and average are
> calculated?
>
> VII) Why the Peak memory is so small? 4MB while the machine
> has 16 GB of Ram
>
> The print of the Web UI as well as the json profile are attached.
>
> Thanks a lot for your time and help.
>
> Thanks,
>
> Alex
