When a query involves ESP (Executor Server Process), the following is done.

1. Master process creates ESP processes in different nodes via the Trafodion 
clustering infrastructure. This process creation is no-waited and hence all the 
ESP processes should be started almost simultaneously.
2. Master process then sends the fragment of the query that needs to be 
executed to every ESP. When this completes, the query starts executing.
 3. As part of query execution,  ESP would establish a connection to HBase for 
the first time.  The HBase connection will be reused for subsequent query 
execution. Currently, this connection time is not tracked separately. I think 
it might be good to track this connection time separately.  

You can try the following:

Prepare s1 a query with table 1 involving ESPs
Execute the query 
Re-execute the query

Prepare s1 a query with table 2 involving ESPs
Execute the query
Re-execute the query

Use the same statement name s1. This would cause the ESPs to be re-used for the 
2nd query.

Selva

-----Original Message-----
From: Liu, Yao-Hua (Joshua) [mailto:[email protected]] 
Sent: Monday, January 1, 2018 7:07 PM
To: [email protected]; [email protected]
Subject: 答复: how would esp do when it was launched?

By the way, we can get rid of the cache since if in step3 we choose another 
table to select, then the elapsed time would also be super-fast.


-----邮件原件-----
发件人: Liu, Yao-Hua (Joshua) [mailto:[email protected]] 
发送时间: 2018年1月2日 11:02
收件人: [email protected]; [email protected]
主题: 答复: how would esp do when it was launched?

Hi Dave,

        Thanks for your suggestion!
        Actually the table is trafodion table which is only named with _HIVE. 
For your 3 steps
        1. prepare
          It would take 2 seconds
        2. execute first time
          It would take 78 seconds. Here to start all the ESPs would take less 
than 1 second
        3. execute the second time
          It would take 3 seconds.
        
        So I am wondering what does ESP do during it was lauched?

Thanks
Joshua

-----邮件原件-----
发件人: Dave Birdsall [mailto:[email protected]] 
发送时间: 2017年12月31日 8:42
收件人: [email protected]; [email protected]
主题: RE: how would esp do when it was launched?

Hi,

How big is the table? How many esps are we creating? Perhaps we are creating 
the esps serially; maybe that is what is taking the time.

Another factor to look at is compile time. You can separate that out as follows:

Step 1: using trafci, do a "prepare" of your query. See how long that takes.
Step 2: then execute the query. How long does that take?
Step 3: re-execute the query. How long does that take?

I expect part of that 80 seconds will be consumed in step 1 as compile time. 
Would be interesting to know if compile time is, say, 2 seconds or 78 seconds. 
If the latter, perhaps the issue is how we read statistics for a Hive table 
with many partitions.

Dave

-----Original Message-----
From: Liu, Yao-Hua (Joshua) [mailto:[email protected]] 
Sent: Friday, December 22, 2017 1:01 AM
To: [email protected]
Subject: how would esp do when it was launched?

Hi all,

       Suresh and I found some interesting thing when run some queries.

       Step 1:
       Use trafci, run query: select count(*) from CELL_INDICATOR_HIVE where 
starttime=20170801000000000;  // CELL_INDICATOR_HIVE has 100 billion rows and 
each starttime would have 4346483 rows. Starttime is the first column in store 
by keys
       This would take about 1 minute and 20 seconds to finish.
       Step2
       Run above sql again, then it would take 3 seconds to finish.
       Here 80s vs 3s, we may guess it's due to esp start time or cache. But we 
checked,

1.     to start all the esps would take less than 1 seconds.

2.     If due to cache, we can run another table for a test:
       Step3
       Run another query: select count(*) from SERVERIP_INDICATOR_BAK where 
starttime=20170801000000000;  // SERVERIP_INDICATOR_BAK has 64 billion rows and 
each starttime would have 2.8 million rows. Starttime is also the first column 
in store by keys. Then it would take 2 seconds to finish.

       By the way, if we start another trafci(not the same mxosrvr from above) 
and run above select count(*) from SERVERIP_INDICATOR_BAK where 
starttime=20170801000000000, it would also take 1 minute or more.

       So we are wondering what does esp do when it was started? Why the first 
time the esp to scan one table would take so much time but the second time to 
scan another table could be much faster?

Thanks
Joshua

Reply via email to