If reloading the data isn’t too much trouble, the first thing I would do is
recreate the instance with more partitions (e.g., one partition per core or
one partition per 2 cores) and check the core utilization. If this is the
same dataset as the one in your previous email, you mentioned that it was
about 10GB per partition; in that case, you might want to allocate at least
40GB for the buffer cache, and you can reduce
storage.memorycomponent.globalbudget to get enough memory to execute the job
(depending on the number of partitions you create). After recreating with a
higher number of partitions, don’t use “SET `compiler.parallelism` "39"”;
the query will automatically use the number of partitions you create.
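
For reference, the partitions correspond to the iodevices entries in the NC
section of the instance configuration file. A minimal sketch of what that
might look like (the section name, paths, and sizes below are hypothetical
examples; adjust them to your setup):

    [nc/asterix_nc1]
    # one iodevice per partition; these paths are hypothetical
    iodevices=/data/p0,/data/p1,/data/p2,/data/p3

    [common]
    storage.buffercache.size=40GB
    # shrink the memory component budget to free memory for query execution
    storage.memorycomponent.globalbudget=8GB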


Regarding the metrics time, it includes the result printing time, so if you
want to see whether that has any impact, try adding “limit 1” at the end of
your query or changing it to select count(*) instead of subject_id.
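
For example, the query from your earlier email could be rewritten to return
a single row, so the printing time becomes negligible:

    USE mimiciii;
    SELECT COUNT(*) AS cnt
    FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
    WHERE  E.ITEMID/*+bcast*/=I.ITEMID AND
           E.FLAG = 'abnormal' AND
           I.FLUID='Blood' AND
           I.LABEL='Haptoglobin';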


Cheers,

Murtadha


From: Rana Alotaibi <[email protected]>
Date: Monday, 29 January 2018 at 6:48 AM
To: <[email protected]>
Cc: <[email protected]>, <[email protected]>
Subject: Re: Hyracks Job Requirement Configuration


- Do you see all cores being fully utilized during the query execution? 

 I noticed that only 6 cores were utilized.
- How much time does the query take right now and how do you measure the query 
execution time? Do you wait for the result to be printed somewhere (e.g. in the 
browser)?

I'm using the HTTP APIs. The response is a JSON object that includes the query 
execution time:

{
    "status": "success",
    "metrics": {
        "elapsedTime": "434.627299814s",
        "executionTime": "434.626137977s",
        "resultCount": 4943,
        "resultSize": 132293,
        "processedObjects": 46875
    }
}

I ran the query 10 times and took the average, which is ~6 mins.
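
For reference, a minimal sketch of the kind of request I issue (assuming the
default query service endpoint on port 19002; the statement below is just a
placeholder):

    curl -s http://localhost:19002/query/service \
         --data-urlencode 'statement=SELECT COUNT(*) FROM mimiciii.PATIENTS;'

The metrics object above is taken from the JSON response of such a call.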

- You mentioned that you have 4 partitions, how many physical hard drives are 
they mapped to?

 One physical hard drive

- Also, increasing the sort/join memory doesn’t necessarily lead to better
performance. Have you tried changing these values to something smaller and
seeing the effects?

  Yes, I tried the following numbers:

  1) sort-memory: 32MB, join-memory: 64MB

  2) sort-memory: 64MB, join-memory: 128MB

  3) sort-memory: 128MB, join-memory: 265MB


The execution time remains at ~6-6.5 mins on average; I didn't see any
improvement. The configuration that I have now:

- compiler.parallelism: 39 // only 6 cores were utilized

- storage.buffercache.size: 20GB

- storage.buffercache.pagesize: 1MB


Thanks,

Rana

On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <[email protected]> wrote:

I have a few questions, if you don’t mind:

Do you see all cores being fully utilized during the query execution? 

How much time does the query take right now and how do you measure the query 
execution time? Do you wait for the result to be printed somewhere (e.g. in the 
browser)?

You mentioned that you have 4 partitions, how many physical hard drives are 
they mapped to?

Also, increasing the sort/join memory doesn’t necessarily lead to better
performance. Have you tried changing these values to something smaller and
seeing the effects?


Cheers,

Murtadha


From: Rana Alotaibi <[email protected]>
Date: Monday, 29 January 2018 at 5:21 AM
To: <[email protected]>
Cc: <[email protected]>, <[email protected]>
Subject: Re: Hyracks Job Requirement Configuration


Thanks Murtadha! That solved the problem. However, increasing the number of
cores didn't help improve the performance of that query.

On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <[email protected]> wrote:

Hi Rana,

The memory used for query processing is automatically calculated as follows:

    JVM Max Memory - storage.buffercache.size
                   - storage.memorycomponent.globalbudget

The documentation defaults for these parameters are outdated. The default value 
for storage.buffercache.size is (JVM Max Memory / 4) and it's the same for 
storage.memorycomponent.globalbudget. Since your dataset is already loaded, you 
could reduce the budget of storage.memorycomponent.globalbudget. In addition, 
if I recall correctly, your dataset size is way smaller than what's allocated 
for the buffer cache, so you might want to reduce the buffer cache budget. That 
should give you more than enough memory to execute on 39 cores.
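
For example, assuming (hypothetically) an 80GB JVM max heap together with
your 57GB buffer cache and the default memory component budget:

    default global budget  = 80GB / 4            = 20GB
    execution capacity     = 80GB - 57GB - 20GB  = ~3GB

which would be roughly in line with the ~3GB capacity reported in the error
message below.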

Cheers,
Murtadha


On 01/29/2018, 3:30 AM, "Mike Carey" <[email protected]> wrote:

    + dev


    On 1/28/18 3:37 PM, Rana Alotaibi wrote:
    > Hi all,
    >
    > I would like to make AsterixDB utilize all available CPU cores (39)
    > that I have for the following query:
    >
    > USE mimiciii;
    > SET `compiler.parallelism` "39";
    > SET `compiler.sortmemory` "128MB";
    > SET `compiler.joinmemory` "265MB";
    > SELECT P.SUBJECT_ID
    > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
    > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
    >              E.FLAG = 'abnormal' AND
    >              I.FLUID='Blood' AND
    >              I.LABEL='Haptoglobin'
    >
    >
    > The total memory size that I have is 125GB (57GB for the AsterixDB
    > buffer cache). By running the above query, I got the following error:
    >
    > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
    > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
    >
    > How can I change this capacity default configuration? I'm looking into
    > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
    > Could you please point me to the appropriate configuration parameter?
    >
    > Thanks
    > -- Rana


