Re: Hyracks Job Requirement Configuration

Rana Alotaibi Mon, 29 Jan 2018 00:29:11 -0800

Thanks Murtadha for your informative email. I now have 15 partitions (~15
cores were utilized as well), which helped reduce the execution time.
The query execution time is now ~3.2 mins :).


--Rana



On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail <[email protected]>
wrote:

> If reloading the data isn’t too much trouble, the first thing I would do
> is recreate the instance with more partitions (e.g. partition per core or
> partition per 2 cores) and check the core utilization. If this is the same
> dataset as the one in your previous email, you mentioned that it was about
> 10GB per partition, in that case, you might want to allocate at least 40GB
> for the buffer cache and you can reduce storage.memorycomponent.globalbudget
> to get enough memory to execute the job (depending on the number of
> partitions you create). After recreating with a higher number of partitions,
> don’t use “SET `compiler.parallelism` "39"”. It will automatically use the
> number of partitions you create.
>
>
>
> Regarding the metrics time, it includes the results printing time, so if
> you want to see if it has any impact, try adding “limit 1” at the end of
> your query or change it to select count(*) instead of subject_id.
>
>
>
> Cheers,
>
> Murtadha
>
>
>
> *From: *Rana Alotaibi <[email protected]>
> *Date: *Monday, 29 January 2018 at 6:48 AM
>
> *To: *<[email protected]>
> *Cc: *<[email protected]>, <[email protected]>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
>
>
> *- Do you see all cores being fully utilized during the query execution?*
>
>  I have noticed only 6 cores were utilized
>
> *- How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?*
>
> I'm using the HTTP APIs. The response is a JSON object that includes the
> query execution time:
>
>    { "status": "success",
>         "metrics": {
>                 "elapsedTime": "434.627299814s",
>                 "executionTime": "434.626137977s",
>                 "resultCount": 4943,
>                 "resultSize": 132293,
>                 "processedObjects": 46875
>         }
> }
>
> I ran the query 10 times and took the average, which is ~6 mins.
>
> *- You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?*
>
>  One physical hard drive
>
> *- Also, increasing the sort/join memory doesn’t necessarily lead to a
> better performance. Have you tried changing these values to something
> smaller and seeing the effects?*
>
>   Yes, I tried the following numbers:
>
>   1) sort-memory: 32MB, join-memory: 64MB
>
>   2) sort-memory: 64MB, join-memory: 128MB
>
>   3) sort-memory: 128MB, join-memory:  265MB
>
>
>
> The execution time remains on average ~6 - 6.5mins. I didn't see any
> improvement. The configurations that I have now:
>
> - compiler.parallelism: 39 // Only 6 cores were utilized
>
> - storage.buffercache.size: 20GB
>
> - storage.buffercache.pagesize: 1MB
>
>
>
> Thanks,
>
> Rana
>
> On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <[email protected]>
> wrote:
>
> I have a few questions if you don’t mind:
>
> Do you see all cores being fully utilized during the query execution?
>
> How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?
>
> You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?
>
> Also, increasing the sort/join memory doesn’t necessarily lead to a better
> performance. Have you tried changing these values to something smaller and
> seeing the effects?
>
>
>
> Cheers,
>
> Murtadha
>
>
>
> *From: *Rana Alotaibi <[email protected]>
> *Date: *Monday, 29 January 2018 at 5:21 AM
> *To: *<[email protected]>
> *Cc: *<[email protected]>, <[email protected]>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
>
>
> Thanks Murtadha! The problem is solved. However, increasing the number of
> cores didn't help to improve the performance of that query.
>
> On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <[email protected]>
> wrote:
>
> Hi Rana,
>
> The memory used for query processing is automatically calculated as
> follows:
> JVM Max Memory - storage.buffercache.size -
> storage.memorycomponent.globalbudget
>
> The documentation defaults for these parameters are outdated. The default
> value for storage.buffercache.size is (JVM Max Memory / 4) and it's the
> same for storage.memorycomponent.globalbudget. Since your dataset is
> already loaded, you could reduce the budget of 
> storage.memorycomponent.globalbudget.
> In addition, if I recall correctly, your dataset size is way smaller than
> what's allocated for the buffer cache, so you might want to reduce the
> buffer cache budget. That should give you more than enough memory to
> execute on 39 cores.
>
> Cheers,
> Murtadha
>
>
> On 01/29/2018, 3:30 AM, "Mike Carey" <[email protected]> wrote:
>
>     + dev
>
>
>     On 1/28/18 3:37 PM, Rana Alotaibi wrote:
>     > Hi all,
>     >
>     > I would like to make AsterixDB utilize all available CPU cores (39)
>     > that I have for the following query:
>     >
>     > USE mimiciii;
>     > SET `compiler.parallelism` "39";
>     > SET `compiler.sortmemory` "128MB";
>     > SET `compiler.joinmemory` "265MB";
>     > SELECT P.SUBJECT_ID
>     > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>     > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>     >              E.FLAG = 'abnormal' AND
>     >              I.FLUID='Blood' AND
>     >              I.LABEL='Haptoglobin'
>     >
>     >
>     > The total memory size that I have is 125GB (57GB for the AsterixDB
>     > buffer cache). By running the above query, I got the following error:
>     >
>     > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
>     > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
>     >
>     > How can I change this capacity default configuration? I'm looking into
>     > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
>     > Could you please point me to the appropriate configuration parameter?
>     >
>     > Thanks
>     > -- Rana
>     >
>     >
>     >
>     >
>
>
>
>
>
>
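For anyone following the arithmetic in this thread, here is a minimal Python sketch of the formula Murtadha quotes (JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget). It is only an illustration, not AsterixDB code: the function names, the 100 GiB JVM heap, and the 8 GiB memory component budget are hypothetical values picked for the example, and the "fits" check is an assumed reading of the HYR0009 message. Only the two byte counts are taken from the error quoted above.

```python
GIB = 1024 ** 3  # bytes per GiB


def query_memory_budget(jvm_max, buffer_cache, memory_component_budget):
    """Memory left for query processing, per the formula quoted in the thread."""
    return jvm_max - buffer_cache - memory_component_budget


def job_fits(required, capacity):
    """Assumed admission rule: a job is rejected when it needs more than capacity."""
    return required <= capacity


# Values copied from the HYR0009 error message in the thread.
REQUIRED_BYTES = 10_705_403_904
REPORTED_CAPACITY_BYTES = 3_258_744_832

# Hypothetical JVM max heap; the thread does not state the real value.
JVM_MAX = 100 * GIB

# Defaults described in the thread: each setting is JVM Max Memory / 4.
default_budget = query_memory_budget(JVM_MAX, JVM_MAX // 4, JVM_MAX // 4)

# Hypothetical tuned settings: 40 GiB buffer cache, 8 GiB memory component budget.
tuned_budget = query_memory_budget(JVM_MAX, 40 * GIB, 8 * GIB)

if __name__ == "__main__":
    print("reported capacity fits job:", job_fits(REQUIRED_BYTES, REPORTED_CAPACITY_BYTES))
    print("default budget (GiB):", default_budget // GIB)
    print("tuned budget (GiB):", tuned_budget // GIB)
```

Plugging in the real JVM max memory and the configured sizes should show whether the remaining budget covers the job requirement reported in the error.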
