Thanks, Murtadha, for your informative email. I now have 15 partitions (~15 cores were utilized as well), which helped reduce the execution time. The query execution time is now ~3.2 mins :).
--Rana

On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail <[email protected]> wrote:

> If reloading the data isn’t too much trouble, the first thing I would do
> is recreate the instance with more partitions (e.g. partition per core or
> partition per 2 cores) and check the cores utilization. If this is the same
> dataset as the one in your previous email, you mentioned that it was about
> 10GB per partition, in that case, you might want to allocate at least 40GB
> for the buffer cache and you can reduce storage.memorycomponent.globalbudget
> to get enough memory to execute the job (depending on the number of
> partitions you create). After recreating with higher number of partitions,
> don’t use “SET `compiler.parallelism` "39"”. It will automatically use the
> number of partitions you create.
>
> Regarding the metrics time, it includes the results printing time, so if
> you want to see if it has any impact, try adding “limit 1” at the end of
> your query or change it to select count(*) instead of subject_id.
>
> Cheers,
> Murtadha
>
> *From: *Rana Alotaibi <[email protected]>
> *Date: *Monday, 29 January 2018 at 6:48 AM
> *To: *<[email protected]>
> *Cc: *<[email protected]>, <[email protected]>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
> *- Do you see all cores being fully utilized during the query execution?*
>
> I have noticed only 6 cores were utilized
>
> *- How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?*
>
> I'm using the HTTP APIs. The response is a JSON object that includes the
> query execution time:
>
> {
>   "status": "success",
>   "metrics": {
>     "elapsedTime": "434.627299814s",
>     "executionTime": "434.626137977s",
>     "resultCount": 4943,
>     "resultSize": 132293,
>     "processedObjects": 46875
>   }
> }
>
> I run the query 10 times and took the average which is ~6mins.
>
> *- You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?*
>
> One physical hard drive
>
> *- Also, increasing the sort/join memory doesn’t necessarily lead to a
> better performance. Have you tried changing these values to something
> smaller and seeing the effects?*
>
> Yes, I tried the following numbers:
>
> 1) sort-memory: 32MB, join-memory: 64MB
>
> 2) sort-memory: 64MB, join-memory: 128MB
>
> 3) sort-memory: 128MB, join-memory: 265MB
>
> The execution time remains on average ~6 - 6.5mins. I didn't see any
> improvement. The configurations that I have now:
>
> - compiler.parallelism :39 //Only 6 were utilized
>
> - storage.buffercache.size: 20GB
>
> - storage.buffercache.pagesize: 1MB
>
> Thanks,
>
> Rana
>
> On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <[email protected]>
> wrote:
>
> I have few questions if you don’t mind:
>
> Do you see all cores being fully utilized during the query execution?
>
> How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?
>
> You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?
>
> Also, increasing the sort/join memory doesn’t necessarily lead to a better
> performance. Have you tried changing these values to something smaller and
> seeing the effects?
>
> Cheers,
>
> Murtadha
>
> *From: *Rana Alotaibi <[email protected]>
> *Date: *Monday, 29 January 2018 at 5:21 AM
> *To: *<[email protected]>
> *Cc: *<[email protected]>, <[email protected]>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
> Thanks Murtadha! The problem solved. However, increasing the number of
> cores didn't help to improve the performance of that query.
>
> On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <[email protected]>
> wrote:
>
> Hi Rana,
>
> The memory used for query processing is automatically calculated as
> follows:
> JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget
>
> The documentation defaults for these parameters are outdated. The default
> value for storage.buffercache.size is (JVM Max Memory / 4) and it's the
> same for storage.memorycomponent.globalbudget. Since your dataset is
> already loaded, you could reduce the budget of
> storage.memorycomponent.globalbudget.
> In addition, if I recall correctly, your dataset size is way smaller than
> what's allocated for the buffer cache, so you might want to reduce the
> buffer cache budget. That should give you more than enough memory to
> execute on 39 cores.
>
> Cheers,
> Murtadha
>
> On 01/29/2018, 3:30 AM, "Mike Carey" <[email protected]> wrote:
>
> + dev
>
> On 1/28/18 3:37 PM, Rana Alotaibi wrote:
> > Hi all,
> >
> > I would like to make AsterixDB utilizes all available CPU cores (39)
> > that I have for the following query:
> >
> > USE mimiciii;
> > SET `compiler.parallelism` "39";
> > SET `compiler.sortmemory` "128MB";
> > SET `compiler.joinmemory` "265MB";
> > SELECT P.SUBJECT_ID
> > FROM LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
> > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
> > E.FLAG = 'abnormal' AND
> > I.FLUID='Blood' AND
> > I.LABEL='Haptoglobin'
> >
> > The total memory size that I have is 125GB(57GB for the AsterixDB
> > buffer cache). By running the above query, I got the following error:
> >
> > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
> > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
> >
> > How can I change this capacity default configuration? I'm looking into
> > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
> > Could you please point me to the appropriate configuration parameter?
> >
> > Thanks
> > -- Rana
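
For anyone who finds this thread later, the memory formula Murtadha quotes can be checked with simple arithmetic. The sketch below is only an illustration: the two byte counts come from the HYR0009 message in the thread, while the 64 GiB JVM size and the 40 GiB / 8 GiB settings are hypothetical values chosen for the example, and the function names are mine, not AsterixDB APIs.

```python
GIB = 1024 ** 3


def query_memory_budget(jvm_max_bytes, buffercache_bytes, globalbudget_bytes):
    """Memory left for query processing, using the formula quoted in the thread:
    JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget
    """
    return jvm_max_bytes - buffercache_bytes - globalbudget_bytes


def default_query_memory_budget(jvm_max_bytes):
    """Budget when both settings use the default of (JVM Max Memory / 4) each."""
    quarter = jvm_max_bytes // 4
    return query_memory_budget(jvm_max_bytes, quarter, quarter)


def job_fits(required_bytes, capacity_bytes):
    """A job is rejected when its requirement exceeds the capacity."""
    return required_bytes <= capacity_bytes


# Figures copied from the HYR0009 error message in the thread.
REQUIRED_BYTES = 10_705_403_904
REPORTED_CAPACITY_BYTES = 3_258_744_832

# Hypothetical sizing, for illustration only (not Rana's actual configuration).
HYPOTHETICAL_JVM_MAX = 64 * GIB
tuned_budget = query_memory_budget(HYPOTHETICAL_JVM_MAX, 40 * GIB, 8 * GIB)

if __name__ == "__main__":
    print("fits with reported capacity:", job_fits(REQUIRED_BYTES, REPORTED_CAPACITY_BYTES))
    print("fits with tuned budget:", job_fits(REQUIRED_BYTES, tuned_budget))
```

The point of the sketch is that shrinking either storage setting raises the query-processing budget by the same amount, which is why reducing storage.memorycomponent.globalbudget on an already-loaded dataset frees memory for the job.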

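
The timing methodology discussed above (run the query several times, read the metrics from the JSON response, take the average) could be scripted roughly as follows. This is a minimal sketch that only parses responses shaped like the one quoted in the thread; the helper names are hypothetical, and it assumes the duration is reported in plain seconds with an "s" suffix, as in that sample.

```python
import json
from statistics import mean


def duration_seconds(value):
    """Convert a duration string such as "434.627299814s" to float seconds.

    Only the plain-seconds form seen in the thread is handled; any other
    suffix (for example "ms") makes float() raise ValueError.
    """
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


def execution_seconds(response_text):
    """Extract metrics.executionTime from one JSON response body."""
    metrics = json.loads(response_text)["metrics"]
    return duration_seconds(metrics["executionTime"])


def average_execution_seconds(responses):
    """Average executionTime over several response bodies."""
    return mean(execution_seconds(text) for text in responses)


# Response body copied from the thread.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "metrics": {
    "elapsedTime": "434.627299814s",
    "executionTime": "434.626137977s",
    "resultCount": 4943,
    "resultSize": 132293,
    "processedObjects": 46875
  }
}
"""

if __name__ == "__main__":
    seconds = execution_seconds(SAMPLE_RESPONSE)
    print(f"executionTime: {seconds:.1f} s ({seconds / 60:.2f} min)")
```

As Murtadha notes, the reported time includes result printing, so comparing runs with and without “limit 1” (or with select count(*)) is what shows whether printing matters.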