[ https://issues.apache.org/jira/browse/ASTERIXDB-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281114#comment-15281114 ]
Wenhai (lwhay) edited comment on ASTERIXDB-1433 at 5/12/16 2:17 AM:
------------------------------------------------------------

The I/O statistics come from the iostat command, which reports an average throughput of 160MB/s (hot run) or 60MB/s (cold run). That is, after we aggregate a 60GB table, reloading it for another aggregation takes at least 600s. Of course, one could question whether our disk system is configured to be this slow, but we do have a large amount of memory, which is not very expensive.

Best,
Wenhai

> Multiple cores with huge memory slow down in the big fact table aggregation.
> ----------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1433
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1433
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Hyracks Core
>         Environment: 10 nodes X Linux ubuntu/6 cpu X 4 cores/per cpu, 128 GB memory/per node.
>            Reporter: Wenhai
>
> This is a classic hardware platform that supports datasets at the TB scale in
> total. AsterixDB does extremely well on complex queries that include
> multiple join operators over a high-selectivity select operator. However, the
> execution traces show that, despite the large memory configuration, the
> original tables are always reloaded from disk into memory, even if they were
> processed by the most recent query. Why not, then, provide a strategy that
> keeps the intermediate data of the last completed query in memory and frees
> it when memory is insufficient for a new query?
> In some cases, the user repeatedly triggers queries with different parameters
> on the same tables, for example, a variant-parameter aggregation on a single
> big fact table.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
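
The strategy requested in the issue description (keep the intermediate data of the last completed query in memory, and free it when a new query needs the space) could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of AsterixDB or Hyracks, and the least-recently-used eviction order is an assumption, since the issue does not name an eviction policy.

```python
from collections import OrderedDict


class IntermediateResultCache:
    """Hypothetical byte-budgeted cache for query intermediates.

    Entries are evicted in least-recently-used order (an assumption,
    not something the issue specifies) when a new entry does not fit.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps key -> (data, size_bytes); the oldest entry comes first.
        self._entries = OrderedDict()

    def get(self, key):
        """Return cached data, or None on a miss (caller reloads from disk)."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return entry[0]

    def put(self, key, data, size_bytes):
        """Cache data, evicting old entries if needed.

        Returns False when the item is larger than the whole budget.
        """
        # Replacing an existing entry first releases its space.
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        if size_bytes > self.capacity_bytes:
            return False
        # Free the least recently used entries until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self._entries[key] = (data, size_bytes)
        self.used_bytes += size_bytes
        return True
```

With a cache along these lines, a second aggregation over the same fact table would call `get` first and fall back to a disk scan only on a miss, which is the case the iostat figures in the comment describe.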