[jira] [Comment Edited] (ASTERIXDB-1433) Multiple cores with huge memory slow down in the big fact table aggregation.

Wenhai (JIRA) Wed, 11 May 2016 20:08:03 -0700

    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281119#comment-15281119
 ]


Wenhai edited comment on ASTERIXDB-1433 at 5/12/16 3:06 AM:
------------------------------------------------------------

The schema comes from an electricity company. Roughly speaking, it can be 
abstracted into the following form:
{noformat}
ConsumerType as open {
consumerid int64,
region int64, // The county code
starttime datetime, // start of the sampling interval (about one hour each)
endtime datetime, // end of the sampling interval
electronicdegree double, // the consumed degree
expense double, // the step tariff, computed online; we can regard this as
                // a normal double field in our setting
...., // Some business information
}
{noformat}
What they want is simply to aggregate the expense/degree fields under a 
moderately selective filter, grouped by the region or by the month/year of the 
datetime fields above. Here, we assume that memory is large enough to hold the 
full table and that the varying selection predicates cover the domain of the 
relevant fields.
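
To make the workload concrete, here is a minimal Python sketch of that kind of 
query: a range filter on starttime followed by a group-by sum. The sample rows, 
the function names (aggregate, by_region, by_month) and the choice of starttime 
as the filter field are assumptions for illustration only; this is not 
AsterixDB code and not the customer's actual query.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows that follow the ConsumerType fields above.
ROWS = [
    {"consumerid": 1, "region": 10, "starttime": datetime(2016, 1, 5, 8),
     "electronicdegree": 1.5, "expense": 0.75},
    {"consumerid": 2, "region": 10, "starttime": datetime(2016, 1, 20, 9),
     "electronicdegree": 2.5, "expense": 1.25},
    {"consumerid": 3, "region": 20, "starttime": datetime(2016, 2, 3, 10),
     "electronicdegree": 4.0, "expense": 2.0},
    {"consumerid": 4, "region": 20, "starttime": datetime(2016, 3, 1, 11),
     "electronicdegree": 8.0, "expense": 4.0},
]


def by_region(row):
    """Group key: the region code."""
    return row["region"]


def by_month(row):
    """Group key: (year, month) of the start time."""
    return (row["starttime"].year, row["starttime"].month)


def aggregate(rows, key, lo, hi):
    """Sum (expense, degree) per group for rows with lo <= starttime < hi."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        if lo <= row["starttime"] < hi:  # the selection predicate
            acc = totals[key(row)]
            acc[0] += row["expense"]
            acc[1] += row["electronicdegree"]
    return {group: tuple(sums) for group, sums in totals.items()}


# The same table is queried repeatedly; only the parameters change.
jan_feb = (datetime(2016, 1, 1), datetime(2016, 3, 1))
per_region = aggregate(ROWS, by_region, *jan_feb)
per_month = aggregate(ROWS, by_month, *jan_feb)
```

Only the key function and the time window differ between the two calls, which 
is the "same table, different parameters" pattern described in this issue.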


was (Author: lwhay):
The schema is from a electronic company. Roughly speaking, we can abstract the 
schema as the following form
{noformat}
ConsumerType as open {
consumerid int64,
region int64, // The county code
starttime datetime, // the start time of the sampling about in between each hour
endtime datetime, // the end time
electronicdegree double, // the consumed degree
expense double, // the step tariff to be computed online. We can regard this as 
a normal double field in our setting
...., // Some business information
}
{noformat}
What they expect is just to aggregate on the expense/degree with a moderate 
selection rate following a goupby on the region or the month/year of the 
denoted datetime fields. Here, we suppose the memory is enough to accommodate 
the full table and the variant selection covers the domain of the relevant 
fields.

> Multiple cores with huge memory slow down in the big fact table aggregation.
> ----------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1433
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1433
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Hyracks Core
>         Environment: 10 nodes running Ubuntu Linux; 6 CPUs x 4 cores per CPU 
> and 128 GB of memory per node.
>            Reporter: Wenhai
>
> This is a classic hardware platform for datasets that reach the TB scale in 
> total. AsterixDB does extremely well on complex queries that include 
> multiple join operators over a high-selectivity select operator. However, the 
> execution traces show that, despite the large memory configuration, the 
> original tables are always reloaded from disk into memory, even if they were 
> processed by the most recent query. Given this, why not provide a strategy 
> that keeps the intermediate data of the last completed query in memory and 
> frees it when memory is insufficient for a new query? In some cases, users 
> repeatedly issue the same query with different parameters on the same 
> tables, for example, aggregations with varying parameters over a single big 
> fact table.
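
The strategy proposed in the description (keep the data of completed queries in 
memory, free it only when a new query needs the space) can be sketched roughly 
as follows. This is a hypothetical illustration in Python, not Hyracks code: the 
IntermediateCache class, its byte budget, the caller-supplied sizes and the 
least-recently-used eviction order are all assumptions.

```python
from collections import OrderedDict


class IntermediateCache:
    """Keeps data of completed queries in memory within a byte budget."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> (data, size_in_bytes)

    def get(self, key):
        """Return cached data and mark it recently used, or None on a miss."""
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key][0]

    def reserve(self, size):
        """Evict least recently used entries until `size` bytes are free."""
        while self.entries and self.budget - self.used < size:
            _, (_, freed) = self.entries.popitem(last=False)
            self.used -= freed

    def put(self, key, data, size):
        """Cache `data`; return False if it can never fit in the budget."""
        if key in self.entries:
            self.used -= self.entries.pop(key)[1]
        if size > self.budget:
            return False
        self.reserve(size)
        self.entries[key] = (data, size)
        self.used += size
        return True


cache = IntermediateCache(budget_bytes=100)
cache.put("scan:fact_table", ["...rows..."], 60)
cache.put("scan:dim_table", ["...rows..."], 30)
cache.get("scan:fact_table")  # the fact table is now the most recently used
cache.reserve(20)             # a new query needs 20 bytes: dim_table is freed
```

A later query on the fact table then hits the cache instead of reloading from 
disk, while the less recently used entry is the one given up under pressure.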



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
