[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhai updated ASTERIXDB-1777:
------------------------------
    Description: 
So far, we have identified two cases that should consider caching the frames 
in memory (to speed up the pipeline scheduling) before we write (syncwrite) 
them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should 
cache the frames in memory (rather than writing them directly to the Runfile) 
before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the limit set by 
compiler.sortmemory in asterix-configuration.xml. In other words, we sort a 
batch of frames of that size in one shot. In fact, we can run faster if we 
configure a smaller sortmemory budget (in our memory-resident experiment, 64MB 
saves 20% of the sort time compared to 320MB), but the sorted frames of each 
round will be written to the Runfile, 1:1 with the total data size, before we 
actually merge them. We can also treat this case like the Replicate case above.
We are still thinking about general cases like the above ...

  was:
So far, we have identified two cases that should consider caching the frames 
in memory (to speed up the pipeline scheduling) before we write (syncwrite) 
them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should 
cache the frames in memory (rather than writing them directly to the Runfile) 
before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the limit set by 
compiler.sortmemory in asterix-configuration.xml. In other words, we sort a 
batch of frames of that size in one shot. In fact, we can run faster if we 
configure a smaller sortmemory budget (in our memory-resident experiment, 64MB 
saves 20% of the sort time compared to 320MB), but the sorted frames of each 
round will be written to the Runfile, 1:1 with the total data size. We can 
also treat this case like the Replicate case above.
We are still thinking about general cases like the above ...


> Budget does not consider the runfile frame that should be temporarily cached 
> in massive memory.
> -----------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1777
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>         Environment: MAC/Linux
>            Reporter: Wenhai
>            Assignee: Wenhai
>
> So far, we have identified two cases that should consider caching the frames 
> in memory (to speed up the pipeline scheduling) before we write (syncwrite) 
> them to the Runfile:
> 1. Replicate: In the parallel sort case, if we have massive memory, we should 
> cache the frames in memory (rather than writing them directly to the 
> Runfile) before forwarding them to the distributed range partitions.
> 2. ExternalSort: The current Sorter caches frames up to the limit set by 
> compiler.sortmemory in asterix-configuration.xml. In other words, we sort a 
> batch of frames of that size in one shot. In fact, we can run faster if we 
> configure a smaller sortmemory budget (in our memory-resident experiment, 
> 64MB saves 20% of the sort time compared to 320MB), but the sorted frames of 
> each round will be written to the Runfile, 1:1 with the total data size, 
> before we actually merge them. We can also treat this case like the 
> Replicate case above.
> We are still thinking about general cases like the above ...
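
A minimal sketch of the caching idea described in this issue, in Java. It is 
not Hyracks or AsterixDB code: the class BudgetedFrameCache, its methods, and 
the use of plain byte arrays as frames are all assumptions made for 
illustration. Frames stay in memory while they fit within a byte budget, and 
only the overflow is appended to a temporary run file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not an existing Hyracks/AsterixDB class.
class BudgetedFrameCache {
    private final long budgetBytes;                    // assumed memory budget for cached frames
    private final Deque<byte[]> cached = new ArrayDeque<>();
    private long cachedBytes = 0;
    private long spilledBytes = 0;
    private Path runFile;                              // created lazily on the first spill
    private OutputStream runOut;

    BudgetedFrameCache(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    // Keep the frame in memory if it fits in the budget; otherwise append it to the run file.
    void add(byte[] frame) {
        if (cachedBytes + frame.length <= budgetBytes) {
            cached.addLast(frame.clone());
            cachedBytes += frame.length;
            return;
        }
        try {
            if (runOut == null) {
                runFile = Files.createTempFile("runfile", ".tmp");
                runOut = Files.newOutputStream(runFile);
            }
            runOut.write(frame);
            spilledBytes += frame.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long cachedBytes() {
        return cachedBytes;
    }

    long spilledBytes() {
        return spilledBytes;
    }

    // Close the run file, if one was created, and delete it.
    void close() {
        try {
            if (runOut != null) {
                runOut.close();
                Files.deleteIfExists(runFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 64-byte budget, for example, two 32-byte frames stay cached and a third 
one is written to the run file.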



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)