[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenhai updated ASTERIXDB-1777:
------------------------------
    Description: 
So far, we have identified two cases that should consider caching the frames 
in memory (to speed up the pipeline scheduling) before we write (syncwrite) 
them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should 
cache the frames in memory before forwarding them to the distributed range 
partitions.
2. ExternalSort: The current Sorter caches frames within the limit set by 
compiler.sortmemory in asterix-configuration.xml. In other words, it sorts one 
such batch of frames in one shot. In fact, the sort runs faster if we configure 
a smaller sortmemory budget (in our memory-resident experiment, 64MB saves 20% 
of the sort time compared to 320MB), but the frames sorted in each round are 
then written to the Runfile at a 1:1 ratio to the total data size. We can 
treat this case like the Replicate case above.
We are still thinking about other general cases like the ones above ...
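
To make the idea concrete, here is a minimal Java sketch of a frame buffer with 
its own cache budget. It is only an illustration under stated assumptions: the 
class CachedRunBuffer, its methods, and the byte counters are hypothetical and 
are not part of AsterixDB or Hyracks, and the run-file write is simulated by a 
counter rather than real I/O.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not an AsterixDB/Hyracks class. */
class CachedRunBuffer {
    // Extra memory (in bytes) that may hold sorted frames instead of spilling them.
    private final long cacheBudgetBytes;
    private final List<byte[]> cachedFrames = new ArrayList<>();
    private long cachedBytes = 0;
    private long spilledBytes = 0;

    CachedRunBuffer(long cacheBudgetBytes) {
        this.cacheBudgetBytes = cacheBudgetBytes;
    }

    /** Keep the frame in memory while the budget allows; otherwise count it as spilled. */
    void append(byte[] frame) {
        if (cachedBytes + frame.length <= cacheBudgetBytes) {
            cachedFrames.add(frame);
            cachedBytes += frame.length;
        } else {
            // Stand-in for a synchronous write of the frame to the run file.
            spilledBytes += frame.length;
        }
    }

    long cachedBytes() {
        return cachedBytes;
    }

    long spilledBytes() {
        return spilledBytes;
    }

    public static void main(String[] args) {
        int frameSize = 32 * 1024;
        CachedRunBuffer noCache = new CachedRunBuffer(0);
        CachedRunBuffer withCache = new CachedRunBuffer(4L * frameSize);
        for (int i = 0; i < 10; i++) {
            noCache.append(new byte[frameSize]);
            withCache.append(new byte[frameSize]);
        }
        System.out.println("spilled without cache budget: " + noCache.spilledBytes());
        System.out.println("spilled with cache budget:    " + withCache.spilledBytes());
    }
}
```

With a zero cache budget every frame is counted as spilled, which corresponds 
to the 1:1 write ratio described in case 2; a non-zero budget reduces the 
spilled bytes by the amount that fits in memory.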

  was:
So far, we have identified two cases that should consider caching the frames 
in memory before we write (syncwrite) them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should 
cache the frames in memory before forwarding them to the distributed range 
partitions.
2. ExternalSort: The current Sorter caches frames within the limit set by 
compiler.sortmemory in asterix-configuration.xml. In other words, it sorts one 
such batch of frames in one shot. In fact, the sort runs faster if we configure 
a smaller sortmemory budget (in our memory-resident experiment, 64MB saves 20% 
of the sort time compared to 320MB), but the frames sorted in each round are 
then written to the Runfile at a 1:1 ratio to the total data size. We can 
treat this case like the Replicate case above.
We are still thinking about other general cases like the ones above ...


> Budget does not consider the runfile frame that should be temporarily cached 
> in massive memory.
> -----------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1777
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>         Environment: MAC/Linux
>            Reporter: Wenhai
>            Assignee: Wenhai
>
> So far, we have identified two cases that should consider caching the frames 
> in memory (to speed up the pipeline scheduling) before we write (syncwrite) 
> them to the Runfile:
> 1. Replicate: In the parallel sort case, if we have massive memory, we should 
> cache the frames in memory before forwarding them to the distributed range 
> partitions.
> 2. ExternalSort: The current Sorter caches frames within the limit set by 
> compiler.sortmemory in asterix-configuration.xml. In other words, it sorts 
> one such batch of frames in one shot. In fact, the sort runs faster if we 
> configure a smaller sortmemory budget (in our memory-resident experiment, 
> 64MB saves 20% of the sort time compared to 320MB), but the frames sorted in 
> each round are then written to the Runfile at a 1:1 ratio to the total data 
> size. We can treat this case like the Replicate case above.
> We are still thinking about other general cases like the ones above ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
