[
https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenhai updated ASTERIXDB-1777:
------------------------------
Description:
So far, we have identified two cases in which frames should be cached in
memory (to speed up pipeline scheduling) before we write (syncwrite) them to
the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should
cache the frames in memory before forwarding them to the distributed range
partitions.
2. ExternalSort: The current Sorter caches frames up to the limit set by
compiler.sortmemory in asterix-configuration.xml. In other words, we sort one
batch of that many frames in one shot. In fact, we can run faster if we
configure a smaller sortmemory budget (in our memory-resident experiment, 64MB
saves 20% of the sort time compared to 320MB), but the frames sorted in each
round are then written to the Runfile, at a 1:1 ratio to the total data size.
We can also treat this case like the Replicate case above.
We are still thinking about general cases like the ones above ...
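To make the idea concrete, here is a minimal sketch of a frame cache that keeps frames in memory up to a byte budget and only writes the overflow to the run file. It is an illustration under stated assumptions, not AsterixDB or Hyracks code: the class BudgetedFrameCache, its methods, and the use of a plain OutputStream as the run file are all hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; not an existing Hyracks/AsterixDB class. */
final class BudgetedFrameCache {
    private final long budgetBytes;      // assumed in-memory cache budget
    private final OutputStream runFile;  // stands in for the run file writer
    private final Deque<byte[]> cached = new ArrayDeque<>();
    private long cachedBytes = 0;
    private long spilledBytes = 0;
    private boolean spilling = false;

    BudgetedFrameCache(long budgetBytes, OutputStream runFile) {
        this.budgetBytes = budgetBytes;
        this.runFile = runFile;
    }

    /** Keeps the frame in memory while the budget allows, otherwise writes it out. */
    void append(byte[] frame) {
        if (!spilling && cachedBytes + frame.length <= budgetBytes) {
            cached.addLast(frame);
            cachedBytes += frame.length;
            return;
        }
        // Once one frame has been spilled, all later frames are spilled too,
        // so that "cached frames, then run file" preserves arrival order.
        spilling = true;
        try {
            runFile.write(frame);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilledBytes += frame.length;
    }

    long cachedBytes() { return cachedBytes; }

    long spilledBytes() { return spilledBytes; }

    int cachedFrames() { return cached.size(); }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BudgetedFrameCache cache = new BudgetedFrameCache(64, out);
        for (int i = 0; i < 3; i++) {
            cache.append(new byte[32]); // three 32-byte frames against a 64-byte budget
        }
        System.out.println("cached=" + cache.cachedBytes() + " spilled=" + cache.spilledBytes());
    }
}
```

In this sketch the cache budget is separate from the sort budget, which is the point of the issue: a small compiler.sortmemory could keep each sort round fast while a larger cache budget avoids writing every sorted frame to the Runfile.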
> Budget does not consider the runfile frame that should be temporarily cached
> in massive memory.
> -----------------------------------------------------------------------------------------------
>
> Key: ASTERIXDB-1777
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
> Project: Apache AsterixDB
> Issue Type: Improvement
> Environment: MAC/Linux
> Reporter: Wenhai
> Assignee: Wenhai
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)