[ https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenhai updated ASTERIXDB-1777:
------------------------------
Description:
So far, we have identified two cases where we should consider caching frames in
memory (to shore up pipelined scheduling) before we write (syncwrite) them to
the Runfile:
1. Replicate: In the parallel sort case, if we have a large amount of memory, we
should cache the frames in memory (rather than writing them directly to the
Runfile) before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the budget set by
compiler.sortmemory in asterix-configuration.xml; in other words, it sorts one
batch of that size in one shot. In fact, it runs faster with a smaller
sortmemory budget (in our memory-resident experiment, 64MB reduced sort time by
20% compared with 320MB), but the sorted frames from each round are then
written to the Runfile, so a volume equal to the total data size (1:1) is
written before we actually merge them. We can treat this case like the
Replicate case above.
We are still thinking about more general cases like the above ...
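For the ExternalSort case, keeping small sorted runs in memory and merging them
directly, instead of first writing every run to a Runfile, might look like the
following minimal sketch. It assumes integer records and a hypothetical class
InMemoryRunSorter (not an AsterixDB class); batchSize stands in for a small
sortmemory budget. When the runs do not all fit in memory, they would still
have to be spilled to a Runfile as today.

```java
import java.util.*;

/**
 * Hypothetical sketch: sort input in small batches (runs), keep the sorted runs
 * in memory, and k-way merge them without writing the runs to a run file.
 */
final class InMemoryRunSorter {
    private InMemoryRunSorter() {}

    /** Splits the input into runs of at most batchSize records and sorts each run. */
    static List<int[]> generateRuns(int[] input, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < input.length; start += batchSize) {
            int[] run = Arrays.copyOfRange(input, start, Math.min(start + batchSize, input.length));
            Arrays.sort(run);
            runs.add(run);
        }
        return runs;
    }

    /** K-way merge of sorted runs using a min-heap keyed on each run's current head. */
    static int[] mergeRuns(List<int[]> runs) {
        int total = 0;
        for (int[] r : runs) total += r.length;
        // Heap entries are {value, runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).length > 0) heap.add(new int[]{runs.get(i)[0], i, 0});
        }
        int[] out = new int[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out[k++] = e[0];
            int[] run = runs.get(e[1]);
            int next = e[2] + 1;
            if (next < run.length) heap.add(new int[]{run[next], e[1], next});
        }
        return out;
    }
}
```

Smaller batches make each in-memory sort cheaper; the cost moves to the merge,
which here reads the runs from memory instead of from disk.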
was:
So far, we have identified two cases where we should consider caching frames in
memory (to shore up pipelined scheduling) before we write (syncwrite) them to
the Runfile:
1. Replicate: In the parallel sort case, if we have a large amount of memory, we
should cache the frames in memory (rather than writing them directly to the
Runfile) before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the budget set by
compiler.sortmemory in asterix-configuration.xml; in other words, it sorts one
batch of that size in one shot. In fact, it runs faster with a smaller
sortmemory budget (in our memory-resident experiment, 64MB reduced sort time by
20% compared with 320MB), but the sorted frames from each round are then
written to the Runfile, 1:1 with the total data size. We can treat this case
like the Replicate case above.
We are still thinking about more general cases like the above ...
> Budget does not consider the runfile frame that should be temporarily cached
> in massive memory.
> -----------------------------------------------------------------------------------------------
>
> Key: ASTERIXDB-1777
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
> Project: Apache AsterixDB
> Issue Type: Improvement
> Environment: MAC/Linux
> Reporter: Wenhai
> Assignee: Wenhai
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)