Nick Reich commented on GEODE-3300:
[~palvarado], parallel export is accomplished by directing each member of the
cluster to write to disk the data for which it is currently the primary (n.b.
parallel export only functions for partitioned regions). This results in
several snapshot files (either spread across the local disks of each member or
together in a single directory, if a network drive location was specified on
export). The files can then be imported by having a single member import a
whole directory of snapshot files or by individually commanding each member to
import the file(s) local to itself (you could use this strategy to effectively
do a parallel import manually).
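To make the export/import flow above concrete, here is a minimal in-memory sketch. It does not use the Geode API: the class ParallelExportSketch, the hash-then-modulo ownership rule in primaryMember, and the use of a Map as a stand-in for each member's snapshot file are all assumptions made for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ParallelExportSketch {
    /** Hypothetical ownership rule: key -> bucket -> member. Real Geode placement differs. */
    public static int primaryMember(String key, int bucketCount, int memberCount) {
        int bucket = Math.floorMod(key.hashCode(), bucketCount);
        return bucket % memberCount;
    }

    /** Each member "writes" only the entries it is primary for; one map per member stands in for one file. */
    public static Map<Integer, Map<String, String>> exportParallel(
            Map<String, String> region, int bucketCount, int memberCount) {
        Map<Integer, Map<String, String>> files = new HashMap<>();
        for (int m = 0; m < memberCount; m++) {
            files.put(m, new TreeMap<>());
        }
        for (Map.Entry<String, String> e : region.entrySet()) {
            int m = primaryMember(e.getKey(), bucketCount, memberCount);
            files.get(m).put(e.getKey(), e.getValue());
        }
        return files;
    }

    /** A single member imports a whole "directory" of snapshot files. */
    public static Map<String, String> importAll(Collection<Map<String, String>> files) {
        Map<String, String> region = new TreeMap<>();
        for (Map<String, String> file : files) {
            region.putAll(file);
        }
        return region;
    }

    public static void main(String[] args) {
        Map<String, String> region = new TreeMap<>();
        for (int i = 0; i < 20; i++) {
            region.put("key" + i, "value" + i);
        }
        // Bucket and member counts are illustrative values only.
        Map<Integer, Map<String, String>> files = exportParallel(region, 113, 3);
        Map<String, String> restored = importAll(files.values());
        System.out.println("files written: " + files.size());
        System.out.println("all entries restored: " + restored.equals(region));
    }
}
```

Because every entry has exactly one primary, the per-member files are disjoint and their union is the full region, which is why importing the whole directory from one member recovers all of the data.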
Parallel imports are still being investigated and performance will be tested.
The problem with parallel imports is that they will work well only if the
snapshot files for each member are local and the members still maintain
ownership of the same partitions they did when the snapshot was taken. If a
rebalance occurred,
the number of members changed, or the data is being imported into a different
cluster, there is a high probability that the local data being read by a member
will not belong to it and will need to be sent to a different member, greatly
increasing network traffic. It is not clear _a priori_ whether this would
provide better performance, but if it does, parallel imports are the next
logical step.
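The locality concern above can be illustrated with a rough count of how many buckets would have to be forwarded once ownership has shifted. This is a toy model, not Geode's actual bucket placement: the class ImportLocalitySketch, the round-robin owner rule, and the bucket and member counts are all assumptions chosen for illustration.

```java
public class ImportLocalitySketch {
    /** Hypothetical round-robin bucket-to-member assignment; real placement differs. */
    public static int owner(int bucketId, int memberCount) {
        return bucketId % memberCount;
    }

    /**
     * Counts buckets whose snapshot data sits on the disk of a member
     * that is no longer the owner of that bucket at import time.
     */
    public static int bucketsToForward(int bucketCount, int membersAtExport, int membersAtImport) {
        int forwarded = 0;
        for (int b = 0; b < bucketCount; b++) {
            int fileHolder = owner(b, membersAtExport);   // member that wrote the bucket's data
            int currentOwner = owner(b, membersAtImport); // member that owns the bucket now
            if (fileHolder != currentOwner) {
                forwarded++;
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        int buckets = 113; // illustrative bucket count
        System.out.println("same membership: "
                + bucketsToForward(buckets, 3, 3) + " of " + buckets + " buckets forwarded");
        System.out.println("one member added: "
                + bucketsToForward(buckets, 3, 4) + " of " + buckets + " buckets forwarded");
    }
}
```

Under this toy assignment, unchanged membership forwards nothing, while adding a single member causes most buckets to be read by a member that does not own them. A real rebalance would likely move fewer buckets than the modulo rule does, so the sketch only shows the direction of the effect, not its size.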
As for bulk loading performance in general, that depends on whether you are trying
to add data to an existing (and populated) cluster or bootstrapping a new one.
If using persistence, backups provide a much faster mechanism for bootstrapping
a new cluster (as long as the cluster has an equal or greater number of
members, though performance is much better when the member count is the same). For
adding data to an existing and populated region, putAll() and snapshot imports
are the best tools available.
> Complete and expose parallel snapshots feature
> Key: GEODE-3300
> URL: https://issues.apache.org/jira/browse/GEODE-3300
> Project: Geode
> Issue Type: Sub-task
> Components: docs, snapshot
> Reporter: Nick Reich
> Assignee: Nick Reich
> The parallel snapshots feature was never fully completed and exposed in the
> API for snapshots. This is the first step in allowing users to make use of
> this feature