Nick Reich commented on GEODE-3300:

[~palvarado], parallel export is accomplished by directing each member of the 
cluster to write to disk the data for which it is currently the primary (n.b. 
parallel export only functions for partitioned regions). This results in 
several snapshot files (either spread across the local disks of each member or 
together in a single directory, if a network drive location was specified on 
export). The files can then be imported by having a single member import a 
whole directory of snapshot files or individually commanding each member to 
import the file(s) local to itself (with this strategy you could effectively do 
a parallel import manually).
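
As a rough illustration of the export side, here is a toy model in Python (not 
Geode code). It assumes every key maps to one bucket and every bucket has 
exactly one primary member. The names bucket_of, parallel_export and 
primary_owner are made up for this sketch, and the modulo routing is only a 
stand-in for the real key-to-bucket mapping.

```python
from collections import defaultdict


def bucket_of(key, num_buckets):
    # Stand-in for the real key-to-bucket routing (assumption: integer keys).
    return key % num_buckets


def parallel_export(entries, primary_owner, num_buckets):
    """Group entries into one snapshot "file" per member.

    entries: {key: value} for the whole partitioned region
    primary_owner: {bucket id: member that is currently primary for it}
    Returns {member: {key: value}}; each member writes only its primaries.
    """
    files = defaultdict(dict)
    for key, value in entries.items():
        member = primary_owner[bucket_of(key, num_buckets)]
        files[member][key] = value
    return dict(files)


# Hypothetical layout: 4 buckets spread over 3 members.
owners = {0: "m1", 1: "m2", 2: "m1", 3: "m3"}
region = {k: "value-%d" % k for k in range(8)}

for member, snapshot in sorted(parallel_export(region, owners, 4).items()):
    print(member, sorted(snapshot))
```

Every entry lands in exactly one file, so importing the whole set of files 
(from one member or from several) restores the full region.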

Parallel imports are still being investigated, and their performance will be 
tested. The catch is that a parallel import works well only if the snapshot 
files for each member are local and the members still own the same partitions 
they did when the export was taken. If a rebalance occurred, the number of 
members changed, or the data is being imported into a different cluster, there 
is a high probability that the local data being read by a member will not 
belong to it and will need to be sent to a different member, greatly increasing 
network traffic. It is not clear _a priori_ whether this would provide better 
performance, but if it does, parallel imports are the next logical step. 
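
To make the ownership concern concrete, here is a small Python sketch (a toy 
model, not Geode code) that counts how many locally read entries would have to 
be sent to another member. The function names, the modulo routing and the 
bucket layouts are all assumptions chosen for illustration.

```python
def bucket_of(key, num_buckets):
    # Stand-in for the real key-to-bucket routing (assumption: integer keys).
    return key % num_buckets


def forwarded_fraction(local_keys, owner_at_import, num_buckets):
    """Fraction of entries a member reads locally but must send elsewhere.

    local_keys: {member: keys found in that member's local snapshot files}
    owner_at_import: {bucket id: member that is primary at import time}
    """
    total = forwarded = 0
    for member, keys in local_keys.items():
        for key in keys:
            total += 1
            if owner_at_import[bucket_of(key, num_buckets)] != member:
                forwarded += 1
    return forwarded / total if total else 0.0


NUM_BUCKETS = 6
owner_at_export = {b: "m%d" % (b % 3) for b in range(NUM_BUCKETS)}

# What each member wrote during the parallel export: keys of its primaries.
local_keys = {}
for key in range(600):
    member = owner_at_export[bucket_of(key, NUM_BUCKETS)]
    local_keys.setdefault(member, []).append(key)

# Same ownership as at export time: nothing leaves the member that read it.
print(forwarded_fraction(local_keys, owner_at_export, NUM_BUCKETS))  # 0.0

# A fourth member joined and primaries were reassigned (hypothetical layout).
owner_after_change = {b: "m%d" % (b % 4) for b in range(NUM_BUCKETS)}
print(forwarded_fraction(local_keys, owner_after_change, NUM_BUCKETS))  # 0.5
```

In this made-up layout, half of the entries change owner once a member is 
added. The sketch only counts entries; it says nothing about whether the extra 
network traffic outweighs the gain from reading files in parallel.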

As for bulk loading performance in general, that depends on whether you are 
trying to add data to an existing (and populated) cluster or bootstrapping a 
new one. If you are using persistence, backups provide a much faster mechanism 
for bootstrapping a new cluster (as long as the new cluster has an equal or 
greater number of members, though performance is much better when the number 
of members is the same). For adding data to an existing and populated region, 
putAll() and snapshot imports are the best tools available.

> Complete and expose parallel snapshots feature
> ----------------------------------------------
>                 Key: GEODE-3300
>                 URL: https://issues.apache.org/jira/browse/GEODE-3300
>             Project: Geode
>          Issue Type: Sub-task
>          Components: docs, snapshot
>            Reporter: Nick Reich
>            Assignee: Nick Reich
> The parallel snapshots feature was never fully completed and exposed in the 
> API for snapshots. This is the first step in allowing users to make use of 
> this feature

This message was sent by Atlassian JIRA