[
https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Johnson updated USERGRID-788:
-----------------------------------
Sprint: Usergrid 25
> Better use of multithreading via RxJava in ExportApp tool
> ---------------------------------------------------------
>
> Key: USERGRID-788
> URL: https://issues.apache.org/jira/browse/USERGRID-788
> Project: Usergrid
> Issue Type: Story
> Reporter: David Johnson
> Assignee: David Johnson
>
> The idea is to use multiple files to make the Migration tool export run
> faster and to support entities with a huge number of connections. Here are
> some questions to consider and a proposal.
> h3. Should an application be saved as multiple files?
> One advantage of saving to multiple files is that we can use multiple threads
> to write the files and that will make the export faster. For example, we
> could start a thread to write out each collection of an app as its own file,
> or set of files.
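
A minimal sketch of the one-writer-per-collection idea described above. It uses a plain `ExecutorService` from the JDK as a stand-in for RxJava schedulers, and the class name, method names and the `<name>.json` file naming are hypothetical, chosen only for illustration. Each collection is still written serially inside its own task, so order within a collection is kept.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the ExportApp tool.
class ParallelCollectionExport {

    /** Writes one collection serially, one JSON entity per line, preserving order. */
    static Path writeCollection(Path dir, String name, List<String> entitiesJson) {
        Path file = dir.resolve(name + ".json"); // assumed naming, for illustration only
        try {
            Files.write(file, entitiesJson, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    /** Submits one write task per collection and waits for all of them to finish. */
    static List<Path> exportAll(Path dir, Map<String, List<String>> collections, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : collections.entrySet()) {
                pending.add(pool.submit(() -> writeCollection(dir, e.getKey(), e.getValue())));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : pending) {
                written.add(f.get()); // a writer failure surfaces here as ExecutionException
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, throughput is bounded by the pool size and by the largest single collection, since one collection is never split across threads.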
> h3. Should each collection be saved as multiple files?
> Each collection must be written out serially if we want to preserve order. If
> that is the case, then saving each collection to multiple files won't help
> much there.
> h3. Should connections be separated out from entities in collections?
> Currently, we write an entity's connections right into the entity itself.
> This will be a problem if we have entities with a huge number of
> connections: it will cause entity size to bloat and could cause an import
> program to fail. Connections should be stored in a separate file.
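
To make the separation concrete, here is a rough sketch of writing each connection as its own single-line JSON record rather than embedding it in the entity. It uses the distinct field names listed in the proposal (`source`, `sourceType`, `target`, `targetType`); the class and method names are hypothetical, and the string formatting assumes the values need no JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ExportApp tool.
class ConnectionRecords {

    /** One outgoing connection between a source entity and a target entity. */
    static final class Connection {
        final String source;
        final String sourceType;
        final String target;
        final String targetType;

        Connection(String source, String sourceType, String target, String targetType) {
            this.source = source;
            this.sourceType = sourceType;
            this.target = target;
            this.targetType = targetType;
        }

        /** Serializes to one JSON line; values are assumed to need no escaping. */
        String toJson() {
            return String.format(
                "{\"source\":\"%s\",\"sourceType\":\"%s\",\"target\":\"%s\",\"targetType\":\"%s\"}",
                source, sourceType, target, targetType);
        }
    }

    /** Produces one line per connection, so entity size no longer grows with them. */
    static List<String> toLines(String sourceId, String sourceType, List<String[]> targets) {
        List<String> lines = new ArrayList<>();
        for (String[] t : targets) { // t[0] = target id, t[1] = target type
            lines.add(new Connection(sourceId, sourceType, t[0], t[1]).toJson());
        }
        return lines;
    }
}
```

An importer can then stream the connections file line by line, without holding every connection of an entity in memory at once.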
> h3. Should the result be one large file for the sake of convenience?
> We could concatenate the multiple files together, or use tar and gzip them at
> the end of the process.
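
A small sketch of the concatenate-and-compress option, assuming the part files already exist on disk. The JDK ships gzip support but no tar writer, so this covers only concatenation plus gzip; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the ExportApp tool.
class ExportBundler {

    /** Concatenates the given part files, in list order, into one gzip-compressed file. */
    static void concatGzip(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            for (Path part : parts) {
                Files.copy(part, out); // streams the file rather than loading it into memory
            }
        }
    }
}
```

Plain concatenation loses the per-file names, so it suits line-oriented JSON where file boundaries carry no meaning; an archive format would be needed to keep them.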
> h3. Multiple files proposal
> 1. Each collection will be written out to a set of files named like this:
> {{<orgname>_<appname>_<collname>_collection_N.json}}
> 2. For each collection, outgoing connections will be written to a set of
> files named like this:
> {{<orgname>_<appname>_<collname>_connections.N.json}}
> Each connection will be a JSON object with fields:
> {{source, sourceType, target, targetType}}
> 3. A command-line parameter specifies max size of each output file.
> 4. The implementation should use a thread for each collection of an
> application. Currently, we have only one write thread, which limits our
> throughput.
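
Items 1 and 3 of the proposal could be sketched as a writer that rolls over to file N+1 once the current file would exceed a size limit. This is an illustration under assumptions: the class name is hypothetical, `maxBytes` stands in for the proposed command-line parameter, and the `_N.json` suffix follows the collection naming pattern above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the ExportApp tool.
class RollingJsonWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;  // e.g. "<orgname>_<appname>_<collname>_collection"
    private final long maxBytes;  // assumed to come from the max-size parameter
    private int fileNumber = 0;
    private long bytesInFile = 0;
    private BufferedWriter current;

    RollingJsonWriter(Path dir, String prefix, long maxBytes) {
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    /** Appends one JSON line, starting the next file if this line would exceed maxBytes. */
    void write(String jsonLine) throws IOException {
        long size = jsonLine.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for newline
        // An oversized single line still gets written, alone in its own file.
        if (current == null || (bytesInFile > 0 && bytesInFile + size > maxBytes)) {
            roll();
        }
        current.write(jsonLine);
        current.write('\n');
        bytesInFile += size;
    }

    /** Closes the current file, if any, and opens <prefix>_<N>.json with N incremented. */
    private void roll() throws IOException {
        close();
        fileNumber++;
        bytesInFile = 0;
        current = Files.newBufferedWriter(
            dir.resolve(prefix + "_" + fileNumber + ".json"), StandardCharsets.UTF_8);
    }

    /** Number of files opened so far. */
    int filesWritten() {
        return fileNumber;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

Because each instance owns its own file sequence, one writer per collection thread needs no locking between threads.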