[ https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Johnson updated USERGRID-788:
-----------------------------------
    Sprint: Usergrid 25  (was: Usergrid 26)

> Better use of multithreading via RxJava in ExportApp tool
> ---------------------------------------------------------
>
>                 Key: USERGRID-788
>                 URL: https://issues.apache.org/jira/browse/USERGRID-788
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: David Johnson
>            Assignee: David Johnson
>
> The idea is to use multiple files to make the Migration tool export run
> faster and to support entities with a huge number of connections. Here are
> some questions to consider and a proposal.
> h3. Should an application be saved as multiple files?
> One advantage of saving to multiple files is that we can use multiple threads
> to write the files, which will make the export faster. For example, we
> could start a thread to write out each collection of an app as its own file,
> or set of files.
> h3. Should each collection be saved as multiple files?
> Each collection must be written out serially if we want to preserve order. If
> that is the case, then splitting a collection across multiple files won't
> speed up the export much.
> h3. Should connections be separated out from entities in collections?
> Currently, we write an entity's connections directly into the entity itself.
> This will be a problem for entities with a huge number of connections: it will
> bloat the entity size and could cause an import program to fail. Connections
> should be stored in a separate file.
> h3. Should the result be one large file for the sake of convenience?
> We could concatenate the multiple files together, or tar and gzip them at
> the end of the process.
> h3. Multiple files proposal
> 1. Each collection will be written out to a set of files named like this:
> {{<orgname>_<appname>_<collname>_collection_N.json}}
> 2. For each collection, outgoing connections will be written to a set of
> files named like this:
> {{<orgname>_<appname>_<collname>_connections_N.json}}
> Each connection will be a JSON object with fields:
> {{source, sourceType, target, targetType}}
> 3. A command-line parameter specifies the maximum size of each output file.
> 4. The implementation should use a thread for each collection of an application.
> Currently, we have only one write thread, which limits our throughput.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
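The multiple-files proposal in this issue can be sketched as follows. This is a minimal, hypothetical Java sketch, not Usergrid's ExportApp code: the names `ExportSketch`, `RollingWriter`, `Connection`, `exportCollection` and `exportApp` are illustrative only. It assumes one JSON object per line, treats the max-file-size parameter as a byte limit, uses the `_N.json` suffix for both file kinds, and uses a plain `java.util.concurrent` `ExecutorService` in place of RxJava to get one writer thread per collection. Entities within a collection are still written serially, so their order is preserved, and connections go to their own file set instead of being embedded in each entity.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the multiple-files proposal; not Usergrid's actual ExportApp code.
public class ExportSketch {
    // <orgname>_<appname>_<collname>_<kind>_N.json, where kind is "collection" or "connections".
    static String fileName(String org, String app, String coll, String kind, int n) {
        return org + "_" + app + "_" + coll + "_" + kind + "_" + n + ".json";
    }

    // One outgoing connection; assumes ids and types need no JSON escaping (e.g. UUIDs, plain type names).
    record Connection(String source, String sourceType, String target, String targetType) {
        String toJson() {
            return String.format("{\"source\":\"%s\",\"sourceType\":\"%s\",\"target\":\"%s\",\"targetType\":\"%s\"}",
                    source, sourceType, target, targetType);
        }
    }

    // Writes one JSON object per line; starts file N+1 when file N would exceed maxBytes.
    static final class RollingWriter implements AutoCloseable {
        private final Path dir;
        private final String org, app, coll, kind;
        private final long maxBytes;
        final List<Path> files = new ArrayList<>();
        private Writer out;
        private long written;
        RollingWriter(Path dir, String org, String app, String coll, String kind, long maxBytes) {
            this.dir = dir; this.org = org; this.app = app;
            this.coll = coll; this.kind = kind; this.maxBytes = maxBytes;
        }

        void write(String json) throws IOException {
            String line = json + "\n";
            int size = line.getBytes(StandardCharsets.UTF_8).length;
            // Never split a record: one larger than maxBytes still gets a file of its own.
            if (out == null || (written > 0 && written + size > maxBytes)) {
                close();
                Path p = dir.resolve(fileName(org, app, coll, kind, files.size()));
                out = Files.newBufferedWriter(p, StandardCharsets.UTF_8);
                files.add(p);
                written = 0;
            }
            out.write(line);
            written += size;
        }

        @Override
        public void close() throws IOException {
            if (out != null) out.close();
        }
    }

    // Entities are written serially, so order is preserved within each collection.
    static List<Path> exportCollection(Path dir, String org, String app, String coll,
            List<String> entities, List<Connection> connections, long maxBytes) throws IOException {
        List<Path> files = new ArrayList<>();
        try (RollingWriter ents = new RollingWriter(dir, org, app, coll, "collection", maxBytes);
             RollingWriter cons = new RollingWriter(dir, org, app, coll, "connections", maxBytes)) {
            for (String e : entities) ents.write(e);
            for (Connection c : connections) cons.write(c.toJson());
            files.addAll(ents.files);
            files.addAll(cons.files);
        }
        return files;
    }

    // One task per collection, each on its own pool thread (an ExecutorService stands in for RxJava).
    static List<Path> exportApp(Path dir, String org, String app, Map<String, List<String>> entities,
            Map<String, List<Connection>> connections, long maxBytes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, entities.size()));
        try {
            List<Future<List<Path>>> futures = new ArrayList<>();
            for (String coll : entities.keySet()) {
                futures.add(pool.submit(() -> exportCollection(dir, org, app, coll,
                        entities.get(coll), connections.getOrDefault(coll, List.of()), maxBytes)));
            }
            List<Path> all = new ArrayList<>();
            for (Future<List<Path>> f : futures) all.addAll(f.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("export");
        List<String> users = List.of("{\"uuid\":\"u1\"}", "{\"uuid\":\"u2\"}", "{\"uuid\":\"u3\"}");
        List<Connection> owns = List.of(new Connection("u1", "user", "d1", "device"));
        for (Path p : exportApp(dir, "myorg", "myapp", Map.of("users", users), Map.of("users", owns), 32)) {
            System.out.println(p.getFileName() + " (" + Files.size(p) + " bytes)");
        }
    }
}
```

With the 32-byte limit in `main`, the three 14-byte entity lines land in two files, `myorg_myapp_users_collection_0.json` (28 bytes) and `myorg_myapp_users_collection_1.json` (14 bytes), and the single connection goes to `myorg_myapp_users_connections_0.json`. Sizing the pool to the number of collections mirrors the "one thread per collection" idea; an application with very many collections would likely want a bounded pool instead.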