[ 
https://issues.apache.org/jira/browse/USERGRID-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Johnson updated USERGRID-788:
-----------------------------------
    Sprint: Usergrid 25  (was: Usergrid 26)

> Better use of multithreading via RxJava in ExportApp tool
> ---------------------------------------------------------
>
>                 Key: USERGRID-788
>                 URL: https://issues.apache.org/jira/browse/USERGRID-788
>             Project: Usergrid
>          Issue Type: Story
>            Reporter: David Johnson
>            Assignee: David Johnson
>
> The idea is to use multiple files to make the Migration tool export run 
> faster and to support entities with a huge number of connections. Here are 
> some questions to consider and a proposal.
> h3. Should an application be saved as multiple files?
> One advantage of saving to multiple files is that we can use multiple threads 
> to write the files and that will make the export faster.  For example, we 
> could start a thread to write out each collection of an app as its own file, 
> or set of files.
> h3. Should each collection be saved as multiple files?
> Each collection must be written out serially if we want to preserve order. If 
> that is the case, then saving each collection to multiple files won't help 
> much there.
> h3. Should connections be separated out from entities in collections?
> Currently, we write an entity's connections directly inside the entity 
> itself. This will be a problem if we have entities with a huge number of 
> connections: it will cause entity size to bloat and could cause an import 
> program to fail.  Connections should be stored in a separate file.
> h3. Should the result be one large file for the sake of convenience?
> We could concatenate the multiple files together, or use tar and gzip them at 
> the end of the process.
> h3. Multiple files proposal
> 1. Each collection will be written out to a set of files named like this:
>    {{<orgname>_<appname>_<collname>_collection_N.json}}
> 2. For each collection, outgoing connections will be written to a set of 
> files named like this:
>    {{<orgname>_<appname>_<collname>_connections.N.json}}
> Each connection will be a JSON object with fields: 
>    {{source, sourceType, target, targetType}}
> 3. A command-line parameter specifies max size of each output file.
> 4. The implementation should use a thread for each collection of an 
> application. Currently, we have only one write thread, which limits our 
> throughput.
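Items 1, 3 and 4 of the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not the ExportApp implementation: it uses plain java.util.concurrent rather than RxJava, the names ExportSketch, writeCollection and exportApp are hypothetical, and entities are assumed to arrive already serialized as one JSON object per string.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from the Usergrid code base.
class ExportSketch {

    // Builds <orgname>_<appname>_<collname>_collection_N.json (item 1).
    static String collectionFileName(String org, String app, String coll, int n) {
        return org + "_" + app + "_" + coll + "_collection_" + n + ".json";
    }

    // Writes one collection serially, so entity order is preserved, and
    // starts a new file once the current one would exceed maxBytes (item 3).
    static List<Path> writeCollection(Path dir, String org, String app, String coll,
                                      List<String> entities, long maxBytes)
            throws IOException {
        List<Path> files = new ArrayList<>();
        Writer out = null;
        long written = 0;
        try {
            for (String entity : entities) {
                // +1 accounts for the newline written after each entity.
                long size = entity.getBytes(StandardCharsets.UTF_8).length + 1;
                if (out == null || (written > 0 && written + size > maxBytes)) {
                    if (out != null) {
                        out.close();
                    }
                    Path file = dir.resolve(
                            collectionFileName(org, app, coll, files.size() + 1));
                    files.add(file);
                    out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    written = 0;
                }
                out.write(entity);
                out.write('\n');
                written += size;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return files;
    }

    // Runs one writer task per collection on its own thread (item 4) and
    // returns the files produced for each collection.
    static Map<String, List<Path>> exportApp(Path dir, String org, String app,
                                             Map<String, List<String>> collections,
                                             long maxBytes) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, collections.size()));
        try {
            Map<String, Future<List<Path>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : collections.entrySet()) {
                Callable<List<Path>> task = () -> writeCollection(
                        dir, org, app, e.getKey(), e.getValue(), maxBytes);
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<String, List<Path>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<Path>>> p : pending.entrySet()) {
                result.put(p.getKey(), p.getValue().get()); // waits for the task
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Writing stays serial within a collection, so order is preserved there, while separate collections proceed in parallel. A connections writer could follow the same pattern with the connections file name from item 2.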



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
