[ https://issues.apache.org/jira/browse/JCRVLT-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938866#comment-16938866 ]
Jörg Hoh commented on JCRVLT-374:
---------------------------------

I don't think that exporting the VersionStorage via vault is a good idea at all, because you would need to export all the nodes referring to it as well; otherwise I don't see a way to access the versions at all (and I am not sure whether the import works in that case either).

But if I just want to export 1m nodes (not the ones in /jcr:system/jcr:versionStorage), I would expect to run into a similar memory issue, though this is only an assumption. To be honest, I haven't tested it, but in all my projects we normally refrain from creating overly large packages because of memory issues. It would be great if we could improve this.

> assembling a content-package consumes much memory
> -------------------------------------------------
>
>                 Key: JCRVLT-374
>                 URL: https://issues.apache.org/jira/browse/JCRVLT-374
>             Project: Jackrabbit FileVault
>          Issue Type: Improvement
>          Components: Packaging
>    Affects Versions: 3.2.8
>            Reporter: Jörg Hoh
>            Priority: Major
>         Attachments: JCRVLT-374-proto.patch, filevault.log.gz
>
>
> I came across a situation where packaging a huge subtree
> (/jcr:system/jcr:versionStorage) (bad idea, I know) caused a huge spike in
> memory usage, which caused lots of FullGCs (due to AllocationFailures).
> I have several stacktraces from that time, which all look very similar to
> this one:
> {noformat}
> "qtp1597826410-38130" prio=5 tid=0x94f2 nid=0xffffffff runnable
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.jackrabbit.oak.segment.SegmentNodeBuilder.createChildBuilder(SegmentNodeBuilder.java:147)
>     at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.getChildNode(MemoryNodeBuilder.java:330)
>     at org.apache.jackrabbit.oak.core.SecureNodeBuilder.<init>(SecureNodeBuilder.java:110)
>     at org.apache.jackrabbit.oak.core.SecureNodeBuilder.getChildNode(SecureNodeBuilder.java:327)
>     at org.apache.jackrabbit.oak.core.MutableTree.getTree(MutableTree.java:288)
>     at org.apache.jackrabbit.oak.core.MutableRoot.getTree(MutableRoot.java:220)
>     at org.apache.jackrabbit.oak.core.MutableRoot.getTree(MutableRoot.java:69)
>     at org.apache.jackrabbit.oak.jcr.session.WorkspaceImpl$1.getTypes(WorkspaceImpl.java:85)
>     at org.apache.jackrabbit.oak.plugins.nodetype.ReadOnlyNodeTypeManager.isNodeType(ReadOnlyNodeTypeManager.java:293)
>     at org.apache.jackrabbit.oak.jcr.session.NodeImpl$24.perform(NodeImpl.java:931)
>     at org.apache.jackrabbit.oak.jcr.session.NodeImpl$24.perform(NodeImpl.java:926)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.perform(SessionDelegate.java:207)
>     at org.apache.jackrabbit.oak.jcr.session.ItemImpl.perform(ItemImpl.java:112)
>     at org.apache.jackrabbit.oak.jcr.session.NodeImpl.isNodeType(NodeImpl.java:926)
>     at org.apache.jackrabbit.vault.fs.impl.aggregator.FileAggregator.matches(FileAggregator.java:66)
>     at org.apache.jackrabbit.vault.fs.impl.AggregatorProvider.getAggregator(AggregatorProvider.java:68)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateManagerImpl.getAggregator(AggregateManagerImpl.java:455)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:720)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:733)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.collect(AggregateImpl.java:684)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.prepare(AggregateImpl.java:747)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.load(AggregateImpl.java:657)
>     at org.apache.jackrabbit.vault.fs.impl.AggregateImpl.getArtifacts(AggregateImpl.java:259)
>     at org.apache.jackrabbit.vault.fs.impl.VaultFileImpl.<init>(VaultFileImpl.java:101)
>     at org.apache.jackrabbit.vault.fs.impl.VaultFileSystemImpl.<init>(VaultFileSystemImpl.java:120)
>     at org.apache.jackrabbit.vault.fs.Mounter.mount(Mounter.java:64)
>     at org.apache.jackrabbit.vault.packaging.impl.PackageManagerImpl.assemble(PackageManagerImpl.java:141)
>     at org.apache.jackrabbit.vault.packaging.impl.PackageManagerImpl.assemble(PackageManagerImpl.java:102)
>     at org.apache.jackrabbit.vault.packaging.impl.JcrPackageManagerImpl.assemble(JcrPackageManagerImpl.java:358)
>     at org.apache.jackrabbit.vault.packaging.impl.JcrPackageManagerImpl.assemble(JcrPackageManagerImpl.java:324)
> {noformat}
> It seems to me that vault is traversing the complete tree and also storing
> some information of every traversed node in memory.
> For validation I enabled trace logging for {{org.apache.jackrabbit.vault.fs}}
> and tried to reproduce locally to package the complete
> {{/jcr:system/jcr:versionStorage}} in a package.
> {noformat}
> [...]
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl Create Aggregate /jcr:system
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl Collecting /jcr:system
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl descending into /jcr:system (descend=false)
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:primaryType
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:mixinTypes
> 19.09.2019 20:06:08.792 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:versionStorage
> 19.09.2019 20:06:08.793 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl descending into /jcr:system/jcr:versionStorage (descend=true)
> 19.09.2019 20:06:08.793 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:versionStorage/jcr:primaryType
> 19.09.2019 20:06:08.793 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:versionStorage/ee
> 19.09.2019 20:06:08.793 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl descending into /jcr:system/jcr:versionStorage/ee (descend=true)
> 19.09.2019 20:06:08.793 *TRACE* [qtp681943839-1771] org.apache.jackrabbit.vault.fs.impl.AggregateImpl including /jcr:system -> /jcr:system/jcr:versionStorage/ee/jcr:primaryType
> [...]
> {noformat}
> I found a lot of these "including /jcr:system -> ..." statements in the log:
> {noformat}
> $ grep -c "AggregateImpl including" filevault.log
> 174425
> $
> {noformat}
> which is logged at [1]. And at [2] something is unconditionally added to a
> global variable. I think that this is the problematic piece.
> I don't know the details of vault well enough to propose a solution, but I
> would love to have a less memory-intensive algorithm, for which the
> memory usage does not grow linearly with the number of nodes covered by the
> package rules.
> [1]
> https://github.com/apache/jackrabbit-filevault/blob/jackrabbit-filevault-3.2.8/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/impl/AggregateImpl.java#L502
> [2]
> https://github.com/apache/jackrabbit-filevault/blob/jackrabbit-filevault-3.2.8/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/impl/AggregateImpl.java#L507
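
To illustrate the concern in the issue description (memory usage growing linearly with the number of covered nodes), here is a minimal Java sketch. It is not FileVault code and makes no claim about how AggregateImpl actually works: the class TraversalSketch, its Node type, and the methods collectAll and visit are hypothetical names invented for this illustration. It only contrasts a traversal that retains one entry per visited node with one that hands each path to a callback and retains nothing beyond the call stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TraversalSketch {

    /** Hypothetical stand-in for a repository node; not a JCR or FileVault type. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Accumulating variant: keeps one entry per visited node, so memory grows with the tree size. */
    static List<String> collectAll(Node node, String parentPath, List<String> included) {
        String path = parentPath + "/" + node.name;
        included.add(path);
        for (Node child : node.children) {
            collectAll(child, path, included);
        }
        return included;
    }

    /** Streaming variant: passes each path to the consumer and retains only the recursion stack. */
    static void visit(Node node, String parentPath, Consumer<String> consumer) {
        String path = parentPath + "/" + node.name;
        consumer.accept(path);
        for (Node child : node.children) {
            visit(child, path, consumer);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("jcr:system")
                .add(new Node("jcr:versionStorage")
                        .add(new Node("ee")));

        List<String> retained = collectAll(root, "", new ArrayList<>());
        System.out.println(retained.size() + " paths retained by the accumulating variant");

        long[] seen = {0};
        visit(root, "", path -> seen[0]++);
        System.out.println(seen[0] + " paths seen by the streaming variant");
    }
}
```

Whether the packaging code can be restructured along the lines of the streaming variant depends on what later stages need from the collected entries, which this sketch does not model.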