[
https://issues.apache.org/jira/browse/JCRVLT-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18054567#comment-18054567
]
Julian Reschke edited comment on JCRVLT-831 at 1/27/26 10:12 AM:
-----------------------------------------------------------------
Attempt to summarize the situation:
1. FileVault had performance problems when it checked a potentially huge
folder for nodes matching the filters. This was fixed with JCRVLT-789
(version 3.8.4), which skips that check when the filter config makes it clear
that the set of nodes is already known in advance.
2. We found that we only fixed one of two issues; the same problem was present
in the "namespace prefix scan" phase. We applied a similar fix for that case
(this ticket).
3. What's left is the case where FV needs to collect namespace prefixes from
all sibling nodes in a collection, because the collection is ordered and the
siblings are serialized as empty elements (see
https://jackrabbit.apache.org/filevault/docview.html#Empty_Elements). Note that
FileVault only checks the primary node type; strictly speaking, that is not
correct, but apparently it's not a problem in practice.
4. In JCRVLT-836, we added a lot of DEBUG logging to make it easier to
understand what is actually happening, and how long it takes.
5. A simple change would be to always serialize the whole namespace registry.
That would avoid all scanning, but it would affect the size of the generated
XML. As a compromise, that shortcut could be restricted to the case where we
actually *have* ordered collections.
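
To make the trade-off in item 5 concrete, here is a minimal sketch in plain
Java. It is a toy model, not FileVault code: the class name, the method names
and the representation of nodes as qualified-name strings are all assumptions
made for illustration. In the ordered case it declares every prefix in the
registry instead of visiting siblings; otherwise it only looks at the names
covered by the filter.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the proposed shortcut; every name here is hypothetical. */
class PrefixCollectionSketch {

    /** Returns the prefix of a qualified name such as "jcr:content", or null if it has none. */
    static String prefixOf(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon > 0 ? qualifiedName.substring(0, colon) : null;
    }

    /** Scanning strategy: visit each given name and record the prefixes actually used. */
    static Set<String> scan(List<String> qualifiedNames) {
        Set<String> prefixes = new TreeSet<>();
        for (String name : qualifiedNames) {
            String prefix = prefixOf(name);
            if (prefix != null) {
                prefixes.add(prefix);
            }
        }
        return prefixes;
    }

    /**
     * Assumed compromise: for an ordered collection, declare the whole registry
     * (prefix to URI map) and skip the sibling scan; otherwise scan only the
     * names that the filter includes.
     */
    static Set<String> prefixesToDeclare(boolean orderedCollection,
                                         List<String> includedNames,
                                         Map<String, String> registry) {
        if (orderedCollection) {
            // No iteration over siblings, at the cost of a larger XML header.
            return new TreeSet<>(registry.keySet());
        }
        return scan(includedNames);
    }
}
```

With a registry containing jcr, nt and ex, and included names "ex:a", "plain"
and "jcr:content", the unordered case declares only ex and jcr, while the
ordered case declares all three prefixes without looking at any sibling.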
cc: [~jhoh], [~kwin], [~cschneider], [~patlego].
> For collection of namespace prefixes, avoid iterating over sibling nodes not
> contained in the filter(s)
> -------------------------------------------------------------------------------------------------------
>
> Key: JCRVLT-831
> URL: https://issues.apache.org/jira/browse/JCRVLT-831
> Project: Jackrabbit FileVault
> Issue Type: Improvement
> Components: vlt
> Reporter: Julian Reschke
> Assignee: Julian Reschke
> Priority: Major
> Fix For: 4.2.0
>
>
> It appears that the changes for JCRVLT-789 improved the performance for
> exporting the *content*, however the phase of collecting namespace prefixes
> does not use that optimization.
> (TBD: write a test)