[jira] [Comment Edited] (JCRVLT-830) "incorrect aggregatoin" can lead to the deletion of content

Konrad Windszus (Jira) Tue, 06 Jan 2026 07:58:04 -0800


    [ 
https://issues.apache.org/jira/browse/JCRVLT-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050173#comment-18050173
 ]


Konrad Windszus edited comment on JCRVLT-830 at 1/6/26 3:57 PM:
----------------------------------------------------------------

[~joerghoh] for the DocViewImporter the aggregate is always a full coverage 
aggregate: it must contain all levels below its root node. Just checking the 
direct children is not enough, as 
[Node.remove()|https://github.com/apache/jackrabbit-filevault/blob/133e50f378737e33ff99ea29c7e742c4433e2f61/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/impl/io/DocViewImporter.java#L498]
 will remove the whole tree (including deeply nested children which may be 
excluded).
However, fully implementing the include/exclude option for the DocViewImporter 
might also require other sections of the code to be adjusted. It seems that 
include/exclude was originally intended mainly for creating packages and was 
never fully implemented for importing them.

One option would be, instead of just deleting the node, to collect all existing 
nodes which are
1. covered by the filter rules (including all children/properties)
2. not contained in the package
and remove only those.
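
The selection described above could be sketched roughly as follows. This is a 
simplified model that works on plain path strings instead of JCR nodes; the 
class {{SelectiveRemoval}}, the method {{nodesToRemove}} and the predicate 
standing in for the filter are hypothetical names and not part of FileVault. 
The filter root used in the example is an assumption as well.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical sketch, not FileVault code: uses path strings instead of JCR nodes. */
public class SelectiveRemoval {

    /**
     * Returns the existing paths that are covered by the filter but missing from
     * the package. Only these would be removed, instead of removing a whole subtree.
     */
    public static Set<String> nodesToRemove(Set<String> existing,
                                            Set<String> inPackage,
                                            Predicate<String> coveredByFilter) {
        Set<String> result = new TreeSet<>();
        for (String path : existing) {
            if (coveredByFilter.test(path) && !inPackage.contains(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String en = "/content/dam/qcom/content-fragments/en";
        // Assumption: the filter only covers the test1 subtree.
        Predicate<String> filter =
                p -> p.equals(en + "/test1") || p.startsWith(en + "/test1/");
        // "stale" is a made-up node that exists in the repository only.
        Set<String> existing = Set.of(en, en + "/abc", en + "/test1", en + "/test1/stale");
        Set<String> inPackage = Set.of(en, en + "/test1", en + "/test1/jcr:content");

        // Only the stale node below test1 is selected; the sibling "abc" is kept.
        System.out.println(nodesToRemove(existing, inPackage, filter));
    }
}
```

With such a selection the sibling {{abc}} from the issue description would 
survive, because it is not covered by the assumed filter root.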

However, this requires quite some refactoring. The first thing I would start 
with is a method on {{org.apache.jackrabbit.vault.fs.api.WorkspaceFilter}} 
which returns true only if a given subtree is supposed to be fully 
overwritten (i.e. mode=REPLACE and all nodes/properties below are included). 
{{DocViewImporter}} could then use it to remove the node only if this method 
returns true.
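
A minimal sketch of what such a check might look like, using simplified 
stand-in types. {{FullOverwriteCheck}}, {{FilterSet}}, {{Mode}} and 
{{isFullyOverwritten}} are hypothetical names; the real {{WorkspaceFilter}} 
interface is not reproduced here. The check is deliberately conservative: as 
soon as a filter set carries any include/exclude pattern, the subtree is not 
treated as fully overwritten.

```java
import java.util.List;

/** Hypothetical sketch; these types only mimic the idea and are not the FileVault API. */
public class FullOverwriteCheck {

    public enum Mode { REPLACE, MERGE, UPDATE }

    /** Simplified stand-in for one filter root with its include/exclude patterns. */
    public static final class FilterSet {
        final String root;
        final Mode mode;
        final List<String> rulePatterns;

        public FilterSet(String root, Mode mode, List<String> rulePatterns) {
            this.root = root;
            this.mode = mode;
            this.rulePatterns = rulePatterns;
        }

        /** True if the path is the root itself or lies below it. */
        boolean covers(String path) {
            String prefix = root.endsWith("/") ? root : root + "/";
            return path.equals(root) || path.startsWith(prefix);
        }
    }

    /**
     * Conservative check: the subtree counts as fully overwritten only if the
     * covering filter set uses REPLACE and has no include/exclude patterns,
     * because any pattern might exclude a deeply nested child.
     */
    public static boolean isFullyOverwritten(List<FilterSet> filterSets, String path) {
        for (FilterSet set : filterSets) {
            if (set.covers(path)) {
                return set.mode == Mode.REPLACE && set.rulePatterns.isEmpty();
            }
        }
        return false; // not covered by any filter set: never remove
    }

    public static void main(String[] args) {
        List<FilterSet> sets = List.of(
                new FilterSet("/content/a", Mode.REPLACE, List.of()),
                new FilterSet("/content/b", Mode.REPLACE, List.of("/content/b/keep(/.*)?")));
        System.out.println(isFullyOverwritten(sets, "/content/a/x")); // no patterns
        System.out.println(isFullyOverwritten(sets, "/content/b/x")); // has a pattern
    }
}
```

A more precise variant would have to evaluate the patterns against the 
existing descendants, which is more expensive than this sketch.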

Another aspect is how to deal with ancestor nodes which are covered by filter 
rules but contain (deep) child nodes which shouldn't be touched. That requires 
more granular cleanup (i.e. removing individual properties and child nodes).
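
To illustrate the granular cleanup, here is a rough sketch on an in-memory 
tree (a map from parent path to child paths). All names ({{GranularCleanup}}, 
{{collect}}, {{allIncluded}}) and the sample data are hypothetical. A stale node 
is only removed as a whole if every node below it is included by the filter; 
otherwise the walk descends and picks individual children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch on an in-memory tree; not FileVault code. */
public class GranularCleanup {

    /** Collects removable subtree roots below parent without touching excluded descendants. */
    public static void collect(String parent, Map<String, List<String>> children,
                               Set<String> inPackage, Predicate<String> included,
                               List<String> removals) {
        for (String child : children.getOrDefault(parent, List.of())) {
            boolean stale = included.test(child) && !inPackage.contains(child);
            if (stale && allIncluded(child, children, included)) {
                removals.add(child); // safe: the whole subtree is included by the filter
            } else {
                // kept, or stale but holding an excluded descendant: go deeper
                collect(child, children, inPackage, included, removals);
            }
        }
    }

    /** True if the node and every descendant are included by the filter. */
    public static boolean allIncluded(String path, Map<String, List<String>> children,
                                      Predicate<String> included) {
        if (!included.test(path)) {
            return false;
        }
        for (String child : children.getOrDefault(path, List.of())) {
            if (!allIncluded(child, children, included)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
                "/a", List.of("/a/old", "/a/keep"),
                "/a/old", List.of("/a/old/x", "/a/old/protected"));
        Set<String> inPackage = Set.of("/a", "/a/keep");
        // Assumption: the filter excludes every node named "protected".
        Predicate<String> included = p -> !p.endsWith("/protected");

        List<String> removals = new ArrayList<>();
        collect("/a", children, inPackage, included, removals);
        System.out.println(removals); // prints [/a/old/x]
    }
}
```

In this example {{/a/old}} is stale but stays in place because it holds an 
excluded child; its own properties would still need individual cleanup, which 
the sketch does not cover.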


> "incorrect aggregatoin" can lead to the deletion of content
> -----------------------------------------------------------
>
>                 Key: JCRVLT-830
>                 URL: https://issues.apache.org/jira/browse/JCRVLT-830
>             Project: Jackrabbit FileVault
>          Issue Type: Bug
>    Affects Versions: 4.1.4
>            Reporter: Joerg Hoh
>            Priority: Major
>         Attachments: 
> dstrpck-1763695579150-81d5fe51-b017-4dc8-a882-d01da5a3c003-fails.zip, 
> dstrpck-1763695579150-81d5fe51-b017-4dc8-a882-d01da5a3c003-fixed.zip
>
>
> We had a situation where content was supposed to be updated using filevault, 
> but content was deleted instead. We were able to reproduce this and found it 
> to be caused by the structure of the content package.
> Context: We have identified this issue with AEM CS in the context of 
> replication, which uses filevault to both create and import the content 
> package.
> How to reproduce:
> * Create a content structure {{/content/dam/qcom/content-fragments/en/test1}} 
> and create an additional sibling node {{"abc"}}. Use the nodetype 
> "sling:Folder" for these nodes.
> * Install the content package  
> [^dstrpck-1763695579150-81d5fe51-b017-4dc8-a882-d01da5a3c003-fails.zip] 
> (mode=REPLACE, which is the default with the AEM package manager, and also 
> used in our use case). You will get this output from filevault:
> {noformat}
> Importing content... 
> - /
> - /content
> - /content/dam
> - /content/dam/qcom
> - /content/dam/qcom/content-fragments
> D /content/dam/qcom/content-fragments/en/test1
> A /content/dam/qcom/content-fragments/en/test1
> A /content/dam/qcom/content-fragments/en/test1/jcr:content
> saving approx 3 nodes... 
> {noformat}
> * When checking now, you will find that the node "abc" was deleted as well. 
> Given the overall structure of the content package, this is not what we 
> expected: the node "abc" should still exist.
> We dug deeper and experimented a bit. When you redo the same steps with the 
> package  
> [^dstrpck-1763695579150-81d5fe51-b017-4dc8-a882-d01da5a3c003-fixed.zip] the 
> node abc still exists after the import.
> The only difference between these content packages is that in the "fails" 
> case the structure in the content package looks like this:
> {noformat}
> jcr_root/content/dam/qcom/content-fragments/
> jcr_root/content/dam/qcom/content-fragments/en/
> jcr_root/content/dam/qcom/content-fragments/en/test1/
> jcr_root/content/dam/qcom/content-fragments/en/test1/.content.xml
> jcr_root/content/dam/qcom/content-fragments/en/test1/_jcr_content/
> jcr_root/content/dam/qcom/content-fragments/en/test1/_jcr_content/.content.xml
> jcr_root/content/dam/qcom/content-fragments/.content.xml
> {noformat}
> while in the "fixed" case it looks like this:
> {noformat}
> jcr_root/content/dam/qcom/content-fragments/
> jcr_root/content/dam/qcom/content-fragments/en/
> jcr_root/content/dam/qcom/content-fragments/en/test1/
> jcr_root/content/dam/qcom/content-fragments/en/test1/.content.xml
> jcr_root/content/dam/qcom/content-fragments/en/test1/_jcr_content/
> jcr_root/content/dam/qcom/content-fragments/en/test1/_jcr_content/.content.xml
> jcr_root/content/dam/qcom/content-fragments/.content.xml
> jcr_root/content/dam/qcom/content-fragments/en/.content.xml
> {noformat}
> We split {{jcr_root/content/dam/qcom/content-fragments/.content.xml}} into 
> the 2 files {{jcr_root/content/dam/qcom/content-fragments/.content.xml}} and 
> {{jcr_root/content/dam/qcom/content-fragments/en/.content.xml}}; all other 
> files are identical.
> Now the questions:
> * Is this expected behaviour? I am not sure; in my opinion the aggregation of 
> the node definitions into different files should not play a role here.
> * The "fails" package was created by filevault directly, and I would like to 
> understand how filevault decides whether to consolidate multiple nodes into 
> a single .content.xml file or to split them so that each node gets its own 
> .content.xml.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
