[ 
https://issues.apache.org/jira/browse/COR-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282084#comment-14282084
 ] 

Peter Kelly commented on COR-35:
--------------------------------

I have a working prototype of this algorithm in Python, but am still working 
out a few issues. Once I've solved these I will implement the algorithm in C 
within DocFormats itself, and write a paper on it. Will write more soon; I'm 
mostly just putting this issue here to flag that it's something I'm working 
on, and that it will have a fundamental impact on the API and the ease with 
which one can develop filters.

> Have put operation accept changes, not updated view
> ---------------------------------------------------
>
>                 Key: COR-35
>                 URL: https://issues.apache.org/jira/browse/COR-35
>             Project: Corinthia
>          Issue Type: Improvement
>          Components: DocFormats - API
>            Reporter: Peter Kelly
>            Assignee: Peter Kelly
>
> The three canonical operations for bidirectional transformations are:
> get(S) -> V
> put(S,V') -> S
> create(V) -> S
> where S is a "source", or "concrete" document (in our case a "third party" 
> file format like .docx, .odt, or .md) and V is a "view", or "abstract" 
> document (in our case, HTML). In the notation I've used, V is the original 
> view created from the file, and V' is a modified version of the view obtained 
> from the editor.
> The current implementation of put, at least in the Word filter (which is the 
> only one that implements it so far) does what effectively amounts to a 
> reconstruction of the original V produced during the get, combined with a 
> comparison of that V with V' (the edited version). This comparison is, 
> however, done in a rather ad-hoc and poorly-defined manner, and is something 
> that would need to be replicated by every other filter.
> Essentially, the put operation has two jobs: 1) determine the set of changes 
> that were made to the view - that is, D = diff(V,V'), and 2) apply those 
> changes to the source document - that is, patch(S,D). The patch operation 
> uses changes expressed in terms of the view's data model (HTML) to figure out 
> how it must update the concrete representation.
> To simplify filter implementation, we can separate out these two tasks, so 
> that only the latter has to be performed by the put operation in each filter. 
> This requires a diff algorithm that works on HTML files, and a way of 
> representing the changes to the file. Once these exist, implementation of a 
> patch algorithm will be relatively straightforward - it just consists of 
> taking the original document V and executing the operations in sequence to 
> produce V'. Thus, the relationship between the functions is as follows, 
> assuming V is the original view, V' is the modified view produced by the 
> editor, and D is the diff, or rather a sequence of change operations which 
> mutate the DOM tree.
> diff(V,V') -> D
> patch(V,D) -> V'
> A line-by-line or similar linear diff is insufficient, as it does not take 
> into account the tree structure of the document. Whereas a line-by-line diff 
> consists of a set of insert/delete operations that work on a list/array, we 
> need an algorithm that produces a set of operations that work on a tree.
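
To make the idea of "operations that work on a tree" concrete, here is a 
minimal Python sketch of patch(V,D) -> V'. It is only an illustration and not 
the prototype mentioned in the comment above: the Node class, the Insert, 
Delete and SetText operations, and the use of child-index paths to address 
nodes are all assumptions made for this example. The diff side, which is the 
hard part, is left out.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Node:
    """A simplified stand-in for a DOM element (hypothetical, not DocFormats)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# A path is the sequence of child indexes leading from the root to a node.
Path = Tuple[int, ...]


@dataclass
class Insert:
    parent: Path   # node that receives the new child
    index: int     # position among the parent's children
    node: Node     # subtree to insert


@dataclass
class Delete:
    parent: Path   # node that loses a child
    index: int     # position of the child to remove


@dataclass
class SetText:
    target: Path   # node whose text is replaced
    text: str


Op = Union[Insert, Delete, SetText]


def resolve(root: Node, path: Path) -> Node:
    """Follow a path of child indexes down from the root."""
    node = root
    for i in path:
        node = node.children[i]
    return node


def patch(v: Node, d: List[Op]) -> Node:
    """Apply the operations in d, in order, to a copy of v and return V'."""
    result = copy.deepcopy(v)
    for op in d:
        if isinstance(op, Insert):
            parent = resolve(result, op.parent)
            parent.children.insert(op.index, copy.deepcopy(op.node))
        elif isinstance(op, Delete):
            del resolve(result, op.parent).children[op.index]
        elif isinstance(op, SetText):
            resolve(result, op.target).text = op.text
        else:
            raise TypeError(f"unknown operation: {op!r}")
    return result


# Original view V and a hand-written change sequence D.
v = Node("body", children=[Node("h1", "Title"),
                           Node("p", "First"),
                           Node("p", "Second")])
d = [
    SetText((0,), "New title"),                                # edit the heading
    Delete((), 2),                                             # drop the last <p>
    Insert((), 1, Node("ul", children=[Node("li", "Item")])),  # add a list
]
v_prime = patch(v, d)
```

Each path is interpreted against the tree as it stands when that operation 
runs, so the order of operations in D matters. A filter's put could walk the 
same sequence and translate each operation into an update of S, rather than 
comparing V and V' itself.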



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
