Re: xpatch command-line interface (was: Binary Patch (xpatch))

Timofei Zhakov Mon, 16 Dec 2024 10:18:14 -0800

On Fri, Dec 13, 2024 at 11:01 PM Nathan Hartman <hartman.nat...@gmail.com>
wrote:


> On Fri, Dec 13, 2024 at 9:06 AM Timofei Zhakov <t...@chemodax.net> wrote:
> >
> > (snip)
> >
> >>
> >> I think this makes sense. That's what I wrote in a (rough draft) 'svn
> >> help patch' text for this:
> >>
> >> "  An xpatch file can be produced with the 'svn diff --xpatch'
> command.\n"
> >> "  This patch format is specific to Subversion and can represent all\n"
> >> "  types of working copy changes. It is applied with three-way
> merging\n"
> >> "  and conflict resolution.\n"
> >>
> >> > However, I'm changing my opinion about the CLI of xpatches'
> application/merging onto a WC. I realised that `svn merge --xpatch` is not
> the best and a little bit confusing for the users. In my opinion, the
> reason for that is that we should not rely the format detection onto the
> users. They should not even realise that they are going to apply an xpatch
> instead of a plaintext patch. So, I think that it's much simpler for the
> users to always invoke `svn patch`, for all kinds of patches, even if it
> would be a completely new format XYZ.
> >>
> >> This is my thinking as well. 'svn patch' has worked for unidiff and
> >> git's extended unidiff without needing the user to disambiguate, so
> >> from a user's perspective, it makes sense that it should work for
> >> xpatches also.
> >>
> >> > I also thought about how this would be implemented in the GUI
> clients, and didn't find anything better than automatic detection of the
> patch format, and utilising either unidif or xpatch applier.
> >> >
> >> > Finalising the points above, I'd suggest to do the xpatch application
> through the `svn patch patchfile.xpatch` command. I think this would be
> best for the users.
> >> >
> >> > However, this approach might cause a few technical problems; For
> example, how to detect the format?
> >>
> >> One issue is that currently, svn_diff_open_patch_file() opens a file
> >> (expected to be unidiff or git unidiff) and populates a
> >> svn_patch_file_t. This is passed to svn_diff_parse_next_patch(), which
> >> runs a state machine that automatically switches to git unidiff format
> >> if it sees lines only found in those. It's one implementation for both
> >> (somewhat related) formats. That's nice, but a XML-formatted patch
> >> would be too different to handle with the same parser.
> >>
> >> My initial idea was to insert a new initial state in the patch parsing
> >> state machine (transitions[] array in libsvn_diff/parse-diff.c) to
> >> look for an indication of xpatch format. Unfortunately, that would
> >> look for an exact match. What if there's an extra whitespace in the
> >> XML tag? Also, unfortunately, if it is detected, quite a lot of code
> >> churn would need to be done to divert processing to a different
> >> parser. I think it's ugly.
> >>
> >> My next thought was higher up in the call stack, in apply_patch() (in
> >> libsvn_client/patch.c). Here, svn_diff_open_patch_file() opens the
> >> file and populates a svn_patch_file_t. There could be a call added
> >> immediately after, to a function that probes the file for indication
> >> of xpatch. This would read the beginning of the file and look for the
> >> xpatch indication. If it's there, then the svn_patch_file_t gets
> >> transmuted into a new struct, such as svn_xpatch_file_t, and
> >> processing is diverted elsewhere, with the file still open. Otherwise,
> >> it's not xpatch, the file is rewound back to the beginning and the
> >> existing loop is allowed to run. Actually, we don't even need to
> >> rewind the file because one of the first things
> >> svn_diff_parse_next_patch() does is to fseek to
> >> patch_file->next_patch_offset. So in the unidiff case, we only need to
> >> make sure that this is 0. AFAICT this is always the case right now,
> >> since svn_diff_open_patch_file() already initializes it to 0, but I
> >> think the probing function should explicitly do it to future-proof the
> >> code.
> >>
> >> > I think that the xpatch files should begin with a line that describes
> it. For example, there is a standard tag in XML called "Processing
> Instruction" [1]. It specifies that below an XML stream is going. Usually
> it begins with a line like this:
> >> >
> >> > [[[
> >> > <?xml version="1.0" encoding="UTF-8" ?>
> >> > ]]]
> >> >
> >> > However, the wiki-page at [1], explains that it is allowed to use any
> tag as for the processing instruction. I think we may begin each xpatch
> with a line presented below:
> >> >
> >> > [[[
> >> > <?xml-patch encoding="UTF-8" version="42" ?>
> >> > ...or...
> >> > <?xpatch encoding="UTF-8" version="42" ?>
> >> > ]]]
> >> >
> >> > Here the 'encoding' attribute specifies the encoding used for the
> tags, and the 'version' will let us know whether we support it or not. We
> can add any other attribute, as needed.
> >> >
> >> > So to detect if it is an xpatch or not we can just compare the first
> line. Isn't it?
> >>
> >> I don't know; hopefully someone else here knows? Meanwhile, I'll try
> >> to research this.
> >>
> >> > The problem is that a unidiff patch can begin with it. For example,
> this is a good practice to write a log-message before the patch itself, and
> it should not be interpreted by the client as a content. This means that
> each plain and unidiff patch, beginning with this line, as a log-message,
> will be rejected by the client.
> >>
> >> I agree that beginning a log-message with this kind of text would be
> >> highly unusual but...
> >>
> >> > Probably we can deal with that and interpret those patches as
> 'xpatch', since it's very unusual to begin a log-message with this line,
> but we should take it in mind.
> >> >
> >> > Additionally, we may add an argument to the `svn patch` command, so
> users will be able to specify the kind explicitly, for example, `svn patch
> patchfile.patch --format unidiff` will apply it as a unidiff patch, even if
> it begins with the xpatch marker.
> >>
> >> ...an optional flag to explicitly specify the format could provide a
> >> workaround for that edge case. Also, it might be useful when used with
> >> scripting, if the script author knows that a patch must be a certain
> >> format and they want to reject all other formats.
> >>
> >> Now, unidiff can start with a log message, but can xpatch do that? I
> >> think the xml processing instruction must be first in the file. So,
> >> perhaps the log can be provided with some sort of tags within the xml
> >> structure? Maybe we provide an empty <log>...</log> that users can
> >> optionally fill in with their $EDITOR?
> >>
> >> > Thoughts?
> >>
> >> That's it for now... I'll try to find a definitive answer about the
> >> xml processing instruction.
> >>
> >> >
> >> > Thanks to Nathan for writing the "help" docstrings. They add much
> more sense to the feature in general. Are you going to describe xpatch in
> the `svn patch` command?
> >>
> >> That was the one I wrote already, but it's incomplete. I'll write the
> >> one for 'svn diff --xpatch' soon.
> >>
> >> > [1] https://en.wikipedia.org/wiki/Processing_Instruction
> >> >
> >> > --
> >> > Timofei Zhakov
> >>
> >> Cheers,
> >> Nathan
> >
> >
> > Hi Nathan,
> >
> > The thing is that xpatch format is going to be completely different from
> unidiff patch files, either plain Unix format or Git extended patches. The
> unidiff patches include only changes in each text/(sometimes binary) files,
> while xpatches should save the entire work of the svn_diff_tree_processor_t
> drivers.
> >
> > In other words, all unidiff patches and Git extended patches are both
> doing the same thing. They both are made to read the next patch
> (svn_patch_t) from svn_patch_file_t struct. Each patch file can be opened
> from an absolute path to it using svn_diff_open_patch_file(), and then
> closed using svn_diff_close_patch_file().
> >
> > The xpatch applyer may be implemented using an XML push-parser. It may
> work out the sequence like this:
> xpatch-file->svn_stream_t->expat->svn_diff_tree_processor_t->apply-processor.
> >
> > Note: since we need to actually "push" the content of the file to an XML
> parser, an svn_stream_copy3() function has to be invoked in order to wrap a
> file into a stream.
> >
> > Following from this, I think patch and xpatch are completely different,
> on the side of their implementations. I think the format detection has to
> be done before the client starts applying it. For example, cmdline client
> can open the patch file, when `svn patch` has been invoked, detect the
> format using a function such svn_client_patch_detect_format(apr_file_t
> *patch_file, svn_client_patch_kind_t *kind), then it will apply either
> patch or xpatch.
> >
> > However, I noticed a little problem in the implementation of patching;
> The svn_client_patch() function accesses the patchfile by an absolute path
> instead of a handle to a file. This follows that the file will be opened
> twice. Is it possible to modify it, so the svn_client_patch() function will
> accept apr_file_t or a svn_stream_t instead of an absolute path to this
> file?
> >
> > --
> > Timofei Zhakov
>
>
> I looked at the patches in the other thread [1] but so far I only had
> time to give them a cursory look. So some of my thoughts below might
> be totally off. Nevertheless... my initial thoughts:
>
> Yes, the patch API probably will need a new revision.
>
> However, before making the new revision, we should study the merge API
> because xpatches will trigger a merge behind the scenes, which in turn
> can trigger conflict resolution. If run from a GUI, the GUI will need
> to hook into the interactive conflict resolution.
>
> Also, we briefly spoke about a possible (optional) cmdline switch to
> tell the patch API the type of patch file. So the patch API needs a
> way to pass that information.
>
> Finally, I think it is better if the patch API continues to take an
> absolute path to the patch file, if possible.
>
> But, how to handle:
> * The totally different patch file formats (unidiff vs xpatch)
> * The need for different parsers and different code paths
> * Avoid opening the file twice (once to detect format, second time to
>   actually parse it)
>
> An idea that I tried to describe in my previous email -- I will try to
> explain it better here, but tell me if I'm missing something:
>
> First, apply_patches opens the file, but instead of using a struct
> svn_patch_file_t, it uses a plain FILE pointer, or perhaps a special
> struct.
>
> Then it calls a function that probes the contents of the patch file,
> until it recognizes something that indicates if this is unidiff or
> xpatch. (If the user told us what kind of file, we skip this step
> because we already know the file type.)
>
> If unidiff, it will create a struct svn_patch_file_t, store the FILE
> pointer there, ensure next_patch_offset is 0, and call the unidiff
> code, which is the loop that currently lives at apply_patch() which
> calls svn_diff_parse_next_patch().
>
> If xpatch, it will create a different struct that doesn't exist yet.
> Earlier I suggested a name like svn_xpatch_file_t but you can choose
> any suitable name. We need to convert a plain FILE pointer to
> svn_stream_t -- I am not sure if a function to do this exists already,
> so I need to look into this further; for some reason, I can't find the
> definition right now -- and then call the xpatch code.
>
> So, the file is only opened once, but we can call different parsers.
>
> Maybe the rest of apply_patch should be refactored to a new function,
> so apply_patch() would only decide the format and direct execution to
> one of the two alternatives.
>
> I have to run for a little while but I'll study the code more in-depth
> and try to provide better feedback soon.
>
> Meanwhile, just wanted to articulate those thoughts a little better...
>
> Hope this helps (let me know!),
> Nathan
>

Hi Nathan,

Thanks for clarifying your thoughts.

As I am understanding that, you are looking forward to make the xpatches
applying through the svn_client_patch() function, which is currently
working with unidiff and git style patches.

However, I think that this approach is completely incorrect in terms of the
abstraction. The svn_client_patch() function currently handles only
plain-text patches (including both git-style and unidiff). I am not sure,
but I think it reads each next patch one-by-one, detecting whether it's a
git-style patch or a plain undiff. It could do so, since they have just
minor differences, such as binary file support. Additionally, each unidiff
patch is valid in git-style as well.

I think that it would be better to separate format detection,
unidiff/git-style patch application, and xpatch application into different
API methods. Let me provide you with declaration of such functions, that
I've just wrote and explained with the corresponding docstrings:

[[[
/* Describes the patch format. */
typedef enum svn_client_patch_format_t
{
  /* Unrecognized or unknown format.
   *
   * ### Do we actually need this?
   */
  unknown = -1,

  /* Either unidiff or git style patch */
  unidiff = 0,

  /* Extended Subversion patch -- xpatch
   *
   * ### Better description?
   */
  xpatch,
};

/**
 * Apply an xpatch, described in readable @a xpatch_file stream, onto a
 * working at @a target_wcpath.
 *
 * Use @a pool for any temporary allocation.
 */
svn_error_t *
svn_client_xpatch_apply(svn_stream_t *xpatch_file,
                        const char *target_wcpath,
                        /* TODO: other arguments from the
                         * svn_client__apply_processor_create() function */
                        svn_client_ctx_t *ctx,
                        apr_pool_t *pool);

/**
 * Detect the format of @a patch_file, setting @a format with the detected
 * format.
 */
svn_error_t *
svn_client_detect_patch_format(apr_file_t *patch_file,
                               svn_client_patch_format_t *format);
]]]

Additionally, I think that it's much simpler to implement the xpatch
applier/parser in a separate function than the svn_client_patch(), as for
our client library (libsvn_client), probably in svn.exe, and may simplify
it a lot for the GUI clients developers, since they might want to detect
the format before to show different dialog for xpatch, as an example.

Moreover, there are semantic differences in patches and xpatches, because
unidiff patches internally work with the file pointer, using apr_file* API,
while xpatches could be implemented in a push-parser, backed by svn_stream
and Expat library.

Yeah, we could create a stream from a file, but in this case, it's better
to leave this responsibility to a higher level, for example, cmdline or any
other SVN client.

In summary, I'd say that we should separate the APIs for "merging" of
xpatches and undiff "patching", but leave a client function for the format
detection. However, the current patch API prohibits so, due to the source,
that we get the patches from.

Hope this now makes some sense...

-- 
Timofei Zhakov

Re: xpatch command-line interface (was: Binary Patch (xpatch))

Reply via email to