On Mon, Aug 15, 2005 at 09:21:54PM +0100, Ian Lynagh wrote: > On Fri, Aug 12, 2005 at 08:24:52AM -0400, David Roundy wrote: > > The new format will still have the issue, which is that *between* lines > > we need to include newlines, but *after* the last line we don't want a > > newline. So when applying, we need to behave differently, depending on > > whether our patch is modifying the end of the file. > > I'm not sure what you mean here. In my mind, the new hunk > > newhunk 8 -3 +6 > old > foo > new > wibble > > (or equivalent) means "Go to the 8th byte. The next 3 bytes will be > "foo". Remove them. Insert the 6 bytes "wibble".
Indeed, I was thinking confusedly. The tricky bit is to decide whether the newline after the last line of the patch is included in the contents of the patch itself. i.e. does hunk 1 +foo translate (using very liberal notation) to newhunk 1 "" "foo\n" or newhunk 1 "" "foo" I think it must translate to the former. But hunk 1 - +foo translates to newhunk 1 "" "foo" not to newhunk 1 "" "foo\n" > There are various details you can fiddle with, like whether we note how > many lines are added/removed, what the starting line number is etc > (which comes down to "is the extra complexity worth it for how important > it is for users") but I'd really like a format that is trivial to apply > and coalesce. If you leave out the constraint that the character preceding the hunk must be a '\n' and the hunk must either be followed by EOF or by '\n', what you've got is a character-hunk patch type, which is not a bad idea, but also doesn't commute the same way a hunk patch does (for example, a character-hunk patch could only commute with a replace patch if the first and last characters of both new and old hunks cannot be part of a token). I'd rather include the line-number information in the line based hunks. It's pretty cheap to compute, and allows them to commute easily with old-fashioned hunks. Actually, this information is also needed in order to commute two hunk patches, so it's definitely a good idea to include it (since we do an awful lot of commutation, and it's better to count the lines once and store them). Actually, a nice way to implement your efficient hunk-apply might be to allow conversion from a line-hunk patch to a character-hunk patch, and then we'd just use the character-hunk patch version of apply. That would put the "trickiness" into a line-to-character patch conversion, which seems to me like it might be easier to get right. The character-hunk patch itself is an interesting feature (and could be quite useful, especially for files like makefiles). I think I'd lean towards calling the new patch format "linehunk" (or "lines"?) and then we could naturally add a similarly formatted "charhunk" (or "chars"?). "Binary" patches could also be implemented as a special case of a "charhunk" patch... but it might be nice to have some annotation stored to indicate whether it's binary or not, since the patch contents should be invisible if it's binary, but shouldn't be invisible for a text charhunk. Jason wrote: > I'm probably the last person that should chime in since I've not followed > the complete discussion about the new hunk format (and if I'm preaching > to the choir then please just ignore this)... I hope whatever you decide > on allows for fseek() (or similar) to jump to the relevant parts of the > patch files. It sounds like it may be possible given the above format > (although the offset computation might be a bit hairy, and easily broken > if the formatting is tweaked). I just want to stress that I think this > is important for allowing large patches to be handled efficiently. I'm > just envisioning something where darcs can skip from one patch to the > next without reading anything inbetween from the file. I guess the next > step would be to make a tool (as part of darcs or external) that allows > upgrading from the old patch format? Right, that's the idea. We include the text as a raw sequence of characters and store the number of characters, so we can skip right over it. -- David Roundy http://www.darcs.net _______________________________________________ darcs-devel mailing list [email protected] http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
