>--- Forwarded mail from [EMAIL PROTECTED]

>Greg writes:
>> CVS is designed _only_ for tracking changes in
>> human written text files.

>Paul writes:
>> Keep in mind also that there's a difference
>> between "binary files" and "mergeable files".
>> The two concepts are in fact orthogonal; there
>> are mergeable binary types (given a suitable
>> tool), and there are unmergeable text types. CVS
>> is bad at storing unmergeable files, no matter
>> whether or not they're binary files. CVS can be
>> easily modified to support mergeable binary
>> types, as I've demonstrated, without significant
>> impact to its design.

>In my view, CVS was designed to add a model of
>concurrent modification and automatic merges on
>top of the previously existing Revision Control
>System representation of files. The removal of
>exclusive locking for changes is the fundamental
>reason that CVS exists.

>It may be that the diff3 algorithm is not always
>the best one suited to do such mergers.

Well said. However, I'm not 100% convinced that "automatic merges" is a
prerequisite of the concurrency model. I see no reason why a command
like "cvs update" could not spawn a graphical merge tool (even for C
source code, for example). However, such actions, should they stall,
must not stop others from doing their own merges or from committing new
changes.

>For example, using a UTF16 character set in a file
>may prove to be difficult to merge
>even if the text in the file is only a "simple"
>Chinese representation. Perhaps something like
>the xcin project will eventually provide a diff3
>for use in this case.

>It may be desirable to mark UTF8 or UTF16 files as
>being 'binary' in order to preserve the text more
>exactly across operating systems that are not
>(yet) friendly to such text.

>For this reason, I take Paul's side on the issue
>of the orthogonal nature of the discussion of
>files that may or may not be "merged" using
>automatic tooling of some sort.

Thanks! :-)

>I also share Greg's bias that using CVS to save
>arbitrary binary data and/or derived objects is
>not something that is a core competence of CVS.

Saving derived objects is definitely not a best practice in SCM, at
least not in the source control system. Whether arbitrary (or opaque)
binary data should be stored in CVS is a sticky question, because it may
very well be source code (i.e. data that can be created or modified only
by human intervention), in which case I believe it should be stored in
the source control system.

For merges, opaque data must be handled appropriately. One way is to
take Greg's approach and boot it out completely. I believe a better way
is to apply a simple selection tool that takes the place of a merge
tool. (After all, any data type is mergeable if you can swap out the
entire contents of a file in one chunk, right? :-)

>For myself, I have no objection to a few small
>icons being checked into a repository that will
>also be holding sources that use them (of course,
>I would usually favor them being converted into a
>text representation such as xbm format or the
>like). I have seen where using very large binary
>objects can cause problems for both users and
>administrators.

It's important to note that xbm format is also an unmergeable data type,
at least with diff3, even if such files do not contain non-printable
ASCII characters. The reason is that it's hard to edit an image without
seeing it as an image.

I agree about storing large binary files in CVS; it would be nice if
there were multiple storage managers to choose from, depending on their
suitability to the data at hand. But given that RCS works (though
admittedly not necessarily well) in all cases, it's good enough (for
95%+ of the files thrown at it) that I don't see a reason to change at
this moment. ('Course, I'd be happy to participate in a separate
discussion about creating an abstraction layer over RCS and plugging in
other storage managers... :-)

>I have also seen problems where folks check in
>derived objects such as PostScript files that are
>pure text files, but normally are not merged
>effectively by a diff3 program during a normal
>'cvs update' of a file.

>I believe that adding flexibility to CVS as to
>what program should be used in place of diff3 for
>doing a merge operation is desirable.

>That said, I do not know the correct approach to
>take for allowing the cvs admin or user to do such a
>merge with a non-diff3 tool. Some such tools are
>(by their nature) interactive and this does not
>seem to be a good fit with the CVS methodology.

I believe that the data type should be stored in a newphrase in the
admin section of the RCS file. The bad thing about that is that if the
RCS file is recycled with a new data type, or if it contains different
data types on different branches, there is no correct value for the
newphrase.

Others have stated that the data type should be stored with each version
of the file. That way you can tell when a nonsensical merge is
attempted. But then the data type must be accurately maintained with
every commit.

Another way is to have the merge tool analyze the data types of all of
the contributors, and fail if they're not all the same (or at least are
not compatible given the semantics of a content merge).

There are ways to address this in the general case, but they involve
very intrusive changes to the CVS design. (The bottom line here is to
decouple the data from its path in the workspace, which means a new
method of mapping RCS files to working copies is needed. Having done
this, you can guarantee that every revision stored in any RCS file
contains the same data type.)

>Some such programs may only be available on client
>machines while others would potentially be
>available on the server. I typically favor that
>such programs would be considered to be present on
>the server and NOT on the client.

Resources that maintain the integrity of the repository and enforce
process must necessarily be on the server. The *info scripts, for
example, fall under this category. However, merge tools, like the tool
used to edit commit messages, should be configurable by the user on the
client side. Allowing the user to choose his favorite tool can do
nothing but improve his productivity.

>The exact semantics and rules under which a
>substitution for a different program than diff3
>could be used for a merge operation need to be
>carefully considered before we jump into a change.

No doubt about that.

>I suspect that we would need to add a filetype
>recognizer into cvs as a preliminary step to help
>to classify the type of a file that is to be
>merged (or added or imported for that matter) in
>order to know which of the potentially large
>number of three-way merge programs and scripts
>should be used or considered for use during a
>given cvs merge operation.

There's also the question of _when_ to run the recognizer. Above I
mentioned three distinct times when such a mechanism might be used: add
time, commit time, and merge time. Each has its advantages and
disadvantages.

I think one viable compromise given the current design would be to
record an initial data type at add time and propagate it with every
commit. The user would be allowed to override the data type with every
commit. If a dead file is resurrected, the old data type is remembered
as a default. When a merge is done, the recorded data types of all of
the contributors are compared and some suitable action is taken.

Suitable actions might be to fail if the contributors are of different
types, or to ignore the common ancestor (i.e. perform a 2-way merge
rather than a 3-way one) if the ancestor differs from the contributors.
Or perhaps a conversion to a universal format could be done before the
merge (e.g. if the ancestor is Word and the contributors are RTF and
HTML, they could all be converted to a common format like XML), with
the result then saved in the expected format.

>I do not consider filetypes driven by the name of
>a file to be useful in such deliberations.

Certainly not in the general case. Naming conventions might be adequate
on a per-shop or per-project basis, and for some data types naming
conventions can be very accurate. But I agree that a better method is
needed because in the general case the success rate at guessing data
types based on naming conventions alone is pretty low.

If it weren't for the "cvs import" command, punting might be a possible
solution: just require the data type as input to the "cvs add" command.
But if large numbers of files are to be added at once, something better
is needed. Alternatives include a file(1)-like mechanism to analyze a
file's content in addition to naming conventions, or requiring a list of
path/datatype pairs as an argument.

>If anyone has any suggestions or other patches
>for this kind of feature, I would be interested
>in hearing about them.

I'm sure this discussion will be quite lively! :-)

>--- End of forwarded message from [EMAIL PROTECTED]

_______________________________________________
Info-cvs mailing list
[EMAIL PROTECTED]
http://lists.gnu.org/mailman/listinfo/info-cvs
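
For illustration only, here is a minimal sketch of the merge-time check discussed in this message: each revision carries a recorded data type, and the types of the two contributors and of the common ancestor decide what happens. Nothing here is part of CVS; the function name, the `MergeAction` values, and the type labels are all hypothetical, and the "convert to a universal format" option is left out.

```python
from enum import Enum
from typing import Optional


class MergeAction(Enum):
    """Hypothetical outcomes of the merge-time type check."""
    THREE_WAY = "3-way merge"
    TWO_WAY = "2-way merge (ancestor ignored)"
    FAIL = "refuse to merge"


def choose_merge_action(ancestor: Optional[str], ours: str, theirs: str) -> MergeAction:
    """Pick an action from the data types recorded with each revision.

    `ancestor` may be None when no type was recorded for the common
    ancestor; the labels themselves are arbitrary strings.
    """
    # Contributors of different types: a content merge would be nonsensical.
    if ours != theirs:
        return MergeAction.FAIL
    # Contributors agree but the ancestor has another (or no) recorded type:
    # ignore the ancestor and fall back to a 2-way merge.
    if ancestor != ours:
        return MergeAction.TWO_WAY
    # All three revisions share one type: an ordinary 3-way merge applies.
    return MergeAction.THREE_WAY
```

With this sketch, an ancestor recorded as "word" and two contributors recorded as "rtf" would get a 2-way merge, while contributors recorded as "rtf" and "html" would be refused.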