On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:

First of all, let's agree that this discussion of File Versioning makes no more reference to its usage as Version Control. That is, we aren't going to talk about it being useful for source code, other than in the context where a source code file is a document, like any other text document. File Versioning and Version Control are separate things, with different purposes and feature sets.


OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution,

Assumption, not supported.  "Eye of the  beholder."

not to mention user confusion.

Assumption, not supported.

So, FV has to be invisible to non-aware programs.

yes


Now we have a problem: how do we access FV for non-local (e.g. SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is in the network file server arena,

Assumption, and definitely not supported. It is very useful outside of the file sharing arena.

unless we can use FV over the network, it is useless.

Wrong

You can't modify the SMB or NFS protocol (easily or quickly) to add FV functionality (look how hard it was to add ACLs to these protocols).

About the only way I can think around this problem is to store versions in a special subdir of each directory (e.g. .zfs_version), which would then be browsable over the network, using tools not normally FV-aware. But this puts us back into the problem of a directory which potentially has hundreds or thousands of files.

Doing it with a directory is not a good way. It fails the ease-of-use test for the end user.

The VMS way is far superior. The problem is that you have to make sure that apps that are not FV aware have no problems, which means you cannot just append something to the actual file name. It has to be some sort of metadata.


Also, "save-early-save-often" results in a version explosion, as does auto-save in the app.

Does not have to. In VMS it is configurable on how many versions you want to save before it does an auto purge. A simple purge command then cleans things up for you. Very minimal requirements for "retraining" the user. Set the default configuration to be a max of 1 version and you have no problems unless you turn it on.
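
The version limit and purge behaviour described above can be sketched in a few lines. This is only a toy in-memory model, not VMS or ZFS code: the class name VersionedFile, the max_versions parameter, and the save() and purge() methods are all hypothetical names chosen for illustration.

```python
class VersionedFile:
    """Toy model of a file with a per-file version limit (hypothetical API)."""

    def __init__(self, max_versions=1):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self.versions = []      # (version_number, content), oldest first
        self._next_number = 1

    def save(self, content):
        """Record a new version, then auto-purge anything over the limit."""
        self.versions.append((self._next_number, content))
        self._next_number += 1
        excess = len(self.versions) - self.max_versions
        if excess > 0:
            del self.versions[:excess]   # drop the oldest versions

    def purge(self, keep=1):
        """Explicit cleanup: keep only the newest `keep` versions."""
        if keep < 1:
            raise ValueError("keep must be at least 1")
        del self.versions[:-keep]

    def version_numbers(self):
        return [number for number, _ in self.versions]


doc = VersionedFile(max_versions=3)
for text in ["a", "b", "c", "d", "e"]:
    doc.save(text)
kept_after_saves = doc.version_numbers()   # only the newest three remain
doc.purge()
kept_after_purge = doc.version_numbers()   # only the newest one remains
```

With the default of max_versions=1, every save replaces the previous version, which matches the "no problems unless you turn it on" behaviour.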

While this may indeed mean that you have all of your changes around, figuring out which version has them can be massively time-consuming.

Your assumption.  (And much less hard than using snapshots).

Let's say you have auto-save set for 5 minutes (very common in MS Word). That gives you 12 versions per hour.

So?

If you suddenly decide you want to back up a couple of hours, that leaves you with looking at a whole bunch of files, trying to figure out which one you want. E.g. I want a file from about 3 hours ago. Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours ago?

Look at the file create time. Take a quick look at the contents if you are confused. At least you HAVE the capability to go back.
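
As a rough illustration of picking a version by its creation time, the sketch below finds the saved version closest to a target time. The (timestamp, label) tuples and the closest_version() helper are assumptions made for the example; a real FV interface might expose version times quite differently.

```python
from datetime import datetime, timedelta


def closest_version(versions, target):
    """Return the (timestamp, label) pair whose timestamp is nearest target."""
    return min(versions, key=lambda version: abs(version[0] - target))


# Hypothetical data: one auto-saved version every 5 minutes over 4 hours.
now = datetime(2006, 10, 6, 15, 0)
versions = [(now - timedelta(minutes=5 * i), f"v{i}") for i in range(49)]

# "I want a file from about 3 hours ago."
about_three_hours = closest_version(versions, now - timedelta(hours=3, minutes=2))
```

If the guess was off by an hour, rerunning the lookup with a different target is a single call.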

And, what if I've mis-remembered, and it really was closer to 4 hours ago?

Simple file system tools help me find it.

Yes, the data is eventually there. However, wouldn't a 1-hour snapshot capability have saved you an enormous amount of time,

No. Managing the versions is not hard like you say. I lived on VMS for years and it was never a problem. It is your mindset and your preconceived notions that are the problem.

by being able to simplify your search (and, yes, you won't have _exactly_ the version you want, but odds are you will have something close, and you can put all the time you would have spent searching the FV tree into restarting work from the snapshot-ed version).

I would much rather take an extra 2 minutes futzing around with the FV saved versions than trying to recreate what I had done. And snapshots are not user friendly from a UI perspective -- funny strange directories and having to dig around in them.


Remember, FV's main audience is going to be "naive" users, not us technical users,

No, it is US technical users as much as the naive user.

who generally have the problem that FV solves under control (yes, FV would make it easier for us, but we're not the primary target).

We do? I have often edited system files and then wanted to go back to something I deleted earlier as I realized it was the wrong one.

Version explosion (and the consequential problem of picking the right version to edit) is a huge problem for the naive audience.


This statement is naive itself and is unsupportable. Where are the usability tests that support this? VMS has a LONG HISTORY and is/was used by a lot of what you call "naive" users. FV never caused any problems that I encountered or indeed that DEC encountered, as it never once came up as an issue with VMS usability.

Also, a big difference between Snapshots and FV tends to be who controls EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and their aging strictly controlled and defined (e.g. "we keep hourly snapshots for 1 week"). File versioning is typically under the control of the End-User, as its utility is much more nebulously defined. Certainly, there is no ability to truncate based on number of versions (e.g. "we only allow 100 versions to be kept"), since the frequency of versioning a file varies widely. Aging on a version is possibly a better answer, but this runs into a problem of user education, where we have to retrain our users to stop making frequent copies of important documents (like they do now, in absence of FV), but _do_ remember to dig through the FV archive periodically to save a desirable old copy. Also, if managing FV is to be a User task, how are they to do it over NFS/SAMBA? And, "log into the NFS server to do a cleanup" isn't an acceptable answer.

Also, FV is only useful for apps which do a "close()" on a file (or at least, I'm assuming we wait for a file to signal that it is closed before taking a version - otherwise, we do what? take a version every X minutes while the file is still open? I shudder to think about the implementation of this, and its implications...). How many apps keep a file open for a long period of time? FV isn't useful to them, only an "unlimited undo" functionality INSIDE the app.

Yes, any time you do a close() or equivalent. The idea is not to implement a universal undo stack.
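
To make the version-on-close idea concrete, here is a minimal in-memory sketch. It is not a filesystem implementation: VersioningStore and its handle class are hypothetical, and the only point is that writes between open and close do not create versions, while each close() does.

```python
class VersioningStore:
    """Toy store that records one version per close() (hypothetical API)."""

    def __init__(self):
        self.versions = {}   # file name -> list of contents, oldest first

    def open(self, name):
        return _Handle(self, name)


class _Handle:
    def __init__(self, store, name):
        self._store = store
        self._name = name
        history = store.versions.get(name, [])
        # Start from the latest saved contents, or empty for a new file.
        self._buffer = history[-1] if history else ""
        self._closed = False

    def write(self, text):
        if self._closed:
            raise ValueError("write to closed file")
        self._buffer += text   # no version is taken here

    def close(self):
        if self._closed:
            return
        self._closed = True
        # The version is taken only at close time.
        self._store.versions.setdefault(self._name, []).append(self._buffer)


store = VersioningStore()
handle = store.open("notes.txt")
handle.write("first ")
handle.write("draft")
versions_while_open = store.versions.get("notes.txt")   # nothing recorded yet
handle.close()

handle = store.open("notes.txt")
handle.write(", revised")
handle.close()
```

An application that keeps the file open for hours would produce a single version in this model, which is the limitation both sides of the thread acknowledge.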

You can always find a scenario where FV doesn't help. So what. There are lots of scenarios where it does help. More positive scenarios than you can dream up negatives for.


Lastly, consider the additional storage requirement of FV, and exactly how much utility you gain for sacrificing disk space.

We have GB and TB of cheap space. A few extra versions lying around until people hit their quotas is the users' issue, not the sysadmin's.

Look at this scenario: I'm editing a file, making 1MB of change per 5 minutes (a likely scenario when actively editing any Office-style document), of which only 50% do I actually make permanent (the rest being temp edits for ideas I decide to change or throw out). If I'm auto-saving every 5 minutes, that means I use 12MB of version space per hour. If I took an hourly snapshot, then I need only 6MB of storage.

So. Your snapshot is much less useful and 12MB is nothing in today's GBs of cheap space. Probably compressed too so even less usage than you envision.
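
The arithmetic in the scenario above can be checked with a short sketch. It follows the quoted model literally: each auto-save stores the full 1MB of change, an hourly snapshot retains only the 50% that becomes permanent, and compression and any block sharing between versions are ignored. The function and parameter names are illustrative only.

```python
def version_space_mb(change_mb_per_save, saves_per_hour):
    """Space used per hour if every auto-save keeps its full change."""
    return change_mb_per_save * saves_per_hour


def snapshot_space_mb(change_mb_per_save, saves_per_hour, kept_fraction):
    """Space used by one hourly snapshot that holds only the permanent edits."""
    return change_mb_per_save * saves_per_hour * kept_fraction


saves_per_hour = 60 // 5   # auto-save every 5 minutes
fv_mb = version_space_mb(1.0, saves_per_hour)
snap_mb = snapshot_space_mb(1.0, saves_per_hour, 0.5)
extra_mb = fv_mb - snap_mb   # extra space FV costs per hour in this model
```

Under these assumptions the gap is 6MB per hour of active editing, before any compression.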

The situation gets worse, for the primary usefulness of FV is for files which are frequently edited - meaning that they have rapid content change, and not in append-mode. Such a usage pattern means that FV will take up a much greater amount of space than periodic snapshots, as the longer interval in snapshots will allow the changes to "settle".

Not an issue.  Cheap disk space.



To me, FV is/was very useful in TOPS-20 and VMS, where you were looking at a system DESIGNED with the idea in mind, which already had a user base trained to use and expect it, and where virtually all usage was local (i.e. no network filesharing). None of this is true in the UNIX/POSIX world.

And that does not affect its usefulness.

Chad



-Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net


