On Sun, Jul 3, 2011 at 11:28 PM, Luke <[email protected]> wrote:
> I was expecting that, yes. However, I shouldn't. My recent experience is > with reasonably disciplined corporate users, who either get files from > sources with likely to be unique names (some form of the vendor name and > vendor's ID), create files for customers/vendors with names of the same > type, or are good at storing files with rather long, descriptive, and > accidentally unique names. Are we only worried about corporate users here? What about small business owners? > > However, if we combine our two ways of looking at this, I think we have > the solution. > > If you store files by ID, and internally reference them by ID at all > times, they can all be called "foobar.pdf" and it doesn't matter. Ok, so which foobar.pdf attached to a sales order is which one? i'd rather have some sort of unique constraint on the file name even if it isn't global. My current thinking is a uniqueness between the primary attachment point and the file name. So you can only have one "contract.doc" attached to a given sales order. However, if you have a secondary link there (to be implemented in the future) to the sales invoice, the user could see "contract.doc" linked from "sales order 12345." There could only be one "contract.doc" linked from that sales order to that invoice though there could be additional contract.doc's from other sales orders onto the same invoice. > > When a new file is uploaded, compare its CRC/checksum to the index of > stored files. If there's no match, it's a new file, gets a new ID, and we > save it. If the checksum matches, compare the contents to the N files with > the match. If one of them matches, create a link to the existing copy, by > inserting a referential entry in whatever table is tracking where files > are attached. If none of them matches, it's a new file. I am still worried about handling naming collisions in a way that is user-transparent. > > I'm pushing this, because I think it's more extendable, and it also leads > directly to what Erik wanted. > > If you divorce the storage of files, and the way they are tracked, from > the documents to which they are attached, you get a true virtual > filesystem. If you are going to do a true virtual filesystem I think you'd have to store a lot more information about each file to avoid the file collision issue. I personally would get cranky if I had to look through 34 foobar.pdf's to figure out which one was the one I wanted. Traditional filesystems get around this problem by requiring unique paths. We could do the same here but either the paths end up meaningless or they end up impossible to enforce against different database relations. So the question I have is that if we go this way, how do we handle that? Over what domains do we require unique file names, and how do we expose this to the user? > Any document can point to any file(s), and any file can be pointed to by > one/some/no documents. > > Associations can be re-mapped after file storage (this assumes a file > management UI at some point), which is necessary for Erik's suggestion. The second addition I was thinking about a different way of attaching secondary linking. CREATE TABLE secondary_file_mapping AS ( file_id int not null, ref_key int not null attach_class int not null references file_class(id) ); CREATE TABLE attach_tx_file_to_oe AS ( FOREIGN KEY file_id REFERENCES file_transaction(id), FOREIGN KEY ref_key REFERENCES oe(id), CHECK (attach_class = 2) ) INHERITS (secondary_file_mapping); CREATE TABLE attach_oe_file_to_tx AS ( FOREIGN KEY file_id REFERENCES file_order(id), FOREIGN KEY ref_key REFERENCES transaction(id), CHECK(attach_class = 1) ) INHERITS (secondary_file_mapping); CREATE TABLE attach_oe_file_to_oe AS ( FOREIGN KEY file_id REFERENCES file_order(id), FOREIGN KEY ref_key REFERENCES oe(id), CHECK (attach_class = 2) ) INHERITS (secondary_file_mapping); -- for order consolidation, etc This gets more complicated than your solution, Luke and that's a legitimate question of tradeoffs. But I am wondering if a real virtual filesystem is desired if it would be better to use a mature document management solution instead and use the proposed x-url approach. Note here you can get all files by "select * from file_attachment" and all secondary links by "select * from secondary_file_mapping." I don't currently see a way around the naming collision issues across wide ranges of users which strike me as very manageable, but I'd be interested in hearing suggestions. Best Wishes, Chris Travers ------------------------------------------------------------------------------ All of the data generated in your IT infrastructure is seriously valuable. Why? It contains a definitive record of application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-d2d-c2 _______________________________________________ Ledger-smb-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/ledger-smb-devel
