On Wednesday 03 April 2002 10:10 am, Bill Gribble wrote:
> On Wed, 2002-04-03 at 08:54, Derek Atkins wrote:
> > Note that this will not only fail to do what you want, but could
> > leave your data file unreadable and unusable. This is _EXACTLY_ the
> > kind of thing that we DON'T want people to be doing! If you want to
> > change your data you should use the application to do it.
>
> Extremely Strongly Disagree.
>
> I think it's a fundamental part of the Unix and free software
> philosophy that the data belongs to the user, not to the application.
> "It's none of your d**n business what I do with my data!" If the user
> wants to pipe their data through perl or sed or whatnot that's their
> business.
>
> That's the main reason *I* wanted to go to the XML format to start
> with. People *hate* applications that bottle their data up in opaque
> formats. Databases get a special exemption because of the extremely
> delicate nature of the interrelationships between bits of data, but all
> real dbs have a way to dump text (SQL) that can be used to exactly
> restore the db. Not just a text "export" (which is usually lossy) but
> a dump which exposes all of the data's guts.
>
> Sure, it's ill-advised to make precipitous changes to your XML data
> file, but it's also ill-advised to make precipitous changes to the
> kernel source code... does that mean it shouldn't be available for easy
> editing?
I usually just read these and move on, but these statements are a little too much to pass up.

The core concern in all of data processing is the sanctity of the data. The data must be correct and accessible. Period. All the rest is decades of learning how to achieve this goal: transaction logging, backups, audit trails, and the search for easy, correct access led to SQL and relational databases, and I'm sure will lead further from there.

'Easy access' does not mean some user trying to read a big text file. And easy access does NOT mean easy to modify/delete data from outside of the applications that were created to do the job that requires the data in the first place. It means being able to read some portion of that data, maybe process it in some ad hoc fashion, and then maybe save it to another file or spreadsheet or whatever - NOT back into the database. Writing back is reserved for applications that have been specifically created for that purpose and thoroughly TESTED and verified to be correct (minus the hopefully few unfortunate bugs that always exist in anything people do). Otherwise, your data may (and in my 20 years of data processing experience, usually does) become less and less useful as it acquires consistency errors and/or outright invalid values because someone tried to 'fix' something they didn't really understand (or possibly did understand but didn't test). Yes, some people make changes that are correct (but everyone makes mistakes sometime). These same people could usually take the time to write and thoroughly test a script/program to do the same changes, rather than making an ad hoc, quickie change in a text editor.

The data is not 'bottled up' because it's in a database or a binary file format - so long as the format is known (published). The spirit of Unix and Open Source is not 'data in text files'. It is not data in ANY particular format whatsoever. The 'open' refers to the data (and programs, of course) being accessible, in a known format, that anyone can get at by writing a script/program/what-have-you. The idea is still that the data be correct and, so, useful.

And by the way, not all databases have 'text' dumps for backups or whatever; some back up to a binary format to save space. If you want to dump the data, you can always write a SQL routine (or use some other language) to output any/all tables in text format, CSV, whatever you like (see the P.P.S. at the end for the sort of thing I mean).

XML was created to go along with the new web/Internet world and to be for data definition what HTML is for document presentation: fast, easy to access, and portable across the Internet. But that's all it's really meant to be - a way for disparate systems to exchange relatively small amounts of data (for e-commerce between businesses, etc.). It's not meant to replace databases, with their millions (and more) of rows of data. It doesn't encompass transaction logging, etc., and the data takes up lots more space than other formats (even with compression techniques, it's a factor of 10-100 or more, going up quickly as the number of columns/fields increases and as more of them are numeric - to say nothing of the problems with blob fields, which are binary by definition). Not that we have this drastic a situation in gnucash, with its relatively limited number of tables and data.
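To put a rough number on the size claim, here's a quick back-of-the-envelope sketch (Python, with made-up tag names; the exact factor depends entirely on the data and the formatting):

    # Rough size of one numeric field, packed binary vs. XML-tagged.
    # The tag names here are invented for illustration.
    import struct

    value = 1234567.89
    binary = struct.pack("d", value)                # IEEE double, 8 bytes
    xml = "<split><value>1234567.89</value></split>"

    print(len(binary))                              # 8
    print(len(xml))                                 # 40

That's already 5x for a single bare numeric field, before indentation, newlines, attributes, and the enclosing structure push it higher.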
XML is also more open to formatting errors. Which brings up the application (gnucash here) messing up the data. Yes, it can happen that a bug causes a problem with a single row or something and gets by testing. But I've never seen a case where any application actually made all the data unreadable - if it was in a database. The problem you had in gnucash was almost certainly caused by the XML engine, or by gnucash's interface to it. I can't say I've heard of the Oracle engine, or the PostgreSQL engine, trashing their databases. Nor, with a database, do you get the case where a single comma or misplaced character in the 'file' leaves the 'engine' (here, the XML parser) unable to read any of the data. It sounds like you're arguing to keep the XML because you can manually fix problems that can only occur because the data is in a flat XML file in the first place - kind of circular logic.

Bob
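P.S. To make the 'single misplaced character' point concrete, here's a small sketch using Python's bundled XML parser (the tag names are made up):

    # One stray character and the parser rejects the entire file,
    # not just the damaged record.
    import xml.etree.ElementTree as ET

    good = "<book><trn id='1'/><trn id='2'/></book>"
    bad = "<book><trn id='1'/><trn id='2'/></bok>"   # typo in a close tag

    ET.fromstring(good)                              # parses fine
    try:
        ET.fromstring(bad)
    except ET.ParseError as err:
        print("whole file unreadable:", err)         # mismatched tag

One typo in one closing tag and every transaction in the file becomes unreachable, including all the undamaged ones.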

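P.P.S. For the 'write a routine to dump any/all tables' point above, this is the sort of thing I mean - a sketch in Python against a SQLite file, purely for illustration (the database file name is hypothetical, and any SQL database can do the equivalent):

    # Dump every table in a database to its own CSV file.
    import csv
    import sqlite3

    conn = sqlite3.connect("books.db")               # hypothetical file
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

    for table in tables:
        cur = conn.execute("SELECT * FROM %s" % table)
        with open(table + ".csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(col[0] for col in cur.description)  # header
            writer.writerows(cur)                               # data rows
    conn.close()

Each table lands in its own CSV file that a spreadsheet or script can read - easy access, without anyone writing ad hoc edits back into the database.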