On Wednesday 03 April 2002 10:10 am, Bill Gribble wrote:
> On Wed, 2002-04-03 at 08:54, Derek Atkins wrote:
> > Note that this will not only fail to do what you want, but could
> > leave your data file unreadable and unusable.  This is _EXACTLY_ the
> > kind of thing that we DON'T want people to be doing!  If you want to
> > change your data you should use the application to do it.
>
> Extremely Strongly Disagree.
>
> I think it's a fundamental part of the Unix and free software
> philosophy that the data belongs to the user, not to the application. 
> "It's none of your d**n business what I do with my data!"  If the user
> wants to pipe their data through perl or sed or whatnot that's their
> business.
>
> That's the main reason *I* wanted to go to the XML format to start
> with.  People *hate* applications that bottle their data up in opaque
> formats.  Databases get a special exemption because of the extremely
> delicate nature of the interrelationships between bits of data, but all
> real dbs have a way to dump text (SQL) that can be used to exactly
> restore the db.  Not just a text "export" (which is usually lossy) but
> a dump which exposes all of the data's guts.
>
> Sure, it's ill-advised to make precipitous changes to your XML data
> file, but it's also ill-advised to make precipitous changes to the
> kernel source code... does that mean it shouldn't be available for easy
> editing?

I usually just read these and move on, but these statements are a little 
too much to pass up.

The core concern in all of data processing is the sanctity of the data.  
The data must be correct and accessible.  Period.  Everything else is 
decades of learning how to achieve that goal.  Transaction logging, 
backups, audit trails, and the need for easy, correct access led to SQL 
and relational databases, and I'm sure the field will keep moving on from 
there.
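
To make 'transaction logging' concrete, here is a minimal sketch in 
Python (the file name, function names, and JSON record format are all 
hypothetical, nothing from gnucash): the intended change is written to a 
log and forced to disk before the data itself is touched, so a crash in 
mid-write can be recovered by replaying the log.

    import json, os

    LOG = "changes.log"

    def log_change(change):
        # Record the intent and force it to disk BEFORE the data
        # file is modified - the heart of write-ahead logging.
        with open(LOG, "a") as f:
            f.write(json.dumps(change) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(apply_fn):
        # After a crash, replay every logged change.  apply_fn must
        # be idempotent so re-applying a change does no harm.
        if not os.path.exists(LOG):
            return
        with open(LOG) as f:
            for line in f:
                apply_fn(json.loads(line))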

'Easy access' does not mean some user trying to read a big text file.  
And easy access does NOT mean it should be easy to modify or delete data 
from outside the applications that were created to do the job that 
requires the data in the first place.  It means being able to read some 
portion of that data, maybe process it in some ad hoc fashion, and then 
maybe save it to another file or a spreadsheet or whatever - NOT back 
into the database.  Writing back is reserved for applications that have 
been specifically created for that purpose and thoroughly TESTED and 
verified to be correct (minus the hopefully few unfortunate bugs that 
always exist in anything people do).  Otherwise your data may (and in my 
20 years of data processing experience, usually does) become less and 
less useful as it acquires consistency errors and/or outright invalid 
values because someone tried to 'fix' something they didn't really 
understand (or possibly did understand but didn't test).  Yes, some 
people make changes that are correct, but everyone makes mistakes 
sometimes.  Those same people could usually take the time to write and 
thoroughly test a script or program to make the same changes, rather 
than making an ad hoc, quickie change in a text editor.
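
That kind of read-only, ad hoc access is easy to script.  A sketch in 
Python (the file name and element names are invented for illustration, 
not the real gnucash schema): parse the file, pull out what you need, 
and write it somewhere else - never back into the original.

    import csv
    import xml.etree.ElementTree as ET

    # Hypothetical file and element names, for illustration only.
    tree = ET.parse("books.xml")

    with open("report.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "description", "amount"])
        for txn in tree.iter("transaction"):
            writer.writerow([txn.findtext("date"),
                             txn.findtext("description"),
                             txn.findtext("amount")])
    # The source file is only ever read; nothing is written back.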

The data is not 'bottled up' just because it's in a database or a binary 
file format - so long as the format is known (published).  The spirit of 
Unix and Open Source is not 'data in text files'.  It is not data in ANY 
particular format whatsoever.  The 'open' refers to the data (and 
programs, of course) being accessible, in a known format, that anyone can 
get at by writing a script/program/what-have-you.  The idea is still that 
the data be correct and, therefore, useful.  And by the way, not all 
databases have 'text' dumps for backups or whatever; some back up to a 
binary format to save space.  If you want to dump the data, you can 
always write a SQL routine (or use some other language) to output any or 
all tables in text format, CSV, whatever you like.
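
Such a dump routine is only a few lines in any client library.  A sketch 
using Python's sqlite3 module (the database file name is hypothetical, 
and the table-discovery query differs per engine, but the shape is the 
same everywhere):

    import csv, sqlite3

    conn = sqlite3.connect("books.db")   # hypothetical database file

    # Discover every user table, then dump each one to its own CSV.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

    for table in tables:
        cur = conn.execute('SELECT * FROM "%s"' % table)
        with open(table + ".csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(cur)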

XML was created to go along with the new web/Internet world and to be to 
data definition what HTML is to documents.  Fast, easy to access, and 
portable across the Internet.  But that's all it's really meant to be - 
a way for disparate systems to exchange relatively small amounts of data 
(for e-commerce between businesses, etc.).  It's not meant to replace 
databases, with their millions (and more) of rows of data.  It doesn't 
encompass transaction logging, etc., and the data takes up far more 
space than other formats (even with compression techniques, it's a 
factor of 10-100 or more, growing quickly as the number of 
columns/fields increases and with the share of those that are numeric - 
to say nothing of the problems with blob fields, which are binary by 
definition).  Not that we have this drastic a situation in gnucash, with 
its relatively limited number of tables and data.  XML is also more open 
to formatting errors.
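
The size penalty on numeric fields is easy to demonstrate.  A sketch in 
Python (the tag names are made up): the same double costs 8 bytes as a 
binary record, but several times that once it is wrapped in tags.

    import struct

    value = 1234.56

    binary = struct.pack("d", value)                 # fixed 8 bytes
    xml = "<split><value>%s</value></split>" % value

    print(len(binary))   # 8
    print(len(xml))      # 37, for a single field, before indentation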

Which brings up the application (gnucash here) messing up the data.  
Yes, it can happen that a bug causes a problem with a single row or 
something and it got by testing.  But I've never seen a case where an 
application actually made all the data unreadable - if it was in a 
database.  The problem you had in gnucash was almost certainly caused by 
the XML engine, or by gnucash's interface to it.  I can't say I've heard 
of the Oracle engine, or the PostgreSQL engine, trashing its databases.  
Nor is there the case of a single comma or misplaced character in the 
'file' causing the 'engine' (the XML parser) to be unable to read any of 
the data.  It sounds like you're arguing to keep the XML because you can 
manually fix problems that can only occur because the data is in a flat 
XML file in the first place - which is kind of circular logic.
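
That fragility is trivial to reproduce.  A sketch in Python: remove one 
character from a well-formed document and the parser refuses to yield 
any of it, not just the damaged record.

    import xml.etree.ElementTree as ET

    good = "<book><txn>1</txn><txn>2</txn></book>"
    bad = good.replace("</txn>", "</txn", 1)   # drop a single '>'

    ET.fromstring(good)       # parses fine
    try:
        ET.fromstring(bad)    # one missing byte, nothing is readable
    except ET.ParseError as e:
        print("parse failed:", e)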

Bob