On Thu, Sep 14, 2006 at 08:22:33PM +0800, Syan Tan wrote:

> how to store a doc_obj that is just a text file ?
My knee-jerk reaction would be, why, of course, dump it
into doc_obj.data.

While this would certainly work and not lose any data, I'll
venture a paraphrase of the question:

How to store a text blob as a document and not lose
*information* ?

Bytea will not lose data, but it will lose information unless
the data is self-descriptive to some degree. PDF is
self-descriptive, "text" is not. The latter needs to be
accompanied by at least one piece of metadata to make it
safely transferable by purely technical means: the
encoding.
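
To illustrate why the encoding matters, here is a minimal Python
sketch. The sample string and the two encodings are assumptions
picked for the demonstration, not anything taken from GNUmed. The
same stored bytes yield different text depending on which encoding
the reader guesses:

```python
# Illustration only: identical bytes, different "information".
raw = "Müller".encode("latin-1")    # the bytes as they would sit in a bytea field

as_latin1 = raw.decode("latin-1")   # reader guesses correctly
as_cp1251 = raw.decode("cp1251")    # reader guesses wrong, gets different text

try:
    raw.decode("utf-8")             # another wrong guess: not even decodable
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

No byte is lost in any of these cases, yet only the reader who
knows the encoding recovers the original text.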

So, there are several possible solutions:

- Convert the text to a Unicode encoding (UTF-8, UTF-16, ...),
  create a file that starts with the proper byte order mark
  (BOM), and store that in doc_obj.data. Probably the cleanest
  solution and the one I'd recommend.

- Store the text in doc_obj.data and keep the encoding
  information elsewhere, such as in doc_desc, comments, etc.

- Store the text in doc_desc where it is properly encoded
  and keep a special value in doc_obj.data pointing to doc_desc.

- Store an enriched version (custom format) of the text in
  doc_obj.data which contains the encoding in a
  computationally extractable way (such as XML).
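
A rough sketch of what the last option could look like, using
Python's standard xml.etree.ElementTree. The element name
"text_document" and both function names are made up for this
example; no such format is defined anywhere yet. The XML
declaration carries the encoding, so a parser can recover the text
without outside knowledge:

```python
import xml.etree.ElementTree as ET


def wrap_text_as_xml(text: str, encoding: str = "utf-8") -> bytes:
    """Wrap plain text in a tiny XML envelope (hypothetical format)."""
    root = ET.Element("text_document")  # element name is an assumption
    root.text = text
    # xml_declaration=True (Python 3.8+) writes <?xml ... encoding='...'?>
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def unwrap_xml(data: bytes) -> str:
    """Recover the text; the parser reads the encoding from the declaration."""
    return ET.fromstring(data).text or ""
```

One caveat with this route: XML parsers normalize line endings and
reject most control characters, so the round trip is exact only for
ordinary text.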

I'd suggest either the first or the last approach. The first
is preferable, I suppose.
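
For the first approach, something along these lines might do. It is
only a sketch: the function names are invented here, the source
encoding must be known (or guessed) by the caller, and how the bytes
actually get written to doc_obj.data is left out. Python's
"utf-8-sig" codec writes the UTF-8 start-of-file marker (byte order
mark) on encoding and strips it on decoding:

```python
import codecs


def text_to_document_bytes(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode text bytes as UTF-8 with a leading BOM.

    source_encoding is an assumption supplied by the caller; it cannot
    be derived reliably from the bytes themselves.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8-sig")  # prepends codecs.BOM_UTF8


def document_bytes_to_text(data: bytes) -> str:
    """Decode bytes produced by text_to_document_bytes()."""
    return data.decode("utf-8-sig")  # drops the BOM if present


# The returned bytes are what would be stored in doc_obj.data.
stored = text_to_document_bytes("Müller".encode("latin-1"), "latin-1")
```

Any reader that understands the marker can then decode the stored
document without further metadata.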

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


_______________________________________________
Gnumed-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnumed-devel
