On Thu, Sep 14, 2006 at 08:22:33PM +0800, Syan Tan wrote:

> how to store a doc_obj that is just a text file ?

My knee-jerk reaction would be, why, of course, dump it into
doc_obj.data.
While this would certainly work and not lose any data, I'll
wager a paraphrasing of the question:

How to store a text blob as a document and not lose *information*?

Bytea will not lose data, but it will lose information unless
the data is self-descriptive to some degree. PDF is
self-descriptive, "text" is not. The latter needs to be
accompanied by at least one bit of metadata to make it safely
transferable by purely technical means: the encoding.

So, there's a bunch of solutions:

- Convert the text into UTFx, create a Unicode file with the
  proper start-of-file marker (the byte order mark) and store
  that into doc_obj.data. Probably the cleanest solution and
  the one to recommend.

- Store the text in doc_obj.data and keep the encoding
  information elsewhere, such as in doc_desc, comments, etc.

- Store the text in doc_desc, where it is properly encoded, and
  keep a special value in doc_obj.data pointing to doc_desc.

- Store an enriched version (custom format) of the text in
  doc_obj.data which contains the encoding in a computationally
  extractable way (such as XML).

I'd suggest either the first or the last approach. The first is
preferable, I suppose.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

_______________________________________________
Gnumed-devel mailing list
http://lists.gnu.org/mailman/listinfo/gnumed-devel
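
For illustration, the first approach (convert to Unicode with a start-of-file marker before storing) might look roughly like the sketch below. It assumes Python and UTF-8 as the target, and it assumes the source encoding of the text is known at import time. The function names text_to_utf8_blob and blob_to_text are hypothetical helpers, not part of GNUmed, and the actual write into doc_obj.data is left out.

```python
import codecs


def text_to_utf8_blob(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw text bytes as UTF-8 with a leading byte order mark.

    source_encoding is an assumption: the caller must know (or have
    detected) the encoding of the incoming text file.
    """
    text = raw.decode(source_encoding)
    # The "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) on encode.
    return text.encode("utf-8-sig")


def blob_to_text(blob: bytes) -> str:
    """Decode a blob written by text_to_utf8_blob, stripping the BOM."""
    if not blob.startswith(codecs.BOM_UTF8):
        # Without the marker the encoding is unknown again, which is
        # exactly the information loss discussed above.
        raise ValueError("no UTF-8 BOM found; encoding is unknown")
    return blob.decode("utf-8-sig")


# Hypothetical example: a Latin-1 text file being imported.
original = "Blutdruck erhöht".encode("latin-1")
blob = text_to_utf8_blob(original, "latin-1")  # bytes for doc_obj.data
assert blob_to_text(blob) == "Blutdruck erhöht"
```

The marker makes the stored bytes self-descriptive, so a reader needs no side channel to decode them, at the cost of no longer storing the file byte for byte as received.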
