Protocol Buffers, Thrift?

On 11/3/08 4:07 AM, "Steve Loughran" <[EMAIL PROTECTED]> wrote:

Zhou, Yunqing wrote:
> An embedded database cannot handle large-scale data and is not very efficient.
> I have about 1 billion records.
> These records need to pass through several modules.
> I mean a data exchange format similar to XML but more flexible and
> efficient.


JSON
CSV
Erlang-style records (name,value,value,value)
RDF-triples in non-XML representations

For all of these, you need to test with data that includes things like
high Unicode characters and single and double quotes, to see how well
they get handled.
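
A quick way to run that kind of check is a round-trip test: write the awkward
values out, read them back, and compare. The sketch below does this for JSON
and CSV using only the Python standard library. The sample values and the
function names are made up for illustration; swap in whatever your real
records look like.

```python
import csv
import io
import json

# Hypothetical sample values covering quotes, delimiters, newlines,
# and characters inside and outside the Basic Multilingual Plane.
TRICKY = [
    "plain",
    "it's",
    'say "hi"',
    "comma, inside",
    "line\nbreak",
    "snowman \u2603",
    "astral \U0001F600",
]


def json_round_trip(values):
    """Serialise to UTF-8 encoded JSON bytes and parse them back."""
    data = json.dumps(values, ensure_ascii=False).encode("utf-8")
    return json.loads(data.decode("utf-8"))


def csv_round_trip(values):
    """Write one CSV row to an in-memory buffer and read it back."""
    buf = io.StringIO(newline="")  # newline="" leaves line endings to csv
    csv.writer(buf).writerow(values)
    buf.seek(0)
    return next(csv.reader(buf))


def survives(round_trip, values):
    """True if every value comes back unchanged."""
    return round_trip(values) == values
```

Calling survives(json_round_trip, TRICKY) and survives(csv_round_trip, TRICKY)
tells you whether a given format and library combination copes with your
data before you commit a billion records to it.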

You can actually append to an XML file by leaving out the enclosing
opening/closing (root) tags and just streaming the entries out to the tail
of the file:
<entry>...</entry>

To read this in an XML parser, include it inside another XML file:

<?xml version="1.0"?>
<!DOCTYPE log [
      <!ENTITY log SYSTEM "log.xml">
]>

<file>
&log;
</file>

I've done this for very big files. As long as you aren't trying to load
the whole thing in memory as a DOM, things should work.
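
Here is a minimal sketch of the append-and-wrap approach in Python, using the
standard library SAX parser so nothing is loaded into a DOM. The helper names
(append_entry, read_entries, demo) and the EntryCollector class are invented
for this example. Note one assumption about the environment: recent Python
versions leave external general entities switched off by default for security
reasons, so the sketch turns them on explicitly. Only do that for files you
trust.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.saxutils import escape

# Wrapper document from the mail: pulls log.xml in as an external entity.
WRAPPER = """<?xml version="1.0"?>
<!DOCTYPE log [
  <!ENTITY log SYSTEM "log.xml">
]>
<file>
&log;
</file>
"""


class EntryCollector(ContentHandler):
    """Collects the text of each <entry> element as the stream goes by."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._text = None  # list of text chunks while inside an <entry>

    def startElement(self, name, attrs):
        if name == "entry":
            self._text = []

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "entry":
            self.entries.append("".join(self._text))
            self._text = None


def append_entry(log_path, text):
    """Append one escaped <entry> to the tail of the root-less log file."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("<entry>%s</entry>\n" % escape(text))


def read_entries(wrapper_path):
    """Stream-parse the wrapper, resolving &log; relative to its location."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # off by default
    handler = EntryCollector()
    parser.setContentHandler(handler)
    parser.parse(wrapper_path)
    return handler.entries


def demo():
    """Write a wrapper plus a few appended entries, then read them back."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "wrapper.xml"), "w", encoding="utf-8") as f:
            f.write(WRAPPER)
        log_path = os.path.join(d, "log.xml")
        for text in ["first", "a < b & c", "snowman \u2603"]:
            append_entry(log_path, text)
        return read_entries(os.path.join(d, "wrapper.xml"))
```

For a real billion-record file you would process each entry inside
endElement rather than collecting them in a list, which is done here only to
keep the example easy to check.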

--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/

