I talked to the IBM guys about this problem with JSON-like formats.

Their answer was that if you care enough about the space, just about any
compression algorithm will compress away the type information.

So if you have a splittable compressed format (bz2 works with Hadoop), you
are set except for the compression cost.  The decompression cost is usually
offset by the reduced I/O.
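
A quick way to check the claim for yourself is something like the sketch
below.  It is only an illustration: the record layout and field names
(user_id, page_url, duration_ms) are made up, and it uses Python's bz2
module on an in-memory buffer rather than anything running on Hadoop.  It
writes the same records once with field names repeated in every record and
once as bare value lists, then compares the sizes before and after bz2.

```python
import bz2
import json


def measure_sizes(n_records=10000):
    """Compare JSON records with and without per-record field names."""
    # Hypothetical records: every record repeats the same three field names.
    records = [
        {
            "user_id": i,
            "page_url": "/item/%d" % (i % 50),
            "duration_ms": (i * 37) % 1000,
        }
        for i in range(n_records)
    ]

    # Self-describing form: field names are stored in every record.
    with_names = "\n".join(
        json.dumps(r, sort_keys=True) for r in records
    ).encode("utf-8")

    # Compact form: values only, the schema is assumed to live elsewhere.
    values_only = "\n".join(
        json.dumps([r["duration_ms"], r["page_url"], r["user_id"]])
        for r in records
    ).encode("utf-8")

    return {
        "raw_with_names": len(with_names),
        "raw_values_only": len(values_only),
        "bz2_with_names": len(bz2.compress(with_names)),
        "bz2_values_only": len(bz2.compress(values_only)),
    }


if __name__ == "__main__":
    for label, size in sorted(measure_sizes().items()):
        print("%-16s %8d bytes" % (label, size))
```

The number to look at is the gap between the two bz2 sizes compared with
the gap between the two raw sizes.  How small it gets will depend on your
data, so treat this as a way to measure, not as a result.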

On Wed, Sep 3, 2008 at 3:52 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> ...
>
> Thanks for the pointer to jaql, that seems very cool, but I believe
> jaql would have the same problem if they tried to implement any kind
> of compact structured storage.  Jaql would return a JArray or JRecord
> which might have a variety of fields and you would want to store the
> data about what kinds of fields separately.
>
