Re: Metadata encoding

Michael McCandless Tue, 24 Mar 2009 04:50:40 -0700

Starting with JSON seems like a reasonable choice.

Mike


Marvin Humphrey <[email protected]> wrote:
> Greets,
>
> Lucy indexes will contain significant metadata, which should be written in a
> human-readable format for easy spelunking and debugging.  There are probably
> four main contenders for choice of encoding: JSON, YAML, XML, and a custom
> format.
>
> If we go with a custom format, IMO it should be an extension of JSON.  Our
> needs will not be limited to simple key-value pairs, and designing our own
> full-featured data-description language would be foolish.  Let's try to avoid
> custom formats until we decide that there's no other choice.
>
> XML and YAML are certainly sophisticated enough to handle our data needs.
> However, they both require large, heavyweight parsers, and I think we should
> try to avoid imposing such a dependency on future Lucy C users.
>
> Furthermore, XML is less well-matched to the scalar-list-mapping data
> structures common to the dynamic languages that Lucy targets than either YAML
> or JSON.
>
> YAML offers the advantage of extensible data types.  That's become more
> appealing as I've tried to figure out how to serialize entire schemas in JSON,
> including Analyzer and Similarity specifications.  However, the YAML spec is
> very large.  If we decide that we need YAML's features, I think we ought to
> try to limit ourselves to a subset of the spec.
>
> Still, it would be for the best if we could avoid that kind of complexity, and
> go with the simplest human-readable option that supports scalar-list-mapping
> data structures: JSON.
>
> This excerpt from the YAML 1.2 draft spec points a way forward:
>
>    http://yaml.org/spec/1.2
>
>    1.4. Relation to JSON
>
>    Both JSON and YAML aim to be human readable data interchange formats.
>    However, JSON and YAML have different priorities. JSON’s foremost design
>    goal is simplicity and universality. Thus, JSON is trivial to generate and
>    parse, at the cost of reduced human readability. It also uses a lowest
>    common denominator information model, ensuring any JSON data can be easily
>    processed by every modern programming environment.
>
>    In contrast, YAML’s foremost design goals are human readability and
>    support for serializing arbitrary native data structures. Thus, YAML
>    allows for extremely readable files, but is more complex to generate and
>    parse. In addition, YAML ventures beyond the lowest common denominator
>    data types, requiring more complex processing when crossing between
>    different programming environments.
>
>    YAML can therefore be viewed as a natural superset of JSON, offering
>    improved human readability and a more complete information model. This is
>    also the case in practice; every JSON file is also a valid YAML file. This
>    makes it easy to migrate from JSON to YAML if/when the additional features
>    are required.
>
>    It may be useful to define a intermediate format between YAML and JSON.
>    Such a format would be trivial to parse (but not very human readable),
>    like JSON. At the same time, it would allow for serializing arbitrary
>    native data structures, like YAML. Such a format might also serve as
>    YAML’s "canonical format".
>
>    Defining such a "YSON" format (YSON is a Serialized Object Notation) can
>    be done either by enhancing the JSON specification or by restricting the
>    YAML specification. Such a definition is beyond the scope of this
>    specification.
>
> (Note that YAML version 1.2 is not well supported yet; most parsers support
> 1.0 or 1.1.)
>
> I'm sure we can hammer all the data we need into JSON; it's just a matter of
> at what point it becomes so inelegant that wandering outside the JSON spec
> into YAML becomes the best solution.  That's not a threshold we should cross
> lightly, so for now I advocate that we try to work within JSON's constraints.
>
> Marvin Humphrey
>
>
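
To make the "hammer it into JSON" idea concrete, here is a minimal sketch of how a schema with Analyzer and Similarity specifications might be squeezed into plain JSON by tagging each mapping with a class name. Everything in it is an assumption made for illustration: the "_class" key, the field names, and the class names (ExampleAnalyzer and so on) are hypothetical and are not taken from Lucy or from the message above.

```python
import json

# Hypothetical schema metadata. The "_class" tag stands in for the
# extensible types YAML would offer; all names here are illustrative.
schema = {
    "_class": "ExampleSchema",
    "fields": {
        "title": {
            "_class": "ExampleTextField",
            "analyzer": {"_class": "ExampleAnalyzer", "language": "en"},
            "boost": 2.0,
        },
        "url": {"_class": "ExampleStringField"},
    },
    "similarity": {"_class": "ExampleSimilarity"},
}


def dump_metadata(data):
    """Encode metadata as indented, key-sorted JSON for easy reading."""
    return json.dumps(data, indent=2, sort_keys=True)


def tagged_classes(node):
    """Collect every "_class" tag found in a decoded structure."""
    found = []
    if isinstance(node, dict):
        if "_class" in node:
            found.append(node["_class"])
        for key in sorted(node):
            found.extend(tagged_classes(node[key]))
    elif isinstance(node, list):
        for item in node:
            found.extend(tagged_classes(item))
    return found


text = dump_metadata(schema)
decoded = json.loads(text)
```

The round trip uses only scalars, lists, and mappings, so any JSON parser can read the file; the cost is that the reader has to interpret the "_class" convention itself, which is the inelegance the message warns may eventually justify moving to a YAML subset.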
