Re: Metadata handling (was "Release planning")

Vassil Dichev Mon, 19 Jul 2010 01:23:22 -0700

>> I believe that this is completely wrong. The API should simply take
>> the text assigned to the metadata parameter, store it as part of the
>> message, and return exactly the same string in the message response.
>> It should not care what is in the text and it should not modify it in
>> any way.
>
> Exactly: The basic idea is that external applications can add
> additional information to a particular message in whatever form (XML,
> JSON, text, etc.) with whatever structure desired. This metadata is
> stored without being changed in the data store.  When accessing the
> messages via the various APIs, the metadata is returned in exactly the
> format in which it was stored. Period.

If this is completely wrong, then Twitter also got it completely
wrong. AFAICT the annotations mechanism provides a way to include
*structured* data in a tweet, in the form of key/value pairs. You can
find this quote on http://apiwiki.twitter.com/Annotations-Overview:

An annotation is a tuple whose first element is a 'type' and whose
second element is one or more attribute names with values.

So what we're missing is that this is not just a text string. Keeping
the data untouched couples it to one specific format, which breaks
down when we want to submit a message in one format but parse it in
another, e.g. send it in as JSON and then read it back as XML (take a
look at the Twitter Annotation examples; this is the kind of scenario
described).
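
To make the JSON-in, XML-out scenario concrete, here is a minimal
sketch. It assumes annotations arrive as a JSON list of
{type: {attribute: value}} objects, which is my reading of the Twitter
overview; the XML element names and the annotations_to_xml helper are
invented for illustration and are not part of any existing API.

```python
import json
import xml.etree.ElementTree as ET


def annotations_to_xml(annotations):
    """Render a list of {type: {attribute: value}} dicts as an XML string.

    The element names used here are hypothetical, chosen only for this sketch.
    """
    root = ET.Element("annotations")
    for annotation in annotations:
        for type_name, attributes in annotation.items():
            type_el = ET.SubElement(root, "annotation", {"type": type_name})
            for name, value in attributes.items():
                attr_el = ET.SubElement(type_el, "attribute", {"name": name})
                attr_el.text = str(value)
    return ET.tostring(root, encoding="unicode")


# A client submits the metadata as JSON...
submitted = '[{"movie": {"title": "Alien", "year": "1979"}}]'
parsed = json.loads(submitted)

# ...and another client reads the same data back as XML.
as_xml = annotations_to_xml(parsed)
print(as_xml)
```

This only works because the server understands the structure. With an
opaque string, the XML reader would get the JSON text back verbatim.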

If we do not want Twitter's approach, we should spell that out
somewhere, whether in the Jira item, in the wiki or somewhere else.
Otherwise it's easy to misunderstand the requirement (as I did).

I can think of 3 ways to avoid metadata in quoted form:

1. Include metadata as a separate attribute from the message text.
2. Include metadata as XML in the message, unquoted.
3. Include metadata in the message quoted, and then unquote it
every time we return it.
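
To show the difference between options 2 and 3, here is a small sketch
using Python's standard library. The message and metadata element
names are assumptions made up for this example, not the actual
response format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

metadata = '<place name="Sofia"/>'

# Option 3: the metadata is quoted (escaped) inside the message.
quoted = "<message><metadata>" + escape(metadata) + "</metadata></message>"

# Option 2: the metadata is embedded as real XML, unquoted.
unquoted = "<message><metadata>" + metadata + "</metadata></message>"

quoted_el = ET.fromstring(quoted).find("metadata")
unquoted_el = ET.fromstring(unquoted).find("metadata")

# Quoted: the parser hands back one text node and no child elements.
print(repr(quoted_el.text), len(quoted_el))

# Unquoted: the structure is visible to the XML parser.
print(unquoted_el.find("place").get("name"))

# Quoting is reversible, so option 3 can restore the original string.
print(unescape(escape(metadata)) == metadata)
```

In the quoted case the consumer has to parse the text a second time to
get at the structure; in the unquoted case it is already part of the
document tree.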

I think if we want to preserve the structure (as Twitter did), 2 is
more straightforward. 1 and 3 don't give us a significant advantage
IMO since we'll have to process the XML anyway and convert to JSON
before returning it. What we have to make sure of when including XML
in metadata is that the XPath expressions we use don't contain
ambiguous references which might resolve to something within the
metadata (this might already be the case).
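
A minimal sketch of the XPath ambiguity I mean, using the limited
XPath support in Python's ElementTree. The element names are
hypothetical; the point is only that a descendant search can match
elements inside the embedded metadata, while an anchored path cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical message whose metadata happens to contain a <body> element.
message = ET.fromstring(
    "<message>"
    "<body>hello</body>"
    "<metadata><note><body>embedded</body></note></metadata>"
    "</message>"
)

# Ambiguous: ".//body" matches every <body> descendant, metadata included.
descendants = [el.text for el in message.findall(".//body")]

# Unambiguous: "./body" matches only direct children of <message>.
direct = [el.text for el in message.findall("./body")]

print(descendants)
print(direct)
```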

It's worth noting that we cannot avoid shell escaping when using
command-line clients and we cannot avoid form-encoding when sending
the message from any client:

http://groups.google.com/group/twitter-development-talk/browse_thread/thread/31f19d9432cc080e?pli=1
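
As a rough illustration of those two layers of escaping, here is a
sketch with Python's standard library. The parameter names and the
example.invalid URL are placeholders I made up, not our actual API.

```python
import shlex
from urllib.parse import parse_qs, urlencode

metadata = '[{"movie": {"title": "Alien & Aliens"}}]'

# Form-encoding: every client has to do this, whatever the metadata holds.
body = urlencode({"message": "hello", "metadata": metadata})
print(body)

# The server decodes the form and gets the original string back.
decoded = parse_qs(body)["metadata"][0]
print(decoded == metadata)

# Shell escaping: needed on top of that when typing a command line.
# curl's --data-urlencode does the form-encoding; the URL is a placeholder.
shell_arg = shlex.quote("metadata=" + metadata)
command = "curl --data-urlencode " + shell_arg + " http://example.invalid/api"
print(command)
```

Neither layer changes what the server stores: both are undone before
the metadata reaches the application.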

So what do we want: a raw string or structured data? Do we want to be
more like how Twitter did it or not? Do we want to couple the data
with a certain format (JSON/XML) or not?

Vassil
