Hi Chet,

> If it'd be better to just post to the mailing-list, let me know, and I'll do 
> so?

That’s what they are for.

> (1) Thrift assumes that containers serialize by writing (a) their SIZE and 
> (b) their ELEMENT TYPES.
> Obviously if we want a pretty (human-{writable,readable}) JSON format, that's a 
> non-starter.
> Nobody's going to count map-entries when they're updating JSON documents.
> Ditto writing down element types.

Thrift is designed not only to support JS, but to act as a cross-language 
framework. A lot of these languages – including JavaScript – do indeed have the 
capability to return the count of elements in a container easily. And the types 
are known from the IDL, so neither one is a big deal with most (all?) languages 
that are currently supported.

On the other hand, there are also methods like ReadListBegin() and 
WriteStructEnd(), there is even a Read/WriteFieldBegin()-pair. A lot of these 
functions are doing nothing in certain cases, e.g. the binary protocol has no 
code in its writeStructBegin() implementation:
https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java

So if you don’t need to write length and type, you are absolutely free to omit 
them in your TProtocol implementation. The only caveat is that the generated read 
code will expect the count, so readListBegin() etc. must somehow be able to 
deliver that information. Same for type. If you can, from the following given 
data, derive that this must be a list<int64> and cannot be a list<int8> or a 
list<double>, then do it:

    [42]
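
To make the ambiguity concrete, here is a small Python sketch. It is not Thrift 
code: the function name and the range table are made up for illustration, and 
it uses the IDL spellings i8/i16/i32/i64 for the integer types. It lists which 
element types a JSON array could hold, judging from the values alone:

```python
import json

# Hypothetical helper, not part of Thrift: value ranges of the integer types.
INT_RANGES = {
    "i8": (-2**7, 2**7 - 1),
    "i16": (-2**15, 2**15 - 1),
    "i32": (-2**31, 2**31 - 1),
    "i64": (-2**63, 2**63 - 1),
}

def candidate_element_types(text):
    """Return (size, element types that fit every value in the JSON array)."""
    values = json.loads(text)
    candidates = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        for name, (lo, hi) in INT_RANGES.items():
            if all(lo <= v <= hi for v in values):
                candidates.append(name)
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        candidates.append("double")
    return len(values), candidates

size, types = candidate_element_types("[42]")
print(size, types)  # 1 ['i8', 'i16', 'i32', 'i64', 'double']
```

The size is easy to recover, but five element types remain possible, so the 
type has to come from somewhere other than the data.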

> BUT (2) the part just above only works when containers are -directly- the 
> types of fields.

I’d doubt even that. See my example. But anyway ...

> It's possible in Thrift to have "iterated containers", [...] I can't 
> determine how "officially supported"
> these usages are, but it seems pretty infeasible to imagine a way to both

Oh, absolutely, they are! In fact, they are even part of the ThriftTest.thrift 
file, which is the basis of the cross-platform test suite:
https://github.com/apache/thrift/blob/master/test/ThriftTest.thrift

Have fun,
JensG




From: Chet Murthy
Sent: Monday, October 16, 2017 9:45 PM
To: [email protected] ; [email protected]
Subject: "iterated container types" ?

Randy, Jens,

TL;DR -- long note with details of issues I ran into.  Really, I'm looking for 
whether this is hopeless and I should stop (which would be sad, b/c things work 
for all but some special cases (explained below)).

Hey, I'm hacking away, implementing "nicer JSON serialization" (and the 
metadata support for it) and have run into some issues.  I'm not sure if the 
right way to ask about these is on the mailing-list (I guess, "thrift-dev"?) or 
directly emailing you.  If it'd be better to just post to the mailing-list, let 
me know, and I'll do so?

In any case, the first level of adding nicer JSON serialization was smooth.  
It's straightforward to make it work for structs.  But for list/map/set, it's 
messier, and perhaps impossible.

There are two issues, one of which is mitigable, and the other perhaps 
insurmountable given the current design of Thrift.  I thought I'd list them, 
and if you could give your advice/judgment, I'd appreciate it greatly.

(1) Thrift assumes that containers serialize by writing (a) their SIZE and (b) 
their ELEMENT TYPES.  Obviously if we want a pretty (human-{writable,readable}) 
JSON format, that's a non-starter.  Nobody's going to count map-entries when 
they're updating JSON documents.  Ditto writing down element types.

  --> the (a) size issue is mitigable by using a JSON parser to parse the 
entire document, and then the deserializer would walk the "DOM tree".  So when 
readListBegin() is called, we can compute the length of the list.

  --> For (b), we can also keep track of the expected type of a field when we 
start deserializing it, so that readListBegin() can return the element-type.  
And similarly for map/set.
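
A minimal Python sketch of the two mitigations, under loud assumptions: this is 
not the Thrift API. The class DomListReader, its method names, and the 
field_types table are all hypothetical, and a real readFieldBegin() does not 
take the field name as an argument. The sketch only shows the idea of parsing 
first and answering the list header from the tree:

```python
import json

class DomListReader:
    """Hypothetical sketch: serves list headers from a parsed JSON tree."""

    def __init__(self, text, field_types):
        self.doc = json.loads(text)      # parse the entire document up front
        self.field_types = field_types   # field name -> element type (from the IDL)
        self.current = None
        self.elem_type = None

    def read_field_begin(self, name):
        # Remember the value and the element type expected for this field.
        self.current = self.doc[name]
        self.elem_type = self.field_types[name]

    def read_list_begin(self):
        # The size is computed from the DOM, the element type from the IDL.
        return self.elem_type, len(self.current)

reader = DomListReader('{"names": ["a", "b", "c"]}', {"names": "string"})
reader.read_field_begin("names")
print(reader.read_list_begin())  # ('string', 3)
```

Note that the element type is only known here because read_field_begin() ran 
immediately before read_list_begin().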

BUT (2) the part just above only works when containers are -directly- the types 
of fields.  It's possible in Thrift to have "iterated containers", e.g.

  8: required list<list<string>> h,
  9: required list<set<i32>> i,
  10: required map<string, set<i32>> j,

I can't determine how "officially supported" these usages are, but it seems 
pretty infeasible to imagine a way to both

(i) honor the TProtocol contract to its invoking code (typically generated)
(ii) produce a "pretty" JSON serialization format for these types.
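
To illustrate where the difficulty comes from, the following Python sketch 
records the calls a reader receives for a list<list<string>> field. Everything 
in it is hypothetical: TracingReader is a stand-in rather than a real TProtocol, 
read_h() only imitates the shape of generated read code, and element types are 
left out entirely.

```python
class TracingReader:
    """Hypothetical stand-in for a protocol reader that logs each call."""

    def __init__(self, value):
        self.value = value
        self.stack = []   # iterators over the lists currently being read
        self.calls = []

    def read_field_begin(self):
        self.calls.append("read_field_begin")

    def read_field_end(self):
        self.calls.append("read_field_end")

    def read_list_begin(self):
        self.calls.append("read_list_begin")
        # The outermost list is the field value; a nested list is the
        # next element of the list that is currently being read.
        container = next(self.stack[-1]) if self.stack else self.value
        self.stack.append(iter(container))
        return None, len(container)   # element type omitted in this sketch

    def read_list_end(self):
        self.calls.append("read_list_end")
        self.stack.pop()

    def read_string(self):
        self.calls.append("read_string")
        return next(self.stack[-1])

def read_h(proto):
    """Imitates generated read code for: required list<list<string>> h"""
    proto.read_field_begin()
    _, outer_size = proto.read_list_begin()
    result = []
    for _ in range(outer_size):
        _, inner_size = proto.read_list_begin()
        result.append([proto.read_string() for _ in range(inner_size)])
        proto.read_list_end()
    proto.read_list_end()
    proto.read_field_end()
    return result

proto = TracingReader([["a", "b"], ["c"]])
print(read_h(proto))
print(proto.calls[:3])
```

Only the first read_list_begin() is preceded by read_field_begin(). The inner 
ones arrive with no field context, so the reader has to track the nesting 
itself, and in a real protocol the element types as well.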

Now, if it were possible to forbid "iterated containers", I think everything 
could work out.  I'd produce two "nicer JSON serializers":

(a) with Thrift, a version that still has to have the "size" of containers, but 
no type information and field-names instead of field-IDs

(b) as a contrib, a version that doesn't have to have the size of containers, 
but uses a full JSON parser to build a DOM before deserializing.

OK, long note.  Maybe I should be sending this to thrift-dev ?

In any case, thanks for your advice on all this.

Cheers,
--chet--

