Hi Chet,
> If it'd be better to just post to the mailing-list, let me know, and I'll do
> so?
That’s what they are for.
> (1) Thrift assumes that containers serialize by writing (a) their SIZE and
> (b) their ELEMENT TYPES.
> Obviously if we want a pretty (human-{writable,readable}) JSON format, that's a
> non-starter.
> Nobody's going to count map-entries when they're updating JSON documents.
> Ditto writing down element types.
Thrift is designed not only to support JS, but to act as a cross-language
framework. A lot of these languages – including JavaScript – do indeed have the
capability to return the count of elements in a container easily. And the types
are known from the IDL, so neither one is a big deal with most (all?) languages
that are currently supported.
On the other hand, there are also methods like ReadListBegin() and
WriteStructEnd(), and there is even a Read/WriteFieldBegin() pair. Many of these
functions do nothing in certain protocols, e.g. the binary protocol has no
code in its writeStructBegin() implementation:
https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java
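As a rough illustration of such a no-op hook, here is a minimal Python sketch.
The class and method names are hypothetical stand-ins, not the real Thrift
classes; it only shows that the struct begin/end calls can legitimately leave
the output untouched while the value writers do the actual work:

```python
class SketchBinaryWriter:
    """Hypothetical stand-in for a binary-style protocol, not a Thrift class."""

    def __init__(self):
        self.buffer = bytearray()

    def write_struct_begin(self, name):
        # Intentionally empty: this sketch never puts the struct name on the wire.
        pass

    def write_struct_end(self):
        # Intentionally empty as well.
        pass

    def write_i32(self, value):
        # 4 bytes, big-endian, signed.
        self.buffer += value.to_bytes(4, "big", signed=True)


w = SketchBinaryWriter()
w.write_struct_begin("Point")
w.write_i32(42)
w.write_struct_end()
print(bytes(w.buffer))
```

Only the write_i32() call contributes bytes here; the struct calls are pure
bookkeeping hooks that a protocol may ignore.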
So if you don’t need to write length and type, you are absolutely free to omit
them in your TProtocol implementation. The only caveat is that the generated read
code will expect the count, so readListBegin() etc. must somehow be able to
deliver that information. The same goes for the type. If you can derive from the
following data that this must be a list<i64> and cannot be a list<i8> or a
list<double>, then do it:
[42]
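To make the ambiguity concrete, here is a small Python sketch. The function
name and the range table are made up for illustration and are not part of
Thrift; it simply lists every element type that a bare JSON array could
plausibly map to:

```python
import json

# Hypothetical table: Thrift integer types and the value ranges they accept.
INT_RANGES = {
    "i8": (-2**7, 2**7 - 1),
    "i16": (-2**15, 2**15 - 1),
    "i32": (-2**31, 2**31 - 1),
    "i64": (-2**63, 2**63 - 1),
}


def candidate_element_types(json_text):
    """Return every element type that could hold all values of a JSON array."""
    values = json.loads(json_text)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array")

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    candidates = []
    if all(isinstance(v, bool) for v in values):
        candidates.append("bool")
    if all(is_int(v) for v in values):
        for name, (lo, hi) in INT_RANGES.items():
            if all(lo <= v <= hi for v in values):
                candidates.append(name)
    if all(is_int(v) or isinstance(v, float) for v in values):
        candidates.append("double")
    if all(isinstance(v, str) for v in values):
        candidates.append("string")
    return candidates


print(candidate_element_types("[42]"))
```

For [42] this reports i8, i16, i32, i64 and double as all being possible, so
the element type has to come from somewhere other than the data itself.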
> BUT (2) the part just above only works when containers are -directly- the
> types of fields.
I’d doubt even that. See my example. But anyway ...
> It's possible in Thrift to have "iterated containers", [...] I can't
> determine how "officially supported"
> these usages are, but it seems pretty infeasible to imagine a way to both
Oh, absolutely, they are! In fact, they are even part of the ThriftTest.thrift
file, which is the basis of the cross-platform test suite:
https://github.com/apache/thrift/blob/master/test/ThriftTest.thrift
Have fun,
JensG
From: Chet Murthy
Sent: Monday, October 16, 2017 9:45 PM
To: [email protected] ; [email protected]
Subject: "iterated container types" ?
Randy, Jens,
TL;DR -- long note with details of issues I ran into. Really, I'm looking for
whether this is hopeless and I should stop (which would be sad, b/c things work
for all but some special cases (explained below)).
Hey, I'm hacking away, implementing "nicer JSON serialization" (and the
metadata support for it) and have run into some issues. I'm not sure if the
right way to ask about these is on the mailing-list (I guess, "thrift-dev"?) or
directly emailing you. If it'd be better to just post to the mailing-list, let
me know, and I'll do so?
In any case, the first level of adding nicer JSON serialization was smooth.
It's straightforward to make it work for structs. But for list/map/set, it's
messier, and perhaps impossible.
There are two issues, one of which is mitigable, and the other perhaps
insurmountable given the current design of Thrift. I thought I'd list them,
and if you could give your advice/judgment, I'd appreciate it greatly.
(1) Thrift assumes that containers serialize by writing (a) their SIZE and (b)
their ELEMENT TYPES. Obviously if we want a pretty (human-{writable,readable})
JSON format, that's a non-starter. Nobody's going to count map-entries when
they're updating JSON documents. Ditto writing down element types.
--> the (a) size issue is mitigable by using a JSON parser to parse the
entire document, and then the deserializer would walk the "DOM tree". So when
readListBegin() is called, we can compute the length of the list.
--> For (b), we can also keep track of the expected type of a field when we
start deserializing it, so that readListBegin() can return the element-type.
And similarly for map/set.
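The two mitigations above might look roughly like this in Python. This is only
a sketch under stated assumptions: the class and its method signatures are
hypothetical and do not match the real TProtocol interface. In particular,
read_field_begin() taking the expected element type as an argument stands in
for whatever metadata lookup would supply it:

```python
import json


class DomListReader:
    """Hypothetical reader that mimics the shape of TProtocol, not its API."""

    def __init__(self, json_text):
        # (a) Parse the entire document up front, so sizes can be computed.
        self._root = json.loads(json_text)
        self._current = None
        self._expected_elem_type = None

    def read_field_begin(self, name, expected_elem_type):
        # (b) Remember the element type the (assumed) metadata gives for
        # this field, so read_list_begin() can report it later.
        self._current = self._root[name]
        self._expected_elem_type = expected_elem_type

    def read_list_begin(self):
        if not isinstance(self._current, list):
            raise TypeError("current field is not a JSON array")
        # The size comes from the parsed tree, the type from the metadata.
        return self._expected_elem_type, len(self._current)


reader = DomListReader('{"tags": ["a", "b", "c"]}')
reader.read_field_begin("tags", "string")
print(reader.read_list_begin())
```

Note that this only covers a container that is directly the type of a field,
since the expected type is recorded per field.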
BUT (2) the part just above only works when containers are -directly- the types
of fields. It's possible in Thrift to have "iterated containers", e.g.
8: required list<list<string>> h,
9: required list<set<i32>> i,
10: required map<string, set<i32>> j,
I can't determine how "officially supported" these usages are, but it seems
pretty infeasible to imagine a way to both
(i) honor the TProtocol contract to its invoking code (typically generated)
(ii) produce a "pretty" JSON serialization format for these types.
Now, if it were possible to forbid "iterated containers", I think everything
could work out. I'd produce two "nicer JSON serializers":
(a) with Thrift, a version that still has to have the "size" of containers, but
no type information and field-names instead of field-IDs
(b) as a contrib, a version that doesn't have to have the size of containers,
but uses a full JSON parser to build a DOM before deserializing.
OK, long note. Maybe I should be sending this to thrift-dev ?
In any case, thanks for your advice on all this.
Cheers,
--chet--