On Wed, Mar 29, 2017 at 3:12 PM, Chris Douglas <chris.doug...@gmail.com> wrote:
> On Wed, Mar 29, 2017 at 1:13 PM, Stack <st...@duboce.net> wrote:
> > Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> > 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> > same?)
>
> I reproduced your example with the Java tooling, including changing
> some of the fields in the intermediate representation. As long as the
> syntax is "proto2", it seems to have compatible semantics.
>
> Thanks.
>
> > To be clear, when we say proxy above, are we expecting that a pb message
> > deserialized by a process down-the-line that happens to have a crimped proto
> > definition that is absent a couple of fields somehow can re-serialize and at
> > the end of the line, all fields are present? Or are we talking pass-through
> > of the message without rewrite?
>
> The former; an intermediate handler decoding, [modifying,] and
> encoding the record without losing unknown fields.
>

I did not try this. Did you? Otherwise I can.

St.Ack

> This looks fine. -C
>
> > Thanks,
> > St.Ack
> >
> > # Using the protoc v3.0.2 tool
> > $ protoc --version
> > libprotoc 3.0.2
> >
> > # I have a simple proto definition with two fields in it
> > $ more pb.proto
> > message Test {
> >   optional string one = 1;
> >   optional string two = 2;
> > }
> >
> > # This is a text-encoded instance of a 'Test' proto message:
> > $ more pb.txt
> > one: "one"
> > two: "two"
> >
> > # Now I encode the above as a pb binary
> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> > syntax.)
> >
> > # Here is a dump of the binary
> > $ od -xc pb.bin
> > 0000000    030a    6e6f    1265    7403    6f77
> >           \n 003   o   n   e 022 003   t   w   o
> > 0000012
> >
> > # Here is a proto definition file that has a Test Message minus the 'two' field.
> > $ more pb_drops_two.proto
> > message Test {
> >   optional string one = 1;
> > }
> >
> > # Use it to decode the bin file:
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> > to proto2 syntax.)
> > one: "one"
> > 2: "two"
> >
> > Note how the second field is preserved (absent a field name). It is not
> > dropped.
> >
> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> > dropped.
> >
> > # Here is the proto file with proto3 syntax specified (had to drop the
> > # 'optional' qualifier, which is not allowed in proto3):
> > $ more pb_drops_two.proto
> > syntax = "proto3";
> > message Test {
> >   string one = 1;
> > }
> >
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
> > $ more pb_drops_two.txt
> > one: "one"
> >
> > I cannot re-encode the text output using pb_drops_two.proto. It complains:
> >
> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> > to proto2 syntax.)
> > input:2:1: Expected identifier, got: 2
> >
> > Proto 2.5 does the same:
> >
> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> > input:2:1: Expected identifier.
> > Failed to parse input.
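The `od` dump above can be decoded by hand, which shows why protoc prints `2: "two"` when decoding with pb_drops_two.proto. Each field on the wire is a varint tag, `(field_number << 3) | wire_type`, followed by its payload: `\n` (0x0a) is field 1 and `022` (octal, 0x12) is field 2, both wire type 2 (length-delimited) with length 3. The following is a minimal hand-rolled Python sketch, not the protobuf library's API; `parse_fields`, `render` and `KNOWN_NAMES` are illustrative names, and only the varint and length-delimited wire types are handled. It shows that a field the schema cannot name is still fully parseable, so it can be displayed by number.

```python
# Bytes of pb.bin as shown by `od -c` above: \n 003 o n e 022 003 t w o
PB_BIN = bytes([0x0A, 0x03]) + b"one" + bytes([0x12, 0x03]) + b"two"


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def parse_fields(buf):
    """Split a serialized message into (field_number, wire_type, value) triples.

    Only wire types 0 (varint) and 2 (length-delimited) are handled; that is
    all this example needs.
    """
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, wire_type, value))
    return fields


# The reduced schema (pb_drops_two.proto) only has a name for field 1.
KNOWN_NAMES = {1: "one"}


def render(buf):
    """Print-style rendering: known fields by name, unknown fields by number."""
    lines = []
    for number, _wire_type, value in parse_fields(buf):
        label = KNOWN_NAMES.get(number, str(number))
        shown = f'"{value.decode()}"' if isinstance(value, bytes) else str(value)
        lines.append(f"{label}: {shown}")
    return "\n".join(lines)


print(render(PB_BIN))
```

Nothing about field 2 needs the schema to be parsed; the schema only supplies the name, which is why the proto2 decode falls back to the number.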
> >
> > St.Ack
> >
> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
> >>
> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>>
> >>> >
> >>> > >
> >>> > >> If unknown fields are dropped, then applications proxying tokens and
> >>> > >> other data between servers will effectively corrupt those messages,
> >>> > >> unless we make everything opaque bytes, which (absent the convenient,
> >>> > >> prenominate semantics managing the conversion) obviate the
> >>> > >> compatibility machinery that is the whole point of PB. Google is
> >>> > >> removing the features that justified choosing PB over its
> >>> > >> alternatives. Since we can't require that our applications compile
> >>> > >> (or link) against our updated schema, this creates a problem that PB
> >>> > >> was supposed to solve.
> >>> > >
> >>> > > This is scary, and it potentially affects services outside of the
> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
> >>> >
> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> >>> > If that carries unknown fields through intermediate handlers, then
> >>> > this objection goes away. -C
> >>>
> >>> Did some more googling, found this:
> >>>
> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
> >>>
> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
> >>> packing the fields into a byte type. No mention of a PB2 compatibility
> >>> mode. Also here:
> >>>
> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
> >>>
> >>> Participants say that unknown fields were dropped for automatic JSON
> >>> encoding, since you can't losslessly convert to JSON without knowing the
> >>> type.
> >>>
> >>> Unfortunately, it sounds like these are intrinsic differences with PB3.
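On the JSON point in the second thread: wire type 2 only says "length-delimited", so without the schema a printer cannot tell whether an unknown field holds a `string`, `bytes`, or an embedded message, and the proto3 JSON mapping renders those three differently (a plain JSON string, a base64 string, a JSON object). The following Python sketch assumes a hypothetical unknown field 2 whose payload is `b"hi"`; `as_message` and `candidates` are made-up names, and keys are field numbers because a schema-less printer has no field names either (the real mapping uses field names).

```python
import base64
import json

# Hypothetical unknown field: tag 0x12 (field 2, wire type 2), length 2,
# payload b"hi". The wire type does not say which of several types this is.
PAYLOAD = b"hi"


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def as_message(buf):
    """Read buf as an embedded message with varint fields only (enough here)."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if (tag & 0x7) != 0:
            raise ValueError("only varint fields are handled in this sketch")
        value, pos = read_varint(buf, pos)
        fields[str(tag >> 3)] = value  # no schema, so key by field number
    return fields


# Three valid readings of the same bytes, each rendered the way the proto3
# JSON mapping renders that type: string as-is, bytes as base64, message as object.
candidates = {
    "string": PAYLOAD.decode("utf-8"),
    "bytes": base64.b64encode(PAYLOAD).decode("ascii"),
    "message": as_message(PAYLOAD),  # 0x68 = field 13, varint; 0x69 = 105
}

for kind, rendered in candidates.items():
    print(f"{kind:8} {json.dumps({'2': rendered})}")
```

All three readings are valid protobuf, and they produce three different JSON documents, so a lossless JSON rendering of an unknown field would need type information the wire format does not carry.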
> >>>
> >>
> >> As I read it, Andrew, the field-dropping happens when pb3 is running in
> >> proto3 'mode'. Let me try it...
> >>
> >> St.Ack
> >>
> >>>
> >>> Best,
> >>> Andrew
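Chris's description of the proxy case (an intermediate handler decoding, optionally modifying, and re-encoding a record without losing unknown fields) can be sketched without the protobuf library. The hypothetical `proxy` handler below knows only field 1, like pb_drops_two.proto. With `keep_unknown=True` it carries unrecognized fields through as raw bytes, which is the proto2 behaviour the transcript shows; with `keep_unknown=False` it discards them, mimicking the field missing from the proto3 decode. All names here are illustrative, and this is a sketch of the semantics under those assumptions, not how any protobuf runtime is implemented.

```python
# pb.bin from the transcript: field 1 = "one", field 2 = "two".
PB_BIN = bytes([0x0A, 0x03]) + b"one" + bytes([0x12, 0x03]) + b"two"


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def parse_fields(buf):
    """Return (field_number, wire_type, value, raw_bytes) for each field.

    raw_bytes is the field exactly as it appeared on the wire, tag included;
    keeping it is all a decoder needs to pass a field it cannot name.
    """
    fields, pos = [], 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, wire_type, value, buf[start:pos]))
    return fields


def proxy(buf, keep_unknown):
    """Hypothetical handler that only knows field 1: decode, modify, re-encode."""
    one, unknown = "", b""
    for number, wire_type, value, raw in parse_fields(buf):
        if number == 1 and wire_type == 2:
            one = value.decode("utf-8")
        elif keep_unknown:
            unknown += raw  # carry the unrecognized field through verbatim
        # otherwise the field is silently discarded
    one = one.upper()  # the "[modifying,]" step
    data = one.encode("utf-8")
    return encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data + unknown


def summary(buf):
    return [(number, value) for number, _, value, _ in parse_fields(buf)]


print("keep unknown:", summary(proxy(PB_BIN, keep_unknown=True)))
print("drop unknown:", summary(proxy(PB_BIN, keep_unknown=False)))
```

Appending the retained bytes after the known field is a choice made for this sketch; the point is only that the handler never needs field 2's name or type to pass it through intact.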