On Wed, Mar 29, 2017 at 4:59 PM, Stack <st...@duboce.net> wrote:
>> The former; an intermediate handler decoding, [modifying,] and
>> encoding the record without losing unknown fields.
>>
>
> I did not try this. Did you? Otherwise I can.
Yeah, I did. Same format. -C

>> This looks fine. -C
>>
>> > Thanks,
>> > St.Ack
>> >
>> >
>> > # Using the protoc v3.0.2 tool
>> > $ protoc --version
>> > libprotoc 3.0.2
>> >
>> > # I have a simple proto definition with two fields in it
>> > $ more pb.proto
>> > message Test {
>> >   optional string one = 1;
>> >   optional string two = 2;
>> > }
>> >
>> > # This is a text-encoded instance of a 'Test' proto message:
>> > $ more pb.txt
>> > one: "one"
>> > two: "two"
>> >
>> > # Now I encode the above as a pb binary
>> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
>> > or 'syntax = "proto3";' to specify a syntax version. (Defaulted to
>> > proto2 syntax.)
>> >
>> > # Here is a dump of the binary
>> > $ od -xc pb.bin
>> > 0000000    030a    6e6f    1265    7403    6f77
>> >           \n 003   o   n   e 022 003   t   w   o
>> > 0000012
>> >
>> > # Here is a proto definition file that has a Test Message minus the
>> > # 'two' field.
>> > $ more pb_drops_two.proto
>> > message Test {
>> >   optional string one = 1;
>> > }
>> >
>> > # Use it to decode the bin file:
>> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> > (Defaulted to proto2 syntax.)
>> > one: "one"
>> > 2: "two"
>> >
>> > Note how the second field is preserved (absent a field name). It is not
>> > dropped.
>> >
>> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
>> > dropped.
>> >
>> > # Here proto file with proto3 syntax specified (had to drop the
>> > # 'optional' qualifier -- not allowed in proto3):
>> > $ more pb_drops_two.proto
>> > syntax = "proto3";
>> > message Test {
>> >   string one = 1;
>> > }
>> >
>> > $ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
>> > $ more pb_drops_two.txt
>> > one: "one"
>> >
>> >
>> > I cannot reencode the text output using pb_drops_two.proto. It
>> > complains:
>> >
>> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> > (Defaulted to proto2 syntax.)
>> > input:2:1: Expected identifier, got: 2
>> >
>> > Proto 2.5 does same:
>> >
>> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> > input:2:1: Expected identifier.
>> > Failed to parse input.
>> >
>> > St.Ack
>> >
>> >
>> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
>> >>
>> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
>> >> wrote:
>> >>>
>> >>> >
>> >>> > >> If unknown fields are dropped, then applications proxying tokens
>> >>> > >> and other data between servers will effectively corrupt those
>> >>> > >> messages, unless we make everything opaque bytes, which- absent
>> >>> > >> the convenient, prenominate semantics managing the conversion-
>> >>> > >> obviate the compatibility machinery that is the whole point of
>> >>> > >> PB. Google is removing the features that justified choosing PB
>> >>> > >> over its alternatives.
>> >>> > >> Since we can't require that our applications compile (or link)
>> >>> > >> against our updated schema, this creates a problem that PB was
>> >>> > >> supposed to solve.
>> >>> > >
>> >>> > >
>> >>> > > This is scary, and it potentially affects services outside of the
>> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
>> >>> >
>> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> >>> > If that carries unknown fields through intermediate handlers, then
>> >>> > this objection goes away. -C
>> >>>
>> >>>
>> >>> Did some more googling, found this:
>> >>>
>> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>> >>>
>> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds
>> >>> like packing the fields into a byte type. No mention of a PB2
>> >>> compatibility mode. Also here:
>> >>>
>> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>> >>>
>> >>> Participants say that unknown fields were dropped for automatic JSON
>> >>> encoding, since you can't losslessly convert to JSON without knowing
>> >>> the type.
>> >>>
>> >>> Unfortunately, it sounds like these are intrinsic differences with
>> >>> PB3.
>> >>>
>> >>
>> >> As I read it Andrew, the field-dropping happens when pb3 is running in
>> >> proto3 'mode'. Let me try it...
>> >>
>> >> St.Ack
>> >>
>> >>
>> >>>
>> >>> Best,
>> >>> Andrew
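
A minimal sketch of the wire-level behavior discussed in this thread, assuming only the ten bytes shown in the `od -xc pb.bin` dump and the documented protobuf wire format (a varint key of `field_number << 3 | wire_type`, followed by a varint length for wire type 2). The names `parse`, `reencode`, and `KNOWN_FIELDS` are hypothetical helpers for illustration, not part of any protobuf library, and the sketch handles only length-delimited fields. It contrasts an intermediate handler that carries unknown field 2 through with one that drops it.

```python
# Bytes from the `od -xc pb.bin` dump in the thread: one="one", two="two".
PB_BIN = bytes([0x0A, 0x03, 0x6F, 0x6E, 0x65, 0x12, 0x03, 0x74, 0x77, 0x6F])

# Assumption: models the schema of pb_drops_two.proto, where field 2 is unknown.
KNOWN_FIELDS = {1: "one"}


def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def parse(buf):
    """Return [(field_number, payload), ...] for length-delimited fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("sketch handles only length-delimited fields")
        length, pos = read_varint(buf, pos)
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields


def reencode(fields, keep_unknown):
    """Re-serialize fields, optionally dropping those not in KNOWN_FIELDS."""
    out = bytearray()
    for number, payload in fields:
        if number not in KNOWN_FIELDS and not keep_unknown:
            continue  # models a handler that discards unknown fields
        out += write_varint((number << 3) | 2)
        out += write_varint(len(payload))
        out += payload
    return bytes(out)


fields = parse(PB_BIN)
preserved = reencode(fields, keep_unknown=True)   # round-trips byte for byte
dropped = reencode(fields, keep_unknown=False)    # field 2 is gone
```

With `keep_unknown=True` the proxy's output is identical to its input, which is the property the thread is asking for; with `keep_unknown=False` the downstream reader never sees field 2, even if its own schema defines it.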