On Wed, Mar 29, 2017 at 1:13 PM, Stack <st...@duboce.net> wrote:
> Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> same?)

I reproduced your example with the Java tooling, including changing
some of the fields in the intermediate representation. As long as the
syntax is "proto2", the pb3 tooling seems to have compatible semantics.

> To be clear, when we say proxy above, are we expecting that a pb message
> deserialized by a process down-the-line that happens to have a crimped proto
> definition that is absent a couple of fields somehow can re-serialize and at
> the end of the line, all fields are present? Or are we talking pass-through
> of the message without rewrite?

The former; an intermediate handler decoding, [modifying,] and
encoding the record without losing unknown fields.
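
For illustration only, here is a rough Python sketch of what such an
intermediate handler amounts to at the wire level. It does not use the
protobuf library; proxy(), split_fields() and join_fields() are
hypothetical helpers written for this note, and the input bytes are the
pb.bin contents from the od dump quoted below. The handler knows only
field 1, rewrites it, and carries everything else through untouched.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def split_fields(buf):
    """Return [(field_number, wire_type, payload)] for each top-level field."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:      # varint: keep the raw varint bytes
            start = pos
            _, pos = read_varint(buf, pos)
            payload = buf[start:pos]
        elif wire_type == 1:    # fixed 64-bit
            payload = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 2:    # length-delimited (strings, bytes, messages)
            length, pos = read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:    # fixed 32-bit
            payload = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((number, wire_type, payload))
    return fields


def join_fields(fields):
    """Re-serialize fields produced by split_fields()."""
    out = bytearray()
    for number, wire_type, payload in fields:
        out += write_varint((number << 3) | wire_type)
        if wire_type == 2:
            out += write_varint(len(payload))
        out += payload
    return bytes(out)


def proxy(buf):
    """Hypothetical handler: knows only field 1, passes the rest through."""
    out = []
    for number, wire_type, payload in split_fields(buf):
        if number == 1 and wire_type == 2:
            payload = payload.upper()   # the "[modifying,]" step
        out.append((number, wire_type, payload))
    return join_fields(out)


PB_BIN = bytes.fromhex("0a036f6e65120374776f")  # bytes from the od dump
forwarded = proxy(PB_BIN)
print(split_fields(forwarded))
```

Field 2 comes out the other side byte-for-byte, even though the handler
has no name for it. That is the property we need the generated code to
keep.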

This looks fine. -C

> Thanks,
> St.Ack
>
>
> # Using the protoc v3.0.2 tool
> $ protoc --version
> libprotoc 3.0.2
>
> # I have a simple proto definition with two fields in it
> $ more pb.proto
> message Test {
>   optional string one = 1;
>   optional string two = 2;
> }
>
> # This is a text-encoded instance of a 'Test' proto message:
> $ more pb.txt
> one: "one"
> two: "two"
>
> # Now I encode the above as a pb binary
> $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
> 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> syntax.)
>
> # Here is a dump of the binary
> $ od -xc pb.bin
> 0000000      030a    6e6f    1265    7403    6f77
>           \n 003   o   n   e 022 003   t   w   o
> 0000012
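
For anyone reading the dump above by hand, a small Python sketch of how
I read it. It assumes od printed host-order 16-bit words on a
little-endian machine, and that every tag and length fits in one byte
(true for this message, not in general). walk() is a hypothetical helper,
not part of any protobuf tooling.

```python
import struct

# The five 16-bit words printed by `od -x`, in order.
OD_WORDS = ["030a", "6e6f", "1265", "7403", "6f77"]

# Undo the little-endian word grouping to recover the byte stream.
raw = b"".join(struct.pack("<H", int(word, 16)) for word in OD_WORDS)


def walk(buf):
    """List (field_number, text) pairs for single-byte tag/length fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        if tag >= 0x80 or length >= 0x80 or tag & 0x7 != 2:
            raise ValueError("sketch only handles short length-delimited fields")
        value = buf[pos + 2:pos + 2 + length]
        fields.append((tag >> 3, value.decode("utf-8")))
        pos += 2 + length
    return fields


print(raw.hex())
print(walk(raw))
```

So 0x0a is field 1 with wire type 2, 0x12 is field 2 with wire type 2,
and each is followed by a length of 3 and the string bytes. Ten bytes in
total, which matches the final 0000012 (octal) offset.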
>
> # Here is a proto definition file that has a Test Message minus the 'two' field.
> $ more pb_drops_two.proto
> message Test {
>   optional string one = 1;
> }
>
> # Use it to decode the bin file:
> $ protoc --decode=Test pb_drops_two.proto < pb.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> to proto2 syntax.)
> one: "one"
> 2: "two"
>
> Note how the second field is preserved (absent a field name). It is not
> dropped.
>
> If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> dropped.
>
> # Here is the proto file with proto3 syntax specified (had to drop the 'optional' qualifier -- not allowed in proto3):
> $ more pb_drops_two.proto
> syntax = "proto3";
> message Test {
>   string one = 1;
> }
>
> $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> $ more pb_drops_two.txt
> one: "one"
>
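
To put the two decode results side by side, here is a toy Python model of
the behaviour observed above. It is not how protoc is implemented;
decode_text() and its keep_unknown flag are made up for this note, and it
only handles short string fields like the ones in pb.bin.

```python
PB_BIN = bytes.fromhex("0a036f6e65120374776f")  # pb.bin from the od dump

# Hypothetical stand-in for pb_drops_two.proto: only field 1 is known.
SCHEMA_DROPS_TWO = {1: "one"}


def decode_text(buf, schema, keep_unknown):
    """Render a text form of buf, optionally keeping unknown fields."""
    lines = []
    pos = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        if tag >= 0x80 or length >= 0x80 or tag & 0x7 != 2:
            raise ValueError("sketch only handles short length-delimited fields")
        number = tag >> 3
        value = buf[pos + 2:pos + 2 + length].decode("utf-8")
        pos += 2 + length
        if number in schema:
            lines.append('%s: "%s"' % (schema[number], value))
        elif keep_unknown:
            # Unknown field: no name available, so print the field number.
            lines.append('%d: "%s"' % (number, value))
    return "\n".join(lines)


print(decode_text(PB_BIN, SCHEMA_DROPS_TWO, keep_unknown=True))
print("---")
print(decode_text(PB_BIN, SCHEMA_DROPS_TWO, keep_unknown=False))
```

With keep_unknown=True the output has the same shape as the proto2-syntax
decode (field 2 printed by number); with keep_unknown=False it has the
same shape as the proto3-syntax decode, where the field is gone.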
>
> I cannot reencode the text output using pb_drops_two.proto. It complains:
>
> $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> to proto2 syntax.)
> input:2:1: Expected identifier, got: 2
>
> Protobuf 2.5 does the same:
>
> $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> input:2:1: Expected identifier.
> Failed to parse input.
>
> St.Ack
>
> On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
>>
>> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>>
>>> >
>>> > >> If unknown fields are dropped, then applications proxying tokens
>>> > >> and other data between servers will effectively corrupt those
>>> > >> messages, unless we make everything opaque bytes, which- absent the
>>> > >> convenient, prenominate semantics managing the conversion- obviates
>>> > >> the compatibility machinery that is the whole point of PB. Google is
>>> > >> removing the features that justified choosing PB over its
>>> > >> alternatives. Since we can't require that our applications compile
>>> > >> (or link) against our updated schema, this creates a problem that PB
>>> > >> was supposed to solve.
>>> > >
>>> > >
>>> > > This is scary, and it potentially affects services outside of the
>>> > > Hadoop codebase. This makes it difficult to assess the impact.
>>> >
>>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>>> > If that carries unknown fields through intermediate handlers, then
>>> > this objection goes away. -C
>>>
>>>
>>> Did some more googling, found this:
>>>
>>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>>
>>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>>> packing the fields into a byte type. No mention of a PB2 compatibility
>>> mode. Also here:
>>>
>>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>>
>>> Participants say that unknown fields were dropped for automatic JSON
>>> encoding, since you can't losslessly convert to JSON without knowing the
>>> type.
>>>
>>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>>
>>
>> As I read it Andrew, the field-dropping happens when pb3 is running in
>> proto3 'mode'. Let me try it...
>>
>> St.Ack
>>
>>
>>>
>>> Best,
>>> Andrew
>>
>>
>
