hnail commented on pull request #8372: URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215
> I have a question about the FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java there is an example test class. It is generated from the Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc, I cannot get the same FileDescriptorSet content. It's always slightly different. I tried csharp, java, and cpp output, with or without --descriptor_set_out, and I always get a different byte array (and base64 string). And I cannot create a producer/consumer on the Apache Pulsar client with a FileDescriptorSet generated by me; I get a com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception, which as far as I googled means that the data is corrupted. When I try to create a producer with the FileDescriptorSet data from the test, it works. What am I doing wrong while generating descriptors using protoc?

Hello, thanks for taking an interest in this pull request.
I tested what you described with the following two test cases:

- FileDescriptorSet built from the Java proto class, the same way as [ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53):

```
// build the FileDescriptorSet byte[] from the generated class
byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
        .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
        .build().toByteArray();
DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass = DescriptorProtos.FileDescriptorSet
        .parseFrom(fileDescriptorSetBytesByClass);
// print the FileDescriptorSet that was built from the class
System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(), "utf-8"));
```

- FileDescriptorSet built by the protoc command, as you described:

```
// build the FileDescriptorSet byte[] with
// 'protoc --include_imports --descriptor_set_out Request.desc Request.proto'
byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(new File("/Users/wangguowei/source"
        + "/pulsar/dev/pulsar_test/src/main/resources/pulsar/Request.desc"));
DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC = DescriptorProtos.FileDescriptorSet
        .parseFrom(fileDescriptorSetBytesByProtoC);
// print the FileDescriptorSet that was built by protoc
System.out.println(new String(fileDescriptorSetByProtoC.toBuilder().build().toByteArray(), "utf-8"));
```

Both approaches above work for me:

- The serialized FileDescriptorSet byte[] is slightly different, as you described. I think the descriptor embedded in the generated Java `proto class` may lack some information compared with the `proto file`, because it is dropped when the Java class is compiled.
- In both cases above, the FileDescriptorSet is deserialized with `DescriptorProtos.FileDescriptorSet.parseFrom(byte[])`, as in [ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90), so even though the byte[] is slightly different, both work.
- The current implementation follows the Google documentation: [protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)

---

So I think the cause is `new ObjectMapper().writeValueAsBytes(schemaData);`, which serializes the byte[] to a String. Maybe the encoding is 'UTF-8' or 'Base64'?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
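
To illustrate the encoding concern raised in the comment, here is a minimal sketch that uses only the JDK. It assumes the descriptor bytes are at some point turned into text and back. The class name `DescriptorBytesRoundTrip` and the sample bytes are hypothetical stand-ins, not a real FileDescriptorSet and not Pulsar code. The sketch compares a UTF-8 String round trip, which replaces malformed byte sequences, with a Base64 round trip, which is lossless.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class DescriptorBytesRoundTrip {
    // Hypothetical stand-in for serialized descriptor bytes: arbitrary binary
    // data that contains sequences which are not valid UTF-8.
    static final byte[] SAMPLE = {0x0a, (byte) 0x8f, 0x01, (byte) 0xff, 0x12, (byte) 0xc3, 0x28};

    // Decodes the bytes as UTF-8 text and encodes them again. Malformed input
    // is replaced with U+FFFD during decoding, so binary data can change.
    static byte[] viaUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the bytes as Base64 text and decodes them again. Base64 is
    // designed to carry arbitrary bytes, so the result equals the input.
    static byte[] viaBase64(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 round trip intact:  " + Arrays.equals(SAMPLE, viaUtf8(SAMPLE)));
        System.out.println("Base64 round trip intact: " + Arrays.equals(SAMPLE, viaBase64(SAMPLE)));
    }
}
```

If the bytes produced by protoc pass through a UTF-8 conversion anywhere before `parseFrom`, a parse error such as the one quoted above would be a plausible symptom. Whether that actually happens in this code path is an assumption that would need to be checked.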
