hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215


   > I have a question about the FileDescriptorSet field. In 
https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java
 there is an example test class, generated from the Test.proto and 
ExternalMessage.proto files. The problem is that, no matter what I try with 
protoc, I cannot get the same FileDescriptorSet content; it is always slightly 
different. I tried csharp, java, and cpp output, with or without 
--descriptor_set_out, and I always get a different byte array (and base64 
string). I also cannot create a producer/consumer on the Apache Pulsar client 
with a FileDescriptorSet generated by me: I get a 
com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception, 
which, as far as I could find, indicates that the data is corrupted. When I 
create a producer with the FileDescriptorSet data from the test, it works. What 
am I doing wrong when generating descriptors with protoc?
   
   Hello, and thanks for taking an interest in this pull request.
   I tested your scenario with the following two test cases:
   
   - FileDescriptorSet built from the Java proto class, the same way as 
[ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53):
   
   ```
   // Build the FileDescriptorSet byte[] from the generated class
   byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
           .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
           .build()
           .toByteArray();
   // Parse the bytes back to confirm that they deserialize
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass =
           DescriptorProtos.FileDescriptorSet.parseFrom(fileDescriptorSetBytesByClass);
   // Print the re-serialized FileDescriptorSet built from the class
   // (binary data decoded as UTF-8, so the output is only partly readable)
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(), "utf-8"));
   ```
   - FileDescriptorSet built by the protoc command, as in your description:
   ```
   // Build the FileDescriptorSet byte[] with
   // 'protoc --include_imports --descriptor_set_out Request.desc Request.proto'
   byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(
           new File("/Users/wangguowei/source/pulsar/dev/pulsar_test/src/main/resources/pulsar/Request.desc"));

   // Parse the bytes back to confirm that they deserialize
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC =
           DescriptorProtos.FileDescriptorSet.parseFrom(fileDescriptorSetBytesByProtoC);

   // Print the re-serialized FileDescriptorSet built by protoc
   System.out.println(new String(fileDescriptorSetByProtoC.toBuilder().build().toByteArray(), "utf-8"));
   ```
   Both approaches above work for me:
   - The serialized FileDescriptorSet byte[] is slightly different, as you 
describe. I think the generated Java proto class may carry less information 
than the `.proto` file it was compiled from.
   - Both variants are deserialized with 
`DescriptorProtos.FileDescriptorSet.parseFrom(byte[])`, as in 
[ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90)
, so even though the byte[] differs slightly, both work.
   - The current implementation follows the Google documentation: 
[protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)
   
   --- 
   So I think the cause is `new 
ObjectMapper().writeValueAsBytes(schemaData);`, which serializes the byte[] to 
a String. Could the difference be whether it is treated as 'UTF-8' or 'Base64'?
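
To illustrate why the text encoding of a byte[] matters here, the following is a minimal sketch using only the JDK (no Pulsar, protobuf, or Jackson calls). The class name `BinaryTextRoundTrip` and the five-byte array are hypothetical stand-ins for real descriptor bytes; the point is only that decoding arbitrary binary data as UTF-8 text and encoding it back can change the bytes, while a Base64 round trip does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of Pulsar.
class BinaryTextRoundTrip {

    // Decode raw bytes as UTF-8 text, then encode the text back to bytes.
    // Byte sequences that are not valid UTF-8 are replaced during decoding,
    // so this round trip is lossy for arbitrary binary data.
    static byte[] viaUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encode raw bytes as Base64 text, then decode the text back to bytes.
    // Base64 is designed to carry binary data in text, so this is lossless.
    static byte[] viaBase64(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Stand-in for serialized descriptor bytes; 0x80 and 0xff are not
        // valid UTF-8 on their own.
        byte[] raw = {0x0a, 0x03, (byte) 0x80, (byte) 0xff, 0x12};

        System.out.println("UTF-8 round trip intact:  " + Arrays.equals(raw, viaUtf8(raw)));
        System.out.println("Base64 round trip intact: " + Arrays.equals(raw, viaBase64(raw)));
    }
}
```

If the descriptor bytes pass through a String anywhere between protoc and the client, checking which of these two encodings each side expects may help narrow down where the data gets corrupted.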
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

