acezen commented on PR #475:
URL: https://github.com/apache/incubator-graphar/pull/475#issuecomment-2100819417

   > Based on my experience with Apache Spark itself, the protobuf approach may be a painful story:
   > 
   > * you need to incorporate it into CI;
   > * the generated code is huge (for Java it may be thousands of lines of code);
   > * the generated code is unreadable;
   > * debugging is very hard;
   > * we need to decide whether to store the generated code in git or not (there are pros and cons);
   > 
   > Btw, my first question is: are we able to use `buf` (https://buf.build/docs/introduction)? Apache Spark itself uses buf because it significantly simplifies the process, and I don't think it should conflict with ASF rules. If we can, I would recommend creating `buf.work.yaml` and `buf.gen.yaml`. You can check the examples in Apache Spark (https://github.com/apache/spark/tree/master/connector/connect/common/src/main), or I can do it myself, since I have some experience with it.
   > 
   > My second question: are we going to store the generated code in git or not? Apache Spark uses a mixed approach: it generates the JVM classes on the fly (via the protobuf Maven plugin) but stores the generated Python files in git. For Python they have a neat solution, https://github.com/apache/spark/blob/master/dev/connect-gen-protos.sh, which auto-formats the generated code so that diffs in git PRs stay readable.
   > 
   > Based on my experience, data scientists, network analysts, and data engineers, who are our target audience, are not familiar with protobuf, so we should also extend the documentation itself.
   > 
   > P.S. It's a good thing I got stuck in my work and did not write a lot of code for the new Python API, because otherwise I would have to rewrite it with proto :)
   
   Thanks Sem, your comment and advice are very helpful and insightful. `buf` is one solution, and we do need to discuss it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

