Hi, everyone,

I would like to propose that we consider using an Interface
Definition Language (IDL) such as Protobuf[1] for the GraphAr format definition.
Currently we use YAML to describe the schema and metadata of a graph, and
store the data in common formats like CSV/Parquet. YAML
is human-readable, but it offers little in the way of validation or
version control for the schema itself. Each language library also has
to parse the files and validate them on its own.

Using an IDL to describe the format would bring benefits such as:

• A clear, standardized, language-agnostic format definition that can
be version-controlled and shared by the libraries, keeping the format
consistent between implementations.
• The validation provided by Protobuf can be used directly to validate the
schema, so the libraries do not need to implement validation themselves.
• Cross-language support: libraries can use the generated structures as graph
info directly.


This proposal does not replace YAML with Protobuf. We would still use YAML as
the final schema and metadata file so that it stays readable for users, but
use an IDL to maintain a robust and precise schema definition. It is a kind of
hybrid strategy that accommodates both human and machine needs.
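
To make the hybrid idea a bit more concrete, here is a rough sketch in Python.
It uses a plain dataclass as a stand-in for a class generated from an IDL
definition, and a dict as a stand-in for a parsed YAML file. The names
(VertexInfo, load_vertex_info, name, chunk_size, file_type) are illustrative
assumptions only and are not meant to match the actual GraphAr schema or any
Protobuf-generated API.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for a structure generated from the IDL definition.
@dataclass
class VertexInfo:
    name: str
    chunk_size: int
    file_type: str

# Assumed set of allowed values, for illustration only.
ALLOWED_FILE_TYPES = {"csv", "parquet"}

def load_vertex_info(raw: dict) -> VertexInfo:
    """Validate a dict (e.g. parsed from a YAML file) against VertexInfo."""
    expected = {f.name: f.type for f in fields(VertexInfo)}
    unknown = set(raw) - set(expected)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = set(expected) - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Check every value against the type declared in the schema.
    for field_name, field_type in expected.items():
        if not isinstance(raw[field_name], field_type):
            raise TypeError(f"{field_name} must be {field_type.__name__}")
    if raw["file_type"] not in ALLOWED_FILE_TYPES:
        raise ValueError(f"unsupported file_type: {raw['file_type']}")
    return VertexInfo(**raw)
```

The point is that the checks are driven by one shared definition, so each
language library would not have to hand-write its own version of them.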

But using an IDL does bring some disadvantages. Sem has listed some in the
comments of the PR[2]:

• The generated code is huge and unreadable.
• The generated code may need to be stored in git.
• Debugging is very hard.


Since this would be a huge change, I would like to hear your thoughts on the
proposal.


[1] https://protobuf.dev/
[2] https://github.com/apache/incubator-graphar/pull/475

Best
weibin.zen
