Hi everyone,

I would like to propose that we consider using an Interface Definition Language (IDL) such as Protobuf [1] for the GraphAr format definition. Currently we use YAML to describe the schema and metadata of a graph, and store the data in common formats like CSV/Parquet. YAML is human-readable, but it offers little support for validation or for versioning the format definition, and each language library has to parse the files and validate them on its own.
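To make that last point concrete, here is a minimal Python sketch of the kind of hand-written check each library ends up carrying today. The field names (`label`, `chunk_size`, `file_type`) and the rules are illustrative assumptions, not the actual GraphAr schema, and a plain dict stands in for the mapping a YAML parser would return.

```python
# Illustrative only: field names and rules are assumptions,
# not the actual GraphAr schema.
ALLOWED_FILE_TYPES = {"csv", "parquet"}


def validate_vertex_info(info: dict) -> list:
    """Return a list of problems found in a parsed YAML mapping."""
    errors = []

    label = info.get("label")
    if not isinstance(label, str) or not label:
        errors.append("label must be a non-empty string")

    chunk_size = info.get("chunk_size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (
        not isinstance(chunk_size, int)
        or isinstance(chunk_size, bool)
        or chunk_size <= 0
    ):
        errors.append("chunk_size must be a positive integer")

    if info.get("file_type") not in ALLOWED_FILE_TYPES:
        errors.append(
            "file_type must be one of " + ", ".join(sorted(ALLOWED_FILE_TYPES))
        )

    return errors


# Stand-ins for what a YAML parser would hand back.
good = {"label": "person", "chunk_size": 100, "file_type": "parquet"}
bad = {"label": "person", "chunk_size": "100", "file_type": "json"}

print(validate_vertex_info(good))
print(validate_vertex_info(bad))
```

Logic like this has to be rewritten and kept in sync in every language we support, which is where implementations can drift apart.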
Using an IDL to describe the format would bring benefits such as:

• A clear, standardized, language-agnostic format definition that can be version-controlled and shared by all libraries, keeping the format consistent between implementations.
• The validation provided by Protobuf can be used directly for our schema validation, so the libraries do not need to implement it themselves.
• Cross-language support: libraries can use the generated structures as graph info directly.

This proposal does not replace YAML with Protobuf. We would still use YAML as the final schema and metadata file so that it stays readable for users, but use the IDL to maintain a robust and precise schema definition. It is a kind of hybrid strategy that accommodates both human and machine needs.

Using an IDL does bring some disadvantages, though. Sem has listed some in the comments of the PR [2]:

• The generated code is huge and unreadable.
• The generated code may need to be stored in git.
• Debugging is very hard.

Since this would be a huge change, I would like to hear your thoughts on the proposal.

[1] https://protobuf.dev/
[2] https://github.com/apache/incubator-graphar/pull/475

Best,
weibin.zen