Hi folks. I want to discuss the current implementation of meta storage (schema).
The schema plays two roles in S2Graph:
1. When S2Graph accepts a write request expressed as logical vertices/edges, it uses the schema to validate the request and to build a physical internal representation, which is specific to the storage backend.
2. When a query comes in, S2Graph uses the schema to build a physical request, which is specific to the storage backend, and then uses the schema again to transform the physical representation back into logical vertices/edges.

The current implementation assumes that the schema is very small compared to the actual vertex/edge data. Since the schema is read very often, it is crucial to have a correct index that supports O(1) lookups, so the implementation uses a local cache to improve performance.

The problem with the current implementation is that it is impossible to inject a different schema implementation, because the components are too tightly coupled. For example, in s2jobs, S2GraphSource/S2GraphSink use an S2Graph instance to serialize/deserialize data from HFiles, and there is no way to avoid accessing the meta database for the schema on each Spark executor (details in https://issues.apache.org/jira/browse/S2GRAPH-252). In this case, a static schema could be built on the Spark driver (by reading a file, reading the meta database, or anything else) and then broadcast to every Spark executor.

In general, I believe what we need is a way to inject a different schema implementation. Currently S2Graph only has the implementation that uses the meta database with a local cache, but it would be great if the schema were abstracted so that a different implementation could be injected when we create an S2Graph instance. To achieve this, I believe abstracting the necessary methods into one interface is a good start, so I collected most of the methods related to the schema. I suggest adding a SchemaManager interface and then refactoring the current code base to access the schema only through this interface.
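To make the idea a bit more concrete, here is a rough Scala sketch. It is NOT the interface from the draft document: every name below (the SchemaManager methods, the simplified ServiceColumn/Label shapes, StaticSchemaManager, CachedSchemaManager, and GraphLike as a stand-in for S2Graph) is hypothetical. It only shows one trait with two interchangeable implementations: a static one that could be built on the Spark driver and broadcast, and a cached one that mimics the current "meta database plus local cache" behavior.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified schema elements; the real S2Graph model classes
// carry many more fields.
final case class ServiceColumn(id: Int, serviceName: String, columnName: String)
final case class Label(id: Int, name: String, srcColumn: String, tgtColumn: String)

// Hypothetical SchemaManager: the single entry point through which the rest
// of the code base would read schema, instead of calling the meta DB directly.
trait SchemaManager {
  def findColumn(serviceName: String, columnName: String): Option[ServiceColumn]
  def findLabel(labelName: String): Option[Label]
}

// Immutable in-memory schema. Case classes are Serializable, so an instance
// could be built once on the driver and shipped to executors.
final case class StaticSchemaManager(columns: Seq[ServiceColumn], labels: Seq[Label])
    extends SchemaManager {
  // Index once at construction so every lookup is an O(1) hash-map access.
  private val columnIndex: Map[(String, String), ServiceColumn] =
    columns.map(c => (c.serviceName, c.columnName) -> c).toMap
  private val labelIndex: Map[String, Label] = labels.map(l => l.name -> l).toMap

  def findColumn(serviceName: String, columnName: String): Option[ServiceColumn] =
    columnIndex.get((serviceName, columnName))
  def findLabel(labelName: String): Option[Label] = labelIndex.get(labelName)
}

// Mimics the current approach: on a miss, a loader (standing in for a
// hypothetical meta-DB query) is called and the result is memoized locally.
// Misses (None) are cached too, and this sketch has no eviction.
final class CachedSchemaManager(
    loadColumn: (String, String) => Option[ServiceColumn],
    loadLabel: String => Option[Label]) extends SchemaManager {
  private val columnCache = new ConcurrentHashMap[(String, String), Option[ServiceColumn]]()
  private val labelCache = new ConcurrentHashMap[String, Option[Label]]()

  def findColumn(serviceName: String, columnName: String): Option[ServiceColumn] =
    columnCache.computeIfAbsent((serviceName, columnName),
      (key: (String, String)) => loadColumn(key._1, key._2))
  def findLabel(labelName: String): Option[Label] =
    labelCache.computeIfAbsent(labelName, (name: String) => loadLabel(name))
}

// Hypothetical stand-in for S2Graph: the schema implementation is injected
// through the constructor, so callers decide which one to use.
final class GraphLike(schema: SchemaManager) {
  def validateEdge(labelName: String): Either[String, Label] =
    schema.findLabel(labelName).toRight(s"unknown label: $labelName")
}
```

For example, `new GraphLike(StaticSchemaManager(Seq(ServiceColumn(1, "s2", "user_id")), Seq(Label(1, "friends", "user_id", "user_id"))))` validates "friends" without touching any database, while production code could pass a CachedSchemaManager instead. Whether the real model classes are serializable enough to broadcast this way would still need to be checked.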
I want to discuss whether this is the right direction and whether we should work on this first, since it will affect a lot of code. Please feel free to comment. Here is the draft of the interface: https://docs.google.com/document/d/134zPVm8vtXMRKC77bsVorp_06zZhU9hk2rQIKXH6HpI/edit?usp=sharing