Hello!

At the moment, DataType is defined in the format spec as an enumeration:

```
enum DataType {
    BOOL = 0;
    INT32 = 1;
    INT64 = 2;
    FLOAT = 3;
    DOUBLE = 4;
    STRING = 5;
    LIST = 6;
    DATE = 7;
    TIMESTAMP = 8;
    TIME = 9;
};
```

But this leaves it unclear what the element type of a LIST can be. In
practice, LIST is written as `list<>` in the output YAML:

```
  - properties:
      - name: feature
        data_type: list<float>
        is_primary: false
```

From my point of view, however, this does not match the format
specification, where LIST is a plain enum value with no element type.
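
To illustrate the point, here is a minimal Python sketch of the kind of
string parsing an implementation has to do today. It is not the actual
C++ code: the function name `parse_data_type` and the regular expression
are my assumptions, since the spec does not define the `list<...>`
syntax at all.

```python
import re

# Primitive type names taken from the DataType enum above (lowercased).
PRIMITIVES = {"bool", "int32", "int64", "float", "double", "string",
              "date", "timestamp", "time"}

# Assumed pattern for the `list<...>` convention; not defined by the spec.
_LIST_RE = re.compile(r"^list<(\w+)>$")


def parse_data_type(text):
    """Return (type_name, element_type_or_None) for a data_type string."""
    if text in PRIMITIVES:
        return (text, None)
    match = _LIST_RE.match(text)
    if match and match.group(1) in PRIMITIVES:
        return ("list", match.group(1))
    raise ValueError(f"unsupported data_type: {text!r}")
```

Every implementation would need to agree on this pattern on its own,
which is exactly what the spec currently does not guarantee.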

I would like to propose an update to the format definition: make each
possible DataType a message instead of an enum value. Something like:

```
message BOOL {
  string name = 1;
};
message INT32 {
  string name = 1;
};
message INT64 {
  string name = 1;
};
...
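message LIST {
  string name = 1;
  // Member names and numbers are placeholders; numbers must not reuse 1.
  oneof element_type {
    BOOL bool_type = 2;
    INT32 int32_type = 3;
    INT64 int64_type = 4;
    ...
  }
}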
```

This covers the case where we are not going to support nested
collections.

In an actual YAML file it would look like:

```
  - properties:
      - name: feature
        data_type:
          name: list
          element_type:
            name: float
        is_primary: false
```
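
As a rough sketch of how a reader could consume this form, the Python
below turns the nested mapping into a typed object and rejects nested
lists. The names `DataType` and `data_type_from_mapping` are
hypothetical, and the input is a plain dict standing in for whatever a
YAML parser would return.

```python
from dataclasses import dataclass
from typing import Optional

# Primitive type names taken from the current DataType enum (lowercased).
PRIMITIVES = {"bool", "int32", "int64", "float", "double", "string",
              "date", "timestamp", "time"}


@dataclass(frozen=True)
class DataType:
    name: str
    element_type: Optional["DataType"] = None


def data_type_from_mapping(node):
    """Build a DataType from the nested mapping form of `data_type`."""
    name = node["name"]
    element = node.get("element_type")
    if name == "list":
        if element is None:
            raise ValueError("list requires element_type")
        inner = data_type_from_mapping(element)
        if inner.name == "list":
            # Assumption from this proposal: no nested collections.
            raise ValueError("nested lists are not supported")
        return DataType("list", inner)
    if name not in PRIMITIVES:
        raise ValueError(f"unknown data type: {name!r}")
    if element is not None:
        raise ValueError(f"{name} does not take element_type")
    return DataType(name)


# The property from the YAML example, written as a plain dict.
prop = {
    "name": "feature",
    "data_type": {"name": "list", "element_type": {"name": "float"}},
    "is_primary": False,
}
feature_type = data_type_from_mapping(prop["data_type"])
```

No string parsing is needed here, and the element type is validated the
same way as any top-level type.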

Motivation for the proposed change: the current approach leaves the
handling of nested types to each specific implementation (for example,
the C++ implementation writes it as `list<float>`). We should define
this in the standard spec instead!


If there is no negative feedback, I will open a formal VOTE
process.

Best regards,
Sem
