Sorry, it looks like the message was sent multiple times. Let's use this thread 
for the discussion!


On June 13, 2025 8:11:21 AM GMT+02:00, Sem <ssinche...@apache.org> wrote:
>Hello!
>
>At the moment, the format spec defines DataType as an enumeration:
>```
>enum DataType {
>    BOOL = 0;
>    INT32 = 1;
>    INT64 = 2;
>    FLOAT = 3;
>    DOUBLE = 4;
>    STRING = 5;
>    LIST = 6;
>    DATE = 7;
>    TIMESTAMP = 8;
>    TIME = 9;
>};
>```
>
>But this leaves it unclear what the element type of a LIST can be. In
>practice, LIST is written as `list<>` in the output YAML:
>
>```
>  - properties:
>      - name: feature
>        data_type: list<float>
>        is_primary: false
>```
>
>but, from my point of view, this does not match the format
>specification.
>
>I would like to propose updating the format definition by making each
>possible DataType a message instead of an enum value. Something like:
>
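>```
>message BOOL {
>  string name = 1;
>};
>message INT32 {
>  string name = 1;
>};
>message INT64 {
>  string name = 1;
>};
>// ...
>message LIST {
>  string name = 1;
>  // Element type of the list. Numbers start at 2 because oneof fields
>  // share the message's field-number space with `name`; the field names
>  // are illustrative. LIST is not an option, so lists cannot be nested.
>  oneof element_type {
>    BOOL bool_type = 2;
>    INT32 int32_type = 3;
>    INT64 int64_type = 4;
>    // ...
>  }
>}
>```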
>
>This sketch assumes that nested collections will not be supported.
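To make the "no nested collections" rule concrete, here is a minimal Python sketch of how a reader might validate a `data_type` value after loading the YAML into plain dicts. It assumes the nested layout from the proposal (every type has a `name`; a `list` also has an `element_type` mapping) and assumes the type names are the lowercase forms of the enum values. The helper `validate_data_type` is hypothetical and not part of any GraphAr library.

```python
# Assumed scalar type names: lowercase forms of the DataType enum values.
SCALAR_TYPES = {
    "bool", "int32", "int64", "float", "double",
    "string", "date", "timestamp", "time",
}


def validate_data_type(dt: dict) -> None:
    """Raise ValueError if `dt` is not a valid data_type mapping.

    Hypothetical helper: assumes every type has a `name`, and that a
    `list` additionally carries an `element_type` mapping.
    """
    name = dt.get("name")
    if name in SCALAR_TYPES:
        return
    if name != "list":
        raise ValueError(f"unknown data type: {name!r}")
    element = dt.get("element_type")
    if not isinstance(element, dict):
        raise ValueError("list requires an element_type mapping")
    # Nested collections are not supported, so the element must be a scalar.
    if element.get("name") not in SCALAR_TYPES:
        raise ValueError(
            f"unsupported list element type: {element.get('name')!r}"
        )
```

With this sketch, `{"name": "list", "element_type": {"name": "float"}}` passes, while a list whose `element_type` is itself a `list` raises `ValueError`.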
>
>In an actual YAML file, it would look like:
>
>```
>  - properties:
>      - name: feature
>        data_type:
>          name: list
>          element_type:
>            name: float
>        is_primary: false
>```
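Existing metadata would still carry the string form (e.g. `list<float>`), so a migration step may be needed. Below is a minimal Python sketch that turns such a legacy string into the proposed nested mapping. It assumes `list<...>` is the only parameterized form, with exactly one scalar element type and no inner whitespace. The function name `legacy_to_nested` is hypothetical.

```python
import re

# Matches the legacy list form, e.g. "list<float>". Assumed to be the only
# parameterized type, with a single element type and no inner whitespace.
_LEGACY_LIST = re.compile(r"list<(\w+)>")


def legacy_to_nested(data_type: str) -> dict:
    """Convert a legacy data_type string into the proposed nested mapping."""
    text = data_type.strip()
    match = _LEGACY_LIST.fullmatch(text)
    if match:
        return {"name": "list", "element_type": {"name": match.group(1)}}
    if "<" in text or ">" in text:
        # Anything else with angle brackets (e.g. "list<list<float>>") is
        # rejected, since the proposal does not support nested collections.
        raise ValueError(f"unsupported data type: {data_type!r}")
    # Plain scalar type such as "int64".
    return {"name": text}
```

The resulting dict can then be serialized with any YAML library to obtain the nested layout shown above.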
>
>Motivation for the proposed change: the current approach leaves the
>handling of nested types to each specific implementation (for example,
>the C++ implementation writes it as `list<float>`). We should enforce a
>single representation in the standard spec instead!
>
>
>If there is no negative feedback, I will open a formal VOTE process.
>
>Best regards,
>Sem
>
