chrish42 opened a new pull request #7110:
URL: https://github.com/apache/arrow/pull/7110
Draft pull request for discussion of a possible text schema format for Arrow,
using Flatbuffers' existing JSON serialization feature (via
`flatbuffers::GenerateText()`, etc.) on the Arrow schema. It is not necessarily
intended to show what the final code would look like, but to let people
discuss the general approach and see examples of the format.
This approach has the advantages of not introducing any new dependencies
and of reusing existing Flatbuffers functionality for much of the feature.
Once there is agreement on this approach, I can then polish this into a
mergeable state. In the meantime, the (temporary) `arrow-schema` program takes
a hard-coded schema, and outputs the (proposed) corresponding JSON, so people
can see what the text schemas would look like with this proposal.
As an example, the following Arrow schema definition:
```
#include <memory>
#include <vector>
#include <arrow/api.h>

// Three fields: two scalars and a list of doubles.
std::vector<std::shared_ptr<arrow::Field>> schema_vector = {
    arrow::field("id", arrow::int64()),
    arrow::field("cost", arrow::float64()),
    arrow::field("cost_components", arrow::list(arrow::float64()))};
auto schema = arrow::Schema(schema_vector);
```
translates to:
```
{
  fields: [
    {
      name: "id",
      nullable: true,
      type_type: "Int",
      type: {
        bitWidth: 64,
        is_signed: true
      },
      children: [
      ]
    },
    {
      name: "cost",
      nullable: true,
      type_type: "FloatingPoint",
      type: {
        precision: "DOUBLE"
      },
      children: [
      ]
    },
    {
      name: "cost_components",
      nullable: true,
      type_type: "List",
      type: {
      },
      children: [
        {
          name: "item",
          nullable: true,
          type_type: "FloatingPoint",
          type: {
            precision: "DOUBLE"
          },
          children: [
          ]
        }
      ]
    }
  ]
}
```
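To make the nesting in that output easier to follow, here is a minimal, self-contained sketch that prints the same keys in the same order, with two-space indentation. It is not the code in this PR, which goes through Flatbuffers. `FieldSketch`, `TypeAttrs`, `RenderField`, `RenderSchema`, and `ExampleSchemaText` are hypothetical names made up for illustration and are not part of Arrow or Flatbuffers.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow or Flatbuffers types.
using TypeAttrs = std::vector<std::pair<std::string, std::string>>;

struct FieldSketch {
  std::string name;
  bool nullable;
  std::string type_type;              // e.g. "Int", "FloatingPoint", "List"
  TypeAttrs type;                     // key / already-formatted value pairs
  std::vector<FieldSketch> children;  // nested fields, e.g. a list's item
};

// Writes one field at the given indentation depth (two spaces per level),
// then recurses into its children.
void RenderField(const FieldSketch& f, int depth, std::ostringstream& out) {
  // A List field carries exactly one child describing its element type.
  assert(f.type_type != "List" || f.children.size() == 1);
  const std::string pad(static_cast<std::size_t>(depth) * 2, ' ');
  const std::string in = pad + "  ";
  out << pad << "{\n";
  out << in << "name: \"" << f.name << "\",\n";
  out << in << "nullable: " << (f.nullable ? "true" : "false") << ",\n";
  out << in << "type_type: \"" << f.type_type << "\",\n";
  out << in << "type: {\n";
  for (std::size_t i = 0; i < f.type.size(); ++i) {
    out << in << "  " << f.type[i].first << ": " << f.type[i].second
        << (i + 1 < f.type.size() ? ",\n" : "\n");
  }
  out << in << "},\n";
  out << in << "children: [\n";
  for (std::size_t i = 0; i < f.children.size(); ++i) {
    RenderField(f.children[i], depth + 2, out);
    out << (i + 1 < f.children.size() ? ",\n" : "\n");
  }
  out << in << "]\n";
  out << pad << "}";
}

// Wraps the rendered fields in the top-level `fields: [...]` object.
std::string RenderSchema(const std::vector<FieldSketch>& fields) {
  std::ostringstream out;
  out << "{\n  fields: [\n";
  for (std::size_t i = 0; i < fields.size(); ++i) {
    RenderField(fields[i], 2, out);
    out << (i + 1 < fields.size() ? ",\n" : "\n");
  }
  out << "  ]\n}\n";
  return out.str();
}

// Rebuilds the three-field example schema from the description above.
std::string ExampleSchemaText() {
  const TypeAttrs int64_attrs = {{"bitWidth", "64"}, {"is_signed", "true"}};
  const TypeAttrs float64_attrs = {{"precision", "\"DOUBLE\""}};
  const FieldSketch item = {"item", true, "FloatingPoint", float64_attrs, {}};
  const std::vector<FieldSketch> fields = {
      {"id", true, "Int", int64_attrs, {}},
      {"cost", true, "FloatingPoint", float64_attrs, {}},
      {"cost_components", true, "List", {}, {item}},
  };
  return RenderSchema(fields);
}
```

Calling `ExampleSchemaText()` returns the rendered text as a string. The point of the sketch is only that a list's element type appears as an ordinary field under `children`, which is why `cost_components` has an empty `type` object and a single nested `item`.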
To me, this seems pretty straightforward to read for anyone who knows Arrow's
data representation, although it is a bit verbose for the simpler cases. We
might need to add a couple of extra keys at the top level: a version for the
text schema format, or something that identifies this as an Arrow schema? Or
maybe that's not necessary.