chrish42 opened a new pull request #7110:
URL: https://github.com/apache/arrow/pull/7110


   Draft pull request to discuss a possible text schema format for Arrow, using 
Flatbuffers' existing JSON serialization (via `flatbuffers::GenerateText()`, 
etc.) on the Arrow schema. It is not necessarily intended to show what the final 
code would look like, but to let people discuss the general approach and see 
examples of the format.
   
   This approach has the advantages of not introducing any new dependencies 
and of reusing existing Flatbuffers functionality for much of the feature.
   
   Once there is agreement on this approach, I can then polish this into a 
mergeable state. In the meantime, the (temporary) `arrow-schema` program takes 
a hard-coded schema, and outputs the (proposed) corresponding JSON, so people 
can see what the text schemas would look like with this proposal.
   
   As an example, the following Arrow schema definition:
   ```
     std::vector<std::shared_ptr<arrow::Field>> schema_vector = {
       arrow::field("id", arrow::int64()),
       arrow::field("cost", arrow::float64()),
       arrow::field("cost_components", arrow::list(arrow::float64()))};
     auto schema = arrow::Schema(schema_vector);
   ```
   translates to:
   ```
   {
     fields: [
       {
         name: "id",
         nullable: true,
         type_type: "Int",
         type: {
           bitWidth: 64,
           is_signed: true
         },
         children: [
   
         ]
       },
       {
         name: "cost",
         nullable: true,
         type_type: "FloatingPoint",
         type: {
           precision: "DOUBLE"
         },
         children: [
   
         ]
       },
       {
         name: "cost_components",
         nullable: true,
         type_type: "List",
         type: {
         },
         children: [
           {
             name: "item",
             nullable: true,
             type_type: "FloatingPoint",
             type: {
               precision: "DOUBLE"
             },
             children: [
   
             ]
           }
         ]
       }
     ]
   }
   ```
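
   To make the nesting of the output above easier to discuss, here is a 
minimal standard-library sketch of the same shape. It does not use Arrow or 
Flatbuffers, and it is not the code in this pull request (which relies on 
`flatbuffers::GenerateText()`). The names `TextField`, `WriteField`, 
`SchemaToText` and `ExampleFields` are hypothetical and exist only for this 
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of the `fields` array shown above.
// This is NOT an Arrow or Flatbuffers type; it only mirrors the printed keys.
struct TextField {
  std::string name;
  bool nullable;
  std::string type_type;  // e.g. "Int", "FloatingPoint", "List"
  // Type attributes as already-formatted (key, value) pairs,
  // e.g. {"bitWidth", "64"} or {"precision", "\"DOUBLE\""}.
  std::vector<std::pair<std::string, std::string>> type_attrs;
  std::vector<TextField> children;  // nested fields, e.g. the list's "item"
};

// Writes one field, then recurses into its children with deeper indentation.
void WriteField(const TextField& f, int indent, std::ostream& out) {
  assert(indent >= 0);
  const std::string pad(static_cast<std::size_t>(indent), ' ');
  out << pad << "{\n";
  out << pad << "  name: \"" << f.name << "\",\n";
  out << pad << "  nullable: " << (f.nullable ? "true" : "false") << ",\n";
  out << pad << "  type_type: \"" << f.type_type << "\",\n";
  out << pad << "  type: {\n";
  for (std::size_t i = 0; i < f.type_attrs.size(); ++i) {
    out << pad << "    " << f.type_attrs[i].first << ": "
        << f.type_attrs[i].second
        << (i + 1 < f.type_attrs.size() ? ",\n" : "\n");
  }
  out << pad << "  },\n";
  out << pad << "  children: [\n";
  for (std::size_t i = 0; i < f.children.size(); ++i) {
    WriteField(f.children[i], indent + 4, out);
    out << (i + 1 < f.children.size() ? ",\n" : "\n");
  }
  out << pad << "  ]\n";
  out << pad << "}";
}

// Wraps the fields in the top-level `{ fields: [ ... ] }` object.
std::string SchemaToText(const std::vector<TextField>& fields) {
  std::ostringstream out;
  out << "{\n  fields: [\n";
  for (std::size_t i = 0; i < fields.size(); ++i) {
    WriteField(fields[i], 4, out);
    out << (i + 1 < fields.size() ? ",\n" : "\n");
  }
  out << "  ]\n}\n";
  return out.str();
}

// The three fields from the example schema, written out by hand.
std::vector<TextField> ExampleFields() {
  TextField item{"item", true, "FloatingPoint",
                 {{"precision", "\"DOUBLE\""}}, {}};
  return {
      {"id", true, "Int", {{"bitWidth", "64"}, {"is_signed", "true"}}, {}},
      {"cost", true, "FloatingPoint", {{"precision", "\"DOUBLE\""}}, {}},
      {"cost_components", true, "List", {}, {item}},
  };
}
```

   Calling `SchemaToText(ExampleFields())` should give text with the same keys 
and nesting as the output above. The whitespace differs slightly: this sketch 
does not emit the blank line inside empty `children` arrays.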
   
   To me, this seems pretty straightforward to read for anyone who knows 
Arrow's data representation, though it is a bit verbose for the simpler cases. 
Maybe we would need to add a couple of extra keys at the top level: a version 
for the text schema? Something that says this is an Arrow schema? Or maybe 
that's not necessary.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

