[ 
https://issues.apache.org/jira/browse/ARROW-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616287#comment-17616287
 ] 

Pavel Kovalenko commented on ARROW-17998:
-----------------------------------------

[~apitrou] 

Yes, but in that case you need to write code for every schema migration. 
That is fine if you only have server-to-server communication.

Let's imagine you have an initial schema stored somewhere and want to change it 
(e.g. change a column's type or nullability, or add more metadata). If a user 
(who may not be a programmer) wants to make that change, you need to provide 
them with some API, or they need to write a script, or they can just edit the 
JSON file, which is the simplest way.

 

> [Java] JSON representation of pojo.Schema is incompatible with flatbuffers 
> JSON generated via C++ API
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17998
>                 URL: https://issues.apache.org/jira/browse/ARROW-17998
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Format, Java
>    Affects Versions: 6.0.1
>            Reporter: Pavel Kovalenko
>            Priority: Major
>              Labels: json, json-schema
>
> I have a JSON representation of an arrow::Schema, generated from the 
> flatbuffers format in C++:
>  
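> {code:java}
> // schemaBytes points to a serialized org.apache.arrow.flatbuf.Schema table.
> const void* schemaBytes;
> std::string fbsSchemaFile;
> // Load and parse the flatbuffers schema definition.
> flatbuffers::LoadFile("/path/to/Schema.fbs", false, &fbsSchemaFile);
> flatbuffers::Parser parser;
> parser.Parse(fbsSchemaFile.c_str());
> // Render the table as flatbuffers-style JSON text.
> std::string json;
> flatbuffers::GenerateTextFromTable(parser, schemaBytes, 
> "org.apache.arrow.flatbuf.Schema", &json);
> return json;{code}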
>  
> When I try to read this JSON in Java and create a pojo.Schema:
>  
> {code:java}
> String json; // Read from file.
> Schema.fromJSON(json);{code}
>  
>  
> It fails because the JSON generated by flatbuffers differs slightly from the 
> JSON used by the Java Jackson bindings:
>  
> C++ Schema Flatbuffers JSON example:
> {code:java}
> {
>   fields: [
>     {
>       name: "cc_call_center_sk",
>       type_type: "Int",
>       type: {
>         bitWidth: 32,
>         is_signed: true
>       },
>       children: [
>       ],
>       custom_metadata: [
>         {
>           key: "metadata",
>           value: "some_metadata"
>         }
>       ]
>     },
>   ],
>   custom_metadata: [
>     {
>       key: "metadata",
>       value: "some_metadata"
>     }
>   ]
> }{code}
> Java Schema JSON example:
> {code:java}
> {
>   "fields" : [ {
>     "name" : "cc_call_center_sk",
>     "nullable" : true,
>     "type" : {
>       "name" : "int",
>       "bitWidth" : 32,
>       "isSigned" : true
>     },
>     "children" : [ ],
>     "metadata" : [ {
>       "value" : "some_metadata",
>       "key" : "metadata"
>     } ]
>   } ],
>   "metadata" : [ {
>     "value" : "some_metadata",
>     "key" : "metadata"
>   } ]
> } {code}
> There is a difference in how the type id is declared:
> the `{*}type_type{*}` field is used in C++ flatbuffers;
> the `{*}name{*}` field inside the `{*}type{*}` field is used in Java.
>  
> There is also a difference in the `{*}metadata{*}` field:
> the `{*}custom_metadata{*}` name is used in C++ flatbuffers;
> the `{*}metadata{*}` name is used in Java.
>  
> This makes it impossible to reuse the JSON representation from Java in C++ and 
> vice versa.
> The same issue probably exists in other languages.
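
To make the key differences listed above concrete, here is a minimal sketch of a converter from the flatbuffers-style layout to the Java-style layout. It is not part of Arrow: the class `SchemaJsonKeys` and the method `toJavaStyle` are hypothetical names, and the sketch assumes the JSON has already been parsed into nested `Map`/`List` objects (the flatbuffers text uses unquoted keys, so a lenient parser would be needed for that step). It only covers the differences visible in the two examples (`type_type`, `custom_metadata`, `is_signed`), and lowercasing the type id is an assumption that holds for `Int` but may not hold for every type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SchemaJsonKeys {
    /** Recursively rewrites a parsed flatbuffers-style node into the Java-style layout. */
    @SuppressWarnings("unchecked")
    public static Object toJavaStyle(Object node) {
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<Object>) node) {
                out.add(toJavaStyle(item));
            }
            return out;
        }
        if (!(node instanceof Map)) {
            return node; // scalars are returned unchanged
        }
        Map<String, Object> in = (Map<String, Object>) node;
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : in.entrySet()) {
            String key = e.getKey();
            if (key.equals("type_type")) {
                continue; // folded into "type" as "name" below
            }
            if (key.equals("custom_metadata")) {
                key = "metadata";
            } else if (key.equals("is_signed")) {
                key = "isSigned";
            }
            out.put(key, toJavaStyle(e.getValue()));
        }
        if (in.containsKey("type_type")) {
            // Assumption: the Java type name is the lowercased flatbuffers type id.
            Map<String, Object> type = new LinkedHashMap<>();
            type.put("name", in.get("type_type").toString().toLowerCase(Locale.ROOT));
            Object converted = out.get("type");
            if (converted instanceof Map) {
                type.putAll((Map<String, Object>) converted);
            }
            out.put("type", type);
        }
        return out;
    }
}
```

Applied to the field from the C++ example, this would yield a `type` object with `name`, `bitWidth` and `isSigned`, and a `metadata` list in place of `custom_metadata`. It does not add the `nullable` entry that appears in the Java example.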



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
