Re: [I] Columnar json writer for arrow-json [arrow-rs]

via GitHub Wed, 18 Sep 2024 01:37:53 -0700


tustvold commented on issue #6411:
URL: https://github.com/apache/arrow-rs/issues/6411#issuecomment-2357849743


   That would only work for a list of primitives. A list of structs would need 
to encode the structs as list records to preserve the multiple levels of 
nullability, at which point you're back to effectively the current format, just 
exploded by one level.
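
To illustrate the nullability point, here is a minimal sketch. It assumes a hypothetical struct with two fields, `x` and `y`, and a naive per-field layout invented for this example; it is not the format proposed in the issue. It shows a null struct and a struct whose fields are all null becoming indistinguishable once the list is split into per-field lists.

```python
import json

FIELDS = ("x", "y")  # hypothetical struct fields, chosen for illustration


def explode_fields(items):
    """Naive columnar layout (an assumption): one list per field,
    with no separate validity for the struct itself."""
    if items is None:
        return None
    return {f: [None if s is None else s[f] for s in items] for f in FIELDS}


# Two different values for a list-of-structs cell:
null_struct = [None]                        # the struct itself is null
all_null_fields = [{"x": None, "y": None}]  # the struct is valid, its fields are null

# Encoded as records, the two values stay distinct.
assert json.dumps(null_struct) != json.dumps(all_null_fields)

# Split into per-field lists, they collapse to the same output.
assert explode_fields(null_struct) == explode_fields(all_null_fields)
print(json.dumps(explode_fields(null_struct)))  # {"x": [null], "y": [null]}
```

Keeping each struct as a record avoids the collision, which is why the nested case ends up looking like the existing format one level down.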
   
   I think given:
   
   * There would be little ability to share code between the two formats
   * It would not be compatible with other arrow implementations
   * It is unlikely to perform significantly differently; the major overhead of 
JSON is tokenising and integer/float parsing, which would be unchanged
   * There are open questions about supporting nested types
   * It can't be decoded a batch at a time
   
   It is hard for me to recommend including it in this repository. 
   
   Perhaps we could take a step back and ascertain what the desired outcome is? 
If it is just to reduce the size, running the current JSON format through lz4 
will likely yield far greater returns for very little additional overhead 
compared to the costs of JSON parsing.
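
As a rough sketch of the compression suggestion, the snippet below compresses newline-delimited, row-oriented JSON. Two assumptions are worth flagging: the sample records are made up, and `zlib` stands in for lz4 only because it ships with the Python standard library. lz4 generally trades some compression ratio for speed, so the numbers will differ.

```python
import json
import zlib

# Hypothetical sample data: 1,000 row-oriented records, one JSON object per line.
rows = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(1000)]
raw = "\n".join(json.dumps(r) for r in rows).encode("utf-8")

# zlib is a stand-in for lz4 here; swap in an lz4 binding to measure the real thing.
compressed = zlib.compress(raw)

# Field names repeat on every row, which is the redundancy a compressor removes.
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")

# The round trip is lossless, so the existing reader can be used unchanged.
assert zlib.decompress(compressed) == raw
```

The format itself stays untouched, so compatibility with other implementations is preserved and the size reduction comes from a separate, well-understood layer.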


