tustvold commented on issue #6411: URL: https://github.com/apache/arrow-rs/issues/6411#issuecomment-2357849743
That would only work for a list of primitives; a list of structs would need to encode the structs as list records to preserve the multiple levels of nullability, at which point you're back to effectively the current format, just exploded by one level.

I think given:

* There would be little ability to share code between the two formats
* It would not be compatible with other arrow implementations
* It is unlikely to perform significantly differently; the major overhead of JSON is tokenising and integer/float parsing, which would be unchanged
* There are open questions about supporting nested types
* It can't be decoded a batch at a time

It is hard for me to recommend including it in this repository. Perhaps we could take a step back and ascertain what the desired outcome is? If it is just to reduce the size, running the current JSON format through lz4 will likely yield far greater returns, for very little additional overhead compared to the costs of JSON parsing.
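The following minimal Rust sketch makes the nullability point concrete. It uses only the standard library, not arrow-rs types. It assumes a hypothetical column of type `List<Struct<a: Int64>>`, modelled as `Option<Vec<Option<Item>>>`, in which the list, each struct element and the field `a` can each be null independently. The names `Item`, `Row` and `explode_field_a` are illustrative only. The sketch shows that a naive encoding, which keeps one flat array per struct field, maps a null struct element and a present struct with a null field to the same output. That loss is why the struct level has to stay as its own record.

```rust
// Hypothetical struct element with a single nullable field `a`.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    a: Option<i64>,
}

// One row of a hypothetical List<Struct<a: Int64>> column: the list,
// each struct element, and the field `a` can each be null.
type Row = Option<Vec<Option<Item>>>;

/// Naively flatten a list of structs into one array per field (here just `a`),
/// discarding the struct-level validity.
fn explode_field_a(row: &Row) -> Option<Vec<Option<i64>>> {
    row.as_ref().map(|items| {
        items
            .iter()
            // A null struct and a struct with a null field both become `None`.
            .map(|item| item.as_ref().and_then(|s| s.a))
            .collect()
    })
}

fn main() {
    // A list containing one null struct element...
    let null_struct: Row = Some(vec![None]);
    // ...versus a list containing one struct whose field `a` is null.
    let null_field: Row = Some(vec![Some(Item { a: None })]);

    // The two rows are different values...
    assert_ne!(null_struct, null_field);
    // ...but the per-field encoding cannot tell them apart.
    assert_eq!(explode_field_a(&null_struct), explode_field_a(&null_field));
    println!("{:?}", explode_field_a(&null_struct));
}
```

The Arrow columnar format avoids this ambiguity by giving a struct array its own validity bitmap, separate from the validity of its child fields.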
