zeroshade commented on pull request #9862: URL: https://github.com/apache/arrow/pull/9862#issuecomment-813065364
> Spec reference is here https://github.com/apache/arrow/blob/master/format/Message.fbs#L59
>
> Looking at the C++ code it doesn't look like it is conformant with the specification here. I opened https://issues.apache.org/jira/browse/ARROW-12196

Okay, looks pretty simple: just check for an uncompressed length of -1 to indicate the data is not compressed. I'll update and add that.

> In C++ at least threading is configurable. I don't have a strong preference here, and really profiling for specific use-cases is probably important. For small batches with narrow schemas I can imagine multithreading being slower than single threaded. So feel free to leave as is; if someone runs into performance issues using these features we can explore the options.

So I know that the LZ4 library I'm using says it uses `runtime.GOMAXPROCS` to parallelize compression of a chunk of data, but it's fairly trivial to add the ability to parallelize the compression of the body buffers, with an argument to control the parallelization. Thankfully, goroutines are much lighter weight than full threads, so a performance degradation is less likely in the case of a small batch with a narrow schema.

I'll make the changes. If it turns out to be more complex than I think, I'll take your suggestion and leave it out until someone runs into a performance issue with the feature. If it's as simple as I think it'll be, I'll just include it here.