zeroshade commented on pull request #9862:
URL: https://github.com/apache/arrow/pull/9862#issuecomment-813065364


   > Spec reference is here: 
https://github.com/apache/arrow/blob/master/format/Message.fbs#L59 Looking at 
the C++ code, it doesn't look like it is in conformance with the specification 
here. I opened https://issues.apache.org/jira/browse/ARROW-12196
   
   Okay, that looks pretty simple: just check for an uncompressed length of -1, 
which indicates the data is not compressed. I'll update the PR and add that.
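
A rough sketch of that check, assuming the layout described in Message.fbs (each body buffer starts with an 8-byte little-endian int64 holding the uncompressed length, and -1 means the remaining bytes are stored uncompressed). The names `decodeBodyBuffer` and `decompress` are hypothetical and are not taken from the PR:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeBodyBuffer is a hypothetical helper, not code from the PR.
// It reads the 8-byte little-endian length prefix and either returns the
// payload untouched (prefix == -1) or hands it to the supplied decompressor.
func decodeBodyBuffer(buf []byte, decompress func(src []byte, uncompressedLen int64) ([]byte, error)) ([]byte, error) {
	if len(buf) < 8 {
		return nil, errors.New("buffer too short for length prefix")
	}
	n := int64(binary.LittleEndian.Uint64(buf[:8]))
	if n == -1 {
		// -1 signals that the rest of the buffer is not compressed.
		return buf[8:], nil
	}
	if n < 0 {
		return nil, fmt.Errorf("invalid uncompressed length %d", n)
	}
	return decompress(buf[8:], n)
}

func main() {
	// Eight 0xFF bytes encode -1 as a little-endian int64.
	raw := []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 'a', 'b', 'c'}
	out, err := decodeBodyBuffer(raw, func([]byte, int64) ([]byte, error) {
		return nil, errors.New("decompressor should not be called")
	})
	fmt.Println(string(out), err)
}
```

The decompressor is passed in as a function so the sketch does not depend on any particular LZ4 or ZSTD package.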
   
   > In C++ at least threading is configurable. I don't have a strong 
preference here, and really profiling for specific use-cases is probably 
important. For small batches with narrow schemas I can imagine multithreading 
being slower than single-threaded. So feel free to leave as is; if someone runs 
into performance issues using these features we can explore the options.
   
   So I know that the LZ4 library I'm using says it uses `runtime.GOMAXPROCS` 
to parallelize compression of a chunk of data, but it's fairly trivial to add 
the ability to parallelize the compression of the body buffers, with an argument 
to control the parallelization. Thankfully goroutines are much lighter weight 
than full threads, so a performance degradation is less likely in the case of a 
small batch with a narrow schema. I'll make the changes; if it turns out to be 
more complex than I'm thinking, I'll agree with you and leave it out until 
someone runs into a performance issue with the feature. If it's as simple as 
I'm thinking it'll be, I'll just include it here.
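
To illustrate the idea, here is a minimal sketch of compressing body buffers concurrently with a caller-supplied worker count. It is not the implementation from the PR: `compressBuffers`, the `workers` argument, and the `compress` callback are hypothetical names, and the "compressor" in `main` is only a stand-in.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// compressBuffers is a hypothetical helper, not code from the PR.
// It runs compress over every buffer using at most `workers` goroutines
// and returns the results in the same order as the input.
func compressBuffers(bufs [][]byte, workers int, compress func([]byte) ([]byte, error)) ([][]byte, error) {
	if workers < 1 {
		workers = 1 // fall back to sequential behaviour
	}
	out := make([][]byte, len(bufs))
	errs := make([]error, len(bufs))
	idx := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// out[i] and errs[i] never overlap.
			for i := range idx {
				out[i], errs[i] = compress(bufs[i])
			}
		}()
	}
	for i := range bufs {
		idx <- i
	}
	close(idx)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	bufs := [][]byte{[]byte("ab"), []byte("cd"), []byte("ef")}
	// bytes.ToUpper stands in for a real compressor here.
	out, err := compressBuffers(bufs, 2, func(b []byte) ([]byte, error) {
		return bytes.ToUpper(b), nil
	})
	fmt.Println(string(bytes.Join(out, []byte(","))), err)
}
```

Passing `workers` as 1 keeps the work on a single goroutine, which may be the better choice for small batches with narrow schemas; profiling would have to confirm that.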



