zeroshade commented on issue #35688: URL: https://github.com/apache/arrow/issues/35688#issuecomment-1559591346
@hkpeaks I'm not sure what you mean by failing to build a class library with Go. You can easily build a shared library with `extern "C"` functions by using cgo and the `buildmode` options (see [docs](https://pkg.go.dev/cmd/go#hdr-Build_modes)).

> Based on numerous experiments, Golang readat has achieved the results of mmap. As a result, I am much more concerned with performance than with the name "mmap."

That's fine, performance is what's important. Just please don't call it "mmap" if it's not actually using mmap, as that confuses what you're trying to do. There are plenty of situations where mmap isn't necessary or might even slow down performance rather than improve it.

> Bytearray reduces memory and CPU usage significantly. It avoids unnecessary dataset serialization and de-serialization.

This is interesting to me and I'd like to see how that is the case. Wouldn't you need to serialize/de-serialize from bytes into something you can actually process, like the various integral types, float data, etc.?

> I'll think more about the benefits of using Arrow for CSV; I believe the main benefit is data exchange.

It's not so much the benefits of using Arrow for CSV, but rather getting CSV data into Arrow format so that other processes/exchange/analytics can be run on it. The current CSV parsing/reading in the Go Arrow lib is very naive and doesn't do any parallelization, so it is ripe for improvement. I haven't had the time to do so myself, but it would be fantastic to see contributions there from the community.

> However, gRPC is also an excellent way to support very high data exchange performance over the internet.

I wholeheartedly agree there; this is why [Arrow Flight RPC](https://arrow.apache.org/docs/format/Flight.html) uses gRPC.

> I will consider whether it is possible to implement the Parquet format, which can outperform my current CSV format.

What do you mean by implementing the Parquet format in this case?
The Go Parquet library here already implements the Parquet format spec. Are you intending to re-implement the Parquet spec? Or just use the library provided here to perform the reads (and if you find inefficiencies, then contribute improvements back)?

> And I hope Apache Foundation can consider bytearray is one of best data exchange format moving from one software to alternative software.

By "bytearray" do you mean just a literal array of bytes? Or is there an actual data format called "bytearray"? I'm not quite sure what you're referring to here.
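
To make the serialization question above concrete, here is a minimal sketch of the conversion step being asked about. It is not taken from either code base: `parseRow` is a hypothetical helper, the two-column layout (an integer and a float) is an assumption, and quoting and escaping are deliberately ignored. It only illustrates that raw CSV bytes generally have to be parsed into typed values before arithmetic can be done on them.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// parseRow is a hypothetical helper: it splits one CSV line on commas
// (no quoting or escaping is handled) and converts the first field to
// int64 and the second field to float64.
func parseRow(line []byte) (int64, float64, error) {
	fields := bytes.Split(line, []byte(","))
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	// The raw bytes are still text at this point; they must be parsed
	// before they can be used as numbers.
	id, err := strconv.ParseInt(string(fields[0]), 10, 64)
	if err != nil {
		return 0, 0, err
	}
	price, err := strconv.ParseFloat(string(fields[1]), 64)
	if err != nil {
		return 0, 0, err
	}
	return id, price, nil
}

func main() {
	id, price, err := parseRow([]byte("42,3.5"))
	if err != nil {
		panic(err)
	}
	fmt.Println(id, price)
}
```

Operations that never interpret the values (for example, copying or filtering rows by a raw byte comparison) can skip this step, which may be where the reported savings come from.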
