zeroshade commented on issue #252: URL: https://github.com/apache/arrow-go/issues/252#issuecomment-2596257760
It honestly depends. Both are released under the Apache License 2.0, so licensing isn't going to be an issue or differ between them. Ultimately it's going to be a question of Features, Performance and Maintenance.

### Features

For example, if you need to support the Parquet encryption capabilities, then you should use this library (`apache/arrow-go/v18/parquet`), as `parquet-go/parquet-go` doesn't support the encryption functionality to my knowledge.

If you need to leverage bloom filters, then you'll need to use `parquet-go/parquet-go` instead of this library, as we don't have support for bloom filters yet (though I am currently working on that!).

If you are already leveraging Apache Arrow itself for anything (ADBC for database interaction, Flight/FlightSQL for wire protocol, interacting with DuckDB or other Arrow-compatible/native compute engines, etc.), then this library is going to be more performant and beneficial due to the direct integration it has with Arrow through the `pqarrow` package.

If your data is laid out in Go structs, then `parquet-go/parquet-go` has a simpler API, as I haven't had the time to enable writers to accept Go structs for writing yet. I have plans to improve the public APIs for the writers of this library to better utilize generics, while `parquet-go` already has such APIs that utilize generics.

And so on.

### Performance

As far as performance and memory usage go, I haven't benchmarked anything significant between the two libraries, so I can't speak to any comparison there. On this, I invite you to perform comparisons with your use case. That said, if you do find that `parquet-go` is more performant or has better memory usage than this library, please come back and let me know!
I'd love to attempt to address any performance/memory usage issues that you come across, as great pains have been taken to optimize this library as much as possible (just as `parquet-go` has, with different solutions to some things).

### Maintenance

Both projects are actively maintained, judging by the frequency of commits. While I can't speak for the maintainers of `parquet-go`, I can say that this project is highly connected to the Parquet PMC and Apache community as far as keeping up with any changes to the Parquet format, having input into the wider community, and so on.

That said, technically this library is considered the *official* Go library for Parquet.

I hope the above helps you make a decision, or at least gives you a direction for exploration. In the end, I'm interested in what you end up going with and why. Particularly, if you do end up going with `parquet-go` over this library, I would like to know why you made that decision so I can address those issues and make this library better and more desirable :smile:

Honestly, thanks for filing this issue!
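
As a rough summary of the feature trade-offs described in this comment, here is a minimal Go sketch that maps a set of requirements to the reasons given for each library. It is purely illustrative: the `Needs` type, its fields, and the `Recommend` function are hypothetical names made up for this sketch and are not part of either library, and the feature status it encodes is only what the comment states at the time of writing (for example, bloom filter support in this library is described as in progress).

```go
package main

import "fmt"

// Library identifiers as they are named in the comment.
const (
	ArrowGo   = "apache/arrow-go/v18/parquet"
	ParquetGo = "parquet-go/parquet-go"
)

// Needs is a hypothetical description of a project's requirements.
// Each field corresponds to one criterion from the Features section.
type Needs struct {
	Encryption   bool // needs Parquet encryption
	BloomFilters bool // needs bloom filters
	UsesArrow    bool // already uses Arrow (ADBC, Flight/FlightSQL, DuckDB, ...)
	GoStructs    bool // data is laid out in Go structs
}

// Recommend returns, per library, the reasons from the comment that favor it.
// It does not pick a winner; performance and maintenance still need to be
// weighed separately for the actual use case.
func Recommend(n Needs) map[string][]string {
	reasons := map[string][]string{}
	if n.Encryption {
		reasons[ArrowGo] = append(reasons[ArrowGo], "supports Parquet encryption")
	}
	if n.UsesArrow {
		reasons[ArrowGo] = append(reasons[ArrowGo], "direct Arrow integration via pqarrow")
	}
	if n.BloomFilters {
		reasons[ParquetGo] = append(reasons[ParquetGo], "supports bloom filters")
	}
	if n.GoStructs {
		reasons[ParquetGo] = append(reasons[ParquetGo], "simpler generic API for Go structs")
	}
	return reasons
}

func main() {
	rec := Recommend(Needs{Encryption: true, GoStructs: true})
	for _, lib := range []string{ArrowGo, ParquetGo} {
		fmt.Println(lib, rec[lib])
	}
}
```

When both libraries collect reasons, as in the `main` example, the features alone do not settle the choice, and benchmarking against the real workload is the suggested next step.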
