kou commented on issue #23877:
URL: https://github.com/apache/arrow/issues/23877#issuecomment-1570912603
We can build the Parquet command line tools with the following change:
```diff
diff --git a/Formula/apache-arrow.rb b/Formula/apache-arrow.rb
index 72cf9c72820..cc882b78a83 100644
--- a/Formula/apache-arrow.rb
+++ b/Formula/apache-arrow.rb
@@ -66,6 +66,7 @@ class ApacheArrow < Formula
-DARROW_WITH_BROTLI=ON
-DARROW_WITH_UTF8PROC=ON
-DARROW_INSTALL_NAME_RPATH=OFF
+ -DPARQUET_BUILD_EXECUTABLES=ON
]
args << "-DARROW_MIMALLOC=ON" unless Hardware::CPU.arm?
```
Could you send a pull request to Homebrew?
BTW, we can check the original size and the compressed size of each column
chunk with the following Ruby script:
```ruby
#!/usr/bin/env ruby

# Requires the red-parquet gem (it also pulls in red-arrow).
require "parquet"

ARGV.each do |path|
  puts path
  # Memory-map the Parquet file and read its footer metadata.
  Arrow::MemoryMappedInputStream.open(path) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    metadata = reader.metadata
    metadata.n_row_groups.times do |i|
      row_group = metadata.get_row_group(i)
      row_group.n_columns.times do |j|
        column_chunk = row_group.get_column_chunk(j)
        # Uncompressed and compressed sizes in bytes.
        p [i, j, column_chunk.total_size, column_chunk.total_compressed_size]
      end
    end
  end
end
```
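Building on the same calls, here is a sketch of a variant that sums the sizes
over all column chunks and prints a per-file compression ratio (it only uses
the API shown above):
```ruby
#!/usr/bin/env ruby

require "parquet"

ARGV.each do |path|
  total = 0
  compressed = 0
  Arrow::MemoryMappedInputStream.open(path) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    metadata = reader.metadata
    # Accumulate sizes over every column chunk in every row group.
    metadata.n_row_groups.times do |i|
      row_group = metadata.get_row_group(i)
      row_group.n_columns.times do |j|
        column_chunk = row_group.get_column_chunk(j)
        total += column_chunk.total_size
        compressed += column_chunk.total_compressed_size
      end
    end
  end
  ratio = total.zero? ? 0.0 : 100.0 * compressed / total
  puts "#{path}: #{total} -> #{compressed} bytes (#{ratio.round(1)}%)"
end
```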
As you can see, this API is inconvenient (we would rather write
`metadata.each_row_group` than `metadata.n_row_groups.times...`). If you're
interested in improving the API, please open a new issue for it.
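For illustration, the inner loops could then shrink to something like the
following; note that `each_row_group` and `each_column_chunk` are hypothetical
methods that don't exist in the Ruby bindings yet:
```ruby
# Hypothetical API: each_row_group and each_column_chunk are NOT
# implemented yet; this only sketches the desired usage.
metadata.each_row_group.with_index do |row_group, i|
  row_group.each_column_chunk.with_index do |column_chunk, j|
    p [i, j, column_chunk.total_size, column_chunk.total_compressed_size]
  end
end
```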
We can check which compression is used with the C++ API, but it's not exported
to Ruby yet. (We can't use `column_chunk.compression` for now.)
If you open a new issue, we can work on it.
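For reference, once the C++ accessor (`parquet::ColumnChunkMetaData::compression()`)
is exposed, the print line in the script above could presumably become
something like this; again, `column_chunk.compression` is not available yet:
```ruby
# Hypothetical: compression is not exported to the Ruby bindings yet.
p [i, j, column_chunk.compression,
   column_chunk.total_size, column_chunk.total_compressed_size]
```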