kou commented on issue #23877:
URL: https://github.com/apache/arrow/issues/23877#issuecomment-1570912603
We can build the Parquet command line tools with the following change:
```diff
diff --git a/Formula/apache-arrow.rb b/Formula/apache-arrow.rb
index 72cf9c72820..cc882b78a83 100644
--- a/Formula/apache-arrow.rb
+++ b/Formula/apache-arrow.rb
@@ -66,6 +66,7 @@ class ApacheArrow < Formula
-DARROW_WITH_BROTLI=ON
-DARROW_WITH_UTF8PROC=ON
-DARROW_INSTALL_NAME_RPATH=OFF
+ -DPARQUET_BUILD_EXECUTABLES=ON
]
args << "-DARROW_MIMALLOC=ON" unless Hardware::CPU.arm?
```
Could you send a pull request to Homebrew?
BTW, we can check the original size and the compressed size of each column
chunk with the following Ruby script:
```ruby
#!/usr/bin/env ruby

# Requires the red-parquet gem (it also pulls in red-arrow).
require "parquet"

ARGV.each do |path|
  puts path
  # Memory-map the Parquet file and read its footer metadata.
  Arrow::MemoryMappedInputStream.open(path) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    metadata = reader.metadata
    metadata.n_row_groups.times do |i|
      row_group = metadata.get_row_group(i)
      row_group.n_columns.times do |j|
        column_chunk = row_group.get_column_chunk(j)
        # Uncompressed and compressed sizes in bytes.
        p [i, j, column_chunk.total_size, column_chunk.total_compressed_size]
      end
    end
  end
end
```
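Building on the same calls, here is a sketch of a variant that sums the sizes
over all column chunks and prints a per-file compression ratio (it only uses
the API shown above):
```ruby
#!/usr/bin/env ruby

require "parquet"

ARGV.each do |path|
  total = 0
  compressed = 0
  Arrow::MemoryMappedInputStream.open(path) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    metadata = reader.metadata
    # Accumulate sizes over every column chunk in every row group.
    metadata.n_row_groups.times do |i|
      row_group = metadata.get_row_group(i)
      row_group.n_columns.times do |j|
        column_chunk = row_group.get_column_chunk(j)
        total += column_chunk.total_size
        compressed += column_chunk.total_compressed_size
      end
    end
  end
  ratio = total.zero? ? 0.0 : 100.0 * compressed / total
  puts "#{path}: #{total} -> #{compressed} bytes (#{ratio.round(1)}%)"
end
```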
As you can see, this API is inconvenient (we would rather write
`metadata.each_row_group` than `metadata.n_row_groups.times...`). If you're
interested in improving the API, please open a new issue for it.
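For illustration, the inner loops could then shrink to something like the
following; note that `each_row_group` and `each_column_chunk` are hypothetical
methods that don't exist in the Ruby bindings yet:
```ruby
# Hypothetical API: each_row_group and each_column_chunk are NOT
# implemented yet; this only sketches the desired usage.
metadata.each_row_group.with_index do |row_group, i|
  row_group.each_column_chunk.with_index do |column_chunk, j|
    p [i, j, column_chunk.total_size, column_chunk.total_compressed_size]
  end
end
```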
We can check which compression is used with the C++ API, but it's not exported
to Ruby yet. (We can't use `column_chunk.compression` for now.)
If you open a new issue, we can work on it.
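For reference, once the C++ accessor (`parquet::ColumnChunkMetaData::compression()`)
is exposed, the print line in the script above could presumably become
something like this; again, `column_chunk.compression` is not available yet:
```ruby
# Hypothetical: compression is not exported to the Ruby bindings yet.
p [i, j, column_chunk.compression,
   column_chunk.total_size, column_chunk.total_compressed_size]
```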