ashishbista opened a new issue, #39433:
URL: https://github.com/apache/arrow/issues/39433

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am using the red-arrow gem to process JSON files in my Ruby application. The gem works well with smaller JSON files, but I run into an error when processing a larger JSON file (2.3 MB).
   
   The code snippet below illustrates how I am attempting to load the JSON file:
   
   ```ruby
   table = Arrow::Table.load(json_file, format: :json)
   ```
   
   Unfortunately, executing this code results in the following error:
   
   ```text
   ~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:705:in `invoke': [json-reader][read]: Invalid: straddling object straddles two block boundaries (try to increase block size?) (Arrow::Error::Invalid)
           from ~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:705:in `invoke'
           from ~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:573:in `read'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:258:in `block in load_as_json'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:155:in `open_input_stream'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:256:in `load_as_json'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:98:in `load_by_reader'
           from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:76:in `load_from_file'
           from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:51:in `block in load'
           from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:49:in `each'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:49:in `load'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:26:in `load'
           from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table.rb:30:in `load'
           ....
   ```
   
   After researching the issue, I tried to work around it by passing the `block_size` option to the `load` method:
   
   ```ruby
   block_size = 1024 * 1024 * 1024 # Tried different values
   table = Arrow::Table.load(json_file, format: :json, block_size: block_size)
   ```
   
   However, from my reading of the source code, the `load` method does not appear to honor the `block_size` option.
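   A possible interim workaround, not verified against the file in this report, is to drive the JSON reader directly instead of going through `Arrow::Table.load`. This is a minimal sketch that assumes red-arrow exposes the Arrow GLib classes `Arrow::JSONReadOptions` (with a `block_size` property), `Arrow::JSONReader.new(input, options)` and `Arrow::FileInputStream`; the helper name `load_json_table` is hypothetical. The block size must be large enough to hold the largest single JSON object in the file.

   ```ruby
   require "arrow"

   # Hypothetical helper (not part of red-arrow): reads a JSON file with an
   # explicit block size, bypassing Arrow::Table.load, which ignores :block_size.
   def load_json_table(path, block_size:)
     options = Arrow::JSONReadOptions.new
     # Assumption: the GLib "block-size" property is exposed as block_size=.
     options.block_size = block_size
     input = Arrow::FileInputStream.new(path)
     begin
       # JSONReader#read parses the whole stream into an Arrow::Table.
       Arrow::JSONReader.new(input, options).read
     ensure
       input.close
     end
   end
   ```

   For the file in this report, a call such as `load_json_table(json_file, block_size: 8 * 1024 * 1024)` (8 MiB, well above the 2.3 MB file size) would be a reasonable first attempt.
   
   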
   
   Is there an existing way to adjust the block size when reading JSON files with the `red-arrow` gem? If not, I would like to request a feature that either adjusts the block size dynamically or exposes an API for setting it.
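   From the caller's side, the "dynamic" option could look roughly like the following hypothetical helper (`load_json_table_adaptively`, not part of red-arrow). It starts from Arrow's C++ default block size of 1 MiB and doubles it whenever the reader raises the straddling-object error. It assumes red-arrow exposes `Arrow::JSONReadOptions` (with a `block_size` property), `Arrow::JSONReader.new(input, options)` and `Arrow::FileInputStream`; it is a sketch, not a proposed implementation.

   ```ruby
   require "arrow"

   # Hypothetical helper (not part of red-arrow): retries the read with a
   # doubled block size whenever Arrow reports a straddling object.
   def load_json_table_adaptively(path, initial_block_size: 1 << 20, max_block_size: 1 << 30)
     block_size = initial_block_size
     loop do
       options = Arrow::JSONReadOptions.new
       options.block_size = block_size
       # Open a fresh stream per attempt; a failed read has consumed the old one.
       input = Arrow::FileInputStream.new(path)
       begin
         return Arrow::JSONReader.new(input, options).read
       rescue Arrow::Error::Invalid => error
         # Only the block-size error is retried; anything else is re-raised.
         raise unless error.message.include?("straddl")
         raise if block_size >= max_block_size
         block_size = [block_size * 2, max_block_size].min
       ensure
         input.close
       end
     end
   end
   ```

   Doubling keeps the number of retries logarithmic in the size of the largest object, and the `max_block_size` cap keeps the value within the 32-bit integer range that the block size option uses.
   
   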
   
   
   
   ### Component(s)
   
   Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
