ashishbista opened a new issue, #39433:
URL: https://github.com/apache/arrow/issues/39433
### Describe the bug, including details regarding any error messages, version, and platform.
I am using the red-arrow gem to process JSON files in my Ruby application.
The gem works well with smaller JSON files, but loading a larger JSON file
(2.3 MB) fails.
This is how I load the JSON file:
```ruby
table = Arrow::Table.load(json_file, format: :json)
```
Unfortunately, executing this code results in the following error:
```text
~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:705:in `invoke': [json-reader][read]: Invalid: straddling object straddles two block boundaries (try to increase block size?) (Arrow::Error::Invalid)
from ~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:705:in `invoke'
from ~/gems/ruby-3.3.0/gems/gobject-introspection-4.2.0/lib/gobject-introspection/loader.rb:573:in `read'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:258:in `block in load_as_json'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:155:in `open_input_stream'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:256:in `load_as_json'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:98:in `load_by_reader'
from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:76:in `load_from_file'
from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:51:in `block in load'
from ~/.rvm/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:49:in `each'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:49:in `load'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table-loader.rb:26:in `load'
from ~/gems/ruby-3.3.0/gems/red-arrow-14.0.2/lib/arrow/table.rb:30:in `load'
....
```
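To see whether record size is the likely cause, here is a minimal sketch that measures the longest record in the file. It assumes the input is newline-delimited JSON and that the error means a single record does not fit inside one read block; the 1 MiB starting value is also an assumption about the reader's default. Both method names are hypothetical helpers, not part of red-arrow.

```ruby
# Hypothetical helper: byte size of the longest line (record) in a
# newline-delimited JSON file, including its trailing newline.
def longest_record_bytes(path)
  longest = 0
  File.foreach(path) do |line|
    longest = [longest, line.bytesize].max
  end
  longest
end

# Hypothetical helper: smallest power-of-two multiple of `minimum` that is
# strictly larger than the longest record. `minimum` defaults to 1 MiB,
# which is assumed (not verified here) to be the reader's default block size.
def suggested_block_size(path, minimum: 1024 * 1024)
  longest = longest_record_bytes(path)
  size = minimum
  size *= 2 while size <= longest
  size
end
```

If `longest_record_bytes(json_file)` comes out larger than the block size in use, that would be consistent with the "straddling object" message above, for example when the whole 2.3 MB file is a single line.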
After researching the issue, I tried to address it by passing a `block_size`
option to the `load` method:
```ruby
block_size = 1024 * 1024 * 1024 # Tried different values
table = Arrow::Table.load(json_file, format: :json, block_size: block_size)
```
However, judging from the source code, the `load` method does not appear to
honor the `block_size` option for JSON input.
Is there an existing way to adjust the block size when reading files with the
`red-arrow` gem? If this is not currently supported, I would like to request a
feature that either handles the block size dynamically inside the loader or
exposes an API to set it.
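In the meantime, one possible workaround is to bypass `Arrow::Table.load` and construct the reader directly. The sketch below is untested against red-arrow 14.0.2; it assumes that `Arrow::JSONReadOptions` exposes a `block_size` property and that `Arrow::JSONReader.new` accepts the options as its second argument, as the underlying GLib bindings suggest. The method name `load_json_with_block_size` is hypothetical.

```ruby
# Sketch only: read a JSON file with an explicit block size by using the
# reader classes directly instead of Arrow::Table.load.
def load_json_with_block_size(path, block_size:)
  # Provided by the red-arrow gem. Required here rather than at the top so
  # that defining this method does not itself need the gem.
  require "arrow"

  # Assumption: JSONReadOptions has a block_size property (in bytes).
  options = Arrow::JSONReadOptions.new
  options.block_size = block_size

  # The stack trace above shows the loader opening an input stream before
  # reading; a memory-mapped stream is assumed here. It is intentionally not
  # closed, in case the returned table still refers to the mapped memory.
  input = Arrow::MemoryMappedInputStream.new(path.to_s)

  # Assumption: the reader takes the options as its second argument.
  Arrow::JSONReader.new(input, options).read
end
```

It would be called as `table = load_json_with_block_size(json_file, block_size: 8 * 1024 * 1024)`, with the block size chosen to be larger than the biggest single record in the file.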
### Component(s)
Ruby