piotrkai commented on issue #40196:
URL: https://github.com/apache/arrow/issues/40196#issuecomment-2499587477
It does not work with columns either. I am trying to skip the first N rows of a huge dataset, but I cannot get it to work with pyarrow.
Example code:
```
import pyarrow.compute as pc

scan_columns = {}
scan_columns['column_name'] = pc.field('column_name')
# Synthesize a dataset-wide index from the special scanner fields.
scan_columns['__exp_index'] = pc.add(
    pc.multiply(pc.field('__fragment_index'), pc.scalar(10000)),
    pc.field('__batch_index'))
filter = None
if self.start_dataset_index > 0:
    filter = pc.field('__exp_index') >= pc.scalar(self.start_dataset_index)
    self.start_dataset_index = 0
batches = self.dataset.scanner(columns=scan_columns, filter=filter,
                               batch_size=self.batch_size).to_batches()
```
I am getting pretty much the same error:
```
File "pyarrow/_dataset.pyx", line 399, in pyarrow._dataset.Dataset.scanner
File "pyarrow/_dataset.pyx", line 3557, in
pyarrow._dataset.Scanner.from_dataset
File "pyarrow/_dataset.pyx", line 3475, in
pyarrow._dataset.Scanner._make_scan_options
File "pyarrow/_dataset.pyx", line 3422, in
pyarrow._dataset._populate_builder
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(__fragment_index) in
column_name: int64
```
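In the meantime, a minimal sketch of a Python-side fallback, assuming one can afford to stream and discard the leading batches (the `batches_skipping_rows` helper below is only illustrative, not from pyarrow, and it does not avoid the I/O for the skipped region the way a pushed-down filter would):
```
import pyarrow.dataset as ds

def batches_skipping_rows(dataset: ds.Dataset, skip_rows: int, batch_size: int):
    """Yield record batches from `dataset`, dropping the first `skip_rows` rows.

    Unlike a pushed-down filter, this still reads the skipped batches;
    it just counts rows as they stream in and slices off the overlap.
    """
    remaining = skip_rows
    for batch in dataset.scanner(batch_size=batch_size).to_batches():
        if remaining >= batch.num_rows:
            # The whole batch lies before the requested offset: discard it.
            remaining -= batch.num_rows
            continue
        if remaining > 0:
            # The offset falls inside this batch: drop only the leading rows.
            batch = batch.slice(remaining)
            remaining = 0
        yield batch
```
Usage in the code above would then be something like `batches = batches_skipping_rows(self.dataset, self.start_dataset_index, self.batch_size)`.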