ldacey commented on PR #37868:
URL: https://github.com/apache/arrow/pull/37868#issuecomment-1913406639
Should this fail with bad `file_size` values? I tried changing the `file_size`
to -10000 and it still succeeded for me. I am not sure whether GCS even uses this
information, though (I am using fsspec for the filesystem, which wraps the gcsfs
library). I think I am using it wrong, because I cannot get it to fail at all,
regardless of whether the size I pass in is the real size of the file.
I didn't see a `fragment.size` attribute or anything else to check other than
`fragment.metadata.serialized_size`, which is a different value (the size of the
serialized Parquet metadata, not of the file).
```python
fragment = file_format.make_fragment(
    path,
    filesystem=dataset.filesystem,
    partition_expression=expression,
    file_size=-10000,
)
print(fragment)
# <pyarrow.dataset.ParquetFileFragment
#   path=bucket/discard/year=2023/part-0.parquet partition=[year=2023]>
# <pyarrow.dataset.ParquetFileFragment
#   path=bucket/discard/year=2024/part-0.parquet partition=[year=2024]>

# fresh dataset written with pyarrow 15.0
print(fragment.metadata)
# <pyarrow._parquet.FileMetaData object at 0x7f725816ecf0>
#   created_by: parquet-cpp-arrow version 15.0.0
#   num_columns: 56
#   num_rows: 21420
#   num_row_groups: 5
#   format_version: 2.6
#   serialized_size: 34164
```
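
For reference, this is the minimal local-filesystem version of the check I was attempting (hypothetical path and a tiny table, not the GCS setup above). Since building the fragment is lazy, my assumption is that a wrong `file_size` would only surface once the fragment is actually read, e.g. via `to_table()` — which is exactly the step I could not get to fail:

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq
from pyarrow.fs import LocalFileSystem

# write a small parquet file to read back (hypothetical local path)
table = pa.table({"year": [2023, 2023, 2023], "value": [1, 2, 3]})
pq.write_table(table, "/tmp/part-0.parquet")

file_format = ds.ParquetFileFormat()
fs = LocalFileSystem()

# deliberately wrong (negative) file_size; creating the fragment itself
# does not touch the file, so no error is expected at this point
fragment = file_format.make_fragment(
    "/tmp/part-0.parquet",
    filesystem=fs,
    file_size=-10000,
)

# force an actual read -- if the filesystem honors the file_size hint at
# all, this is where I would expect a mismatch to show up (assumption)
print(fragment.to_table())
```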