ldacey commented on PR #37868:
URL: https://github.com/apache/arrow/pull/37868#issuecomment-1913406639

   Should this fail with bad file_size values? I tried changing the file_size to -10000 and it still succeeded for me. I am not sure whether GCS uses this information, though (I am using fsspec for the filesystem, which uses the gcsfs library). I suspect I am using it wrong, because I cannot get it to fail at all, regardless of whether the size I pass in is the real size of the file.
   
   I didn't see a fragment.size attribute or anything else to check, other than fragment.metadata.serialized_size, which is a different value:
   ```python
    fragment = file_format.make_fragment(path, filesystem=dataset.filesystem, partition_expression=expression, file_size=-10000)
   print(fragment)
   
    <pyarrow.dataset.ParquetFileFragment path=bucket/discard/year=2023/part-0.parquet partition=[year=2023]>
    <pyarrow.dataset.ParquetFileFragment path=bucket/discard/year=2024/part-0.parquet partition=[year=2024]>
   
   # fresh dataset written with version 15.0
   <pyarrow._parquet.FileMetaData object at 0x7f725816ecf0>
     created_by: parquet-cpp-arrow version 15.0.0
     num_columns: 56
     num_rows: 21420
     num_row_groups: 5
     format_version: 2.6
     serialized_size: 34164
   
   ```
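   
   For reference, here is a minimal sketch of how one might check whether the file_size hint is actually consumed. It assumes a hypothetical local file `data.parquet` and a LocalFileSystem rather than the fsspec/gcsfs setup above: pass an obviously wrong size and then try to materialize the fragment, which should surface a read error if the hint is used in place of the filesystem's own metadata.
   
   ```python
   import pyarrow.dataset as ds
   import pyarrow.fs as fs
   
   path = "data.parquet"            # hypothetical local Parquet file
   filesystem = fs.LocalFileSystem()
   
   # Actual size according to the filesystem, for comparison with the hint.
   print("real size:", filesystem.get_file_info(path).size)
   
   file_format = ds.ParquetFileFormat()
   fragment = file_format.make_fragment(path, filesystem=filesystem, file_size=-10000)
   
   try:
       # Force the footer and row groups to be read; a bad size hint should
       # fail here if the reader relies on it.
       fragment.to_table()
       print("read succeeded despite the bad file_size hint")
   except Exception as exc:
       print("read failed as expected:", exc)
   ```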
   
   

