eeroel commented on PR #37868:
URL: https://github.com/apache/arrow/pull/37868#issuecomment-1913513875

   > Should this fail with bad file_size values? I tried changing the file_size 
to -10000 and it still succeeds for me. I'm not sure whether GCS uses this 
information, though (I am using fsspec for the filesystem, which uses the gcsfs 
library). I think I am using it wrong, because I cannot get it to fail in 
general, regardless of whether the size I input is the real size of the file.
   
   Did you also try to create a dataset with those fragments and read it? 
There's no validation when the fragments are constructed, but it should fail 
when the Parquet reader starts reading the file, here: 
https://github.com/apache/arrow/blob/21ffd82c05c93b873ae3c27128eb8604ed0c735f/cpp/src/parquet/file_reader.cc#L476.
 It would make sense to handle zero and negative sizes on the Python side, 
though...
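   The Python-side handling suggested above could be sketched roughly like this 
(a minimal sketch only; `validated_file_size` is a hypothetical helper name, 
not an existing pyarrow function):

   ```python
   def validated_file_size(file_size):
       """Reject unusable file_size values before constructing a fragment.

       Returns the size unchanged if it is a strictly positive int,
       passes None through (unknown size), and raises otherwise.
       """
       if file_size is None:
           # Unknown size: let the reader discover it from the filesystem.
           return None
       if not isinstance(file_size, int) or isinstance(file_size, bool):
           raise TypeError(
               f"file_size must be an int, got {type(file_size).__name__}"
           )
       if file_size <= 0:
           # Catches the -10000 case from the comment above, and zero.
           raise ValueError(
               f"file_size must be strictly positive, got {file_size}"
           )
       return file_size
   ```

   With a check like this, a bogus size would fail fast at fragment-construction 
time instead of surfacing (or being silently ignored) deep inside the Parquet 
reader.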
   
   Regarding fsspec: the file size information only gets used by Arrow's 
internal file system implementations, and I believe it is currently only used 
for S3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
