etseidl commented on issue #10243:
URL: https://github.com/apache/arrow-rs/issues/10243#issuecomment-4860980750

   Hi @punkeel, thanks for the report. As you allude to above, this is not really
a bug, except in the sense that the error message does not make clear what's
going on. The real bug is in whatever produced these files. In general, I'm
fine with tolerating bad writer behavior where the specification is
vague. But in this case the spec is quite clear that using V2 page headers or
the page indexes requires every page to begin at a row/record boundary. The page
indexes do not work otherwise.
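
To illustrate why the page indexes break, here is a minimal sketch that models each data page only by the repetition levels of its values. This is a simplification: `PageLevels`, `first_unaligned_page` and `first_row_indexes` are hypothetical names, not arrow-rs types. A new row starts wherever the repetition level is 0, so a compliant page must begin with level 0. The sketch also derives per-page first row indexes the way an offset index counts them, showing that a page which starts mid-row holds the tail of a row the index attributes entirely to the previous page.

```rust
/// Hypothetical, simplified data page: only the repetition levels of its values.
/// (Not a real arrow-rs/parquet type.)
struct PageLevels {
    rep_levels: Vec<i16>,
}

/// Index of the first page that does not begin at a row boundary,
/// i.e. whose first repetition level is non-zero.
fn first_unaligned_page(pages: &[PageLevels]) -> Option<usize> {
    pages
        .iter()
        .position(|p| matches!(p.rep_levels.first(), Some(&level) if level != 0))
}

/// First row index of each page, counted as an offset index would:
/// every repetition level of 0 starts a new row.
fn first_row_indexes(pages: &[PageLevels]) -> Vec<i64> {
    let mut next_row = 0i64;
    pages
        .iter()
        .map(|p| {
            let first = next_row;
            next_row += p.rep_levels.iter().filter(|&&level| level == 0).count() as i64;
            first
        })
        .collect()
}

fn main() {
    // Page 0 holds row 0 (two values) and the start of row 1.
    // Page 1 begins with the continuation of row 1 (level 1), then row 2.
    let pages = vec![
        PageLevels { rep_levels: vec![0, 1, 0] },
        PageLevels { rep_levels: vec![1, 0] },
    ];
    // Page 1 does not start at a row boundary.
    assert_eq!(first_unaligned_page(&pages), Some(1));
    // The index places rows 0..2 in page 0 and row 2 onward in page 1, so a
    // reader selecting only row 1 would read page 0 alone and miss the part
    // of row 1 stored at the start of page 1.
    assert_eq!(first_row_indexes(&pages), vec![0, 2]);
    println!("first unaligned page: {:?}", first_unaligned_page(&pages));
}
```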
   
   I think my preference here would be to error by default when non-compliant
pages are detected. A more useful error message would certainly be welcome. I'd
also be OK with an opt-in reader setting that relaxes the requirement that pages
start at a row boundary. That way the user knows the file is problematic and not
suitable for pushdowns that would skip pages, but the file remains readable when
pruning is not desired.
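
The proposed behavior (error by default, with an opt-in that keeps the file readable but gives up page skipping) could look roughly like the sketch below. `ReaderOptions`, `allow_unaligned_pages`, `ReadPlan` and `plan_read` are hypothetical names for illustration only, not existing arrow-rs API, and each page is reduced to its first repetition level.

```rust
/// Hypothetical reader option; NOT an existing arrow-rs/parquet setting.
#[derive(Default)]
struct ReaderOptions {
    /// Opt-in: accept pages that do not start at a row boundary.
    allow_unaligned_pages: bool,
}

/// Hypothetical outcome of validating a column chunk before reading.
#[derive(Debug, PartialEq)]
struct ReadPlan {
    /// Whether page-index-based pruning (skipping whole pages) may be used.
    page_pruning_enabled: bool,
}

/// `first_rep_levels[i]` is the first repetition level of data page `i`.
fn plan_read(first_rep_levels: &[i16], opts: &ReaderOptions) -> Result<ReadPlan, String> {
    match first_rep_levels.iter().position(|&level| level != 0) {
        // Every page starts a new row: the page index can be trusted.
        None => Ok(ReadPlan { page_pruning_enabled: true }),
        // Non-compliant file, but the user opted in: read it sequentially.
        Some(_) if opts.allow_unaligned_pages => Ok(ReadPlan { page_pruning_enabled: false }),
        // Default: fail with a message that explains what is wrong.
        Some(page) => Err(format!(
            "page {page} does not start at a row boundary (first repetition level \
             is non-zero); V2 page headers and page indexes require every page to \
             begin a new row, so this file was likely produced by a non-compliant writer"
        )),
    }
}

fn main() {
    let pages = [0i16, 1, 0];
    match plan_read(&pages, &ReaderOptions::default()) {
        Ok(plan) => println!("{plan:?}"),
        Err(e) => println!("error: {e}"),
    }
    let lenient = ReaderOptions { allow_unaligned_pages: true };
    println!("{:?}", plan_read(&pages, &lenient));
}
```

Disabling pruning in the lenient mode, rather than guessing row offsets, should keep results correct: a sequential scan reassembles rows from the repetition levels themselves and never needs the page index.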


-- 
This is an automated message from the Apache Git Service.