etseidl commented on issue #10243: URL: https://github.com/apache/arrow-rs/issues/10243#issuecomment-4860980750
Hi @punkeel, thanks for the report. As you allude to above, this is not really a bug, except in the sense that the error message does not make clear what's going on. The real bug is in whatever produced these files.

In general, I'm fine with tolerating bad writer behavior in cases where the specification is vague. In this case, however, the spec is quite clear that using V2 page headers or the page indexes requires pages to begin at row/record boundaries; the page indexes do not work otherwise.

My preference would be to error by default when non-compliant pages are detected. A more useful error message would certainly be welcome. I'd also be OK with an opt-in reader setting that relaxes the requirement for pages to start at a row boundary. That way the user knows the file is problematic and not suitable for pushdowns that would skip pages, while the file remains readable when pruning is not desired.
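A minimal, standalone Rust sketch of the behavior proposed above: strict by default, with an opt-in relaxation. It assumes the reader can see the repetition level of each page's first value, and relies on the Parquet rule that a repetition level of 0 marks the start of a new record. `PageInfo`, `ReaderOptions`, `allow_unaligned_pages`, and `check_page_alignment` are hypothetical names for illustration only, not existing arrow-rs APIs.

```rust
use std::fmt;

/// Hypothetical summary of one data page: its position in the column chunk
/// and the repetition level of its first value (None for an empty page).
#[derive(Debug, Clone, Copy)]
struct PageInfo {
    index: usize,
    first_rep_level: Option<i16>,
}

/// Hypothetical reader option; not an actual arrow-rs setting.
#[derive(Debug, Clone, Copy, Default)]
struct ReaderOptions {
    /// When true, misaligned pages are reported instead of causing an error.
    allow_unaligned_pages: bool,
}

#[derive(Debug, PartialEq)]
enum AlignmentError {
    PageNotAtRecordBoundary { page: usize, first_rep_level: i16 },
}

impl fmt::Display for AlignmentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AlignmentError::PageNotAtRecordBoundary { page, first_rep_level } => write!(
                f,
                "page {page} does not start at a record boundary (first repetition level {first_rep_level}, expected 0)"
            ),
        }
    }
}

/// Checks that every page in a column chunk begins a new record.
/// Strict mode returns an error on the first misaligned page; relaxed mode
/// returns the indices of all misaligned pages so the caller can avoid
/// page-skipping pushdowns for this column chunk.
fn check_page_alignment(
    pages: &[PageInfo],
    max_rep_level: i16,
    opts: ReaderOptions,
) -> Result<Vec<usize>, AlignmentError> {
    // Non-repeated column: every value is its own record, so pages always align.
    if max_rep_level == 0 {
        return Ok(Vec::new());
    }
    let mut misaligned = Vec::new();
    for page in pages {
        // A repetition level of 0 marks the first value of a new record.
        if let Some(level) = page.first_rep_level {
            if level != 0 {
                if !opts.allow_unaligned_pages {
                    return Err(AlignmentError::PageNotAtRecordBoundary {
                        page: page.index,
                        first_rep_level: level,
                    });
                }
                misaligned.push(page.index);
            }
        }
    }
    Ok(misaligned)
}

fn main() {
    let pages = [
        PageInfo { index: 0, first_rep_level: Some(0) },
        // Page 1 continues a list that began in page 0.
        PageInfo { index: 1, first_rep_level: Some(1) },
    ];

    // Default (strict): the misaligned page is an error with a descriptive message.
    match check_page_alignment(&pages, 1, ReaderOptions::default()) {
        Ok(_) => println!("all pages aligned"),
        Err(e) => println!("error: {e}"),
    }

    // Opt-in (relaxed): the file stays readable, and the caller learns which pages are unsafe to skip.
    let relaxed = ReaderOptions { allow_unaligned_pages: true };
    let misaligned = check_page_alignment(&pages, 1, relaxed).unwrap();
    println!("misaligned pages: {misaligned:?}");
}
```

With the relaxed setting, a reader could use the returned indices to fall back to decoding the whole column chunk rather than trusting the page index for that chunk.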
