amoeba commented on issue #38724: URL: https://github.com/apache/arrow/issues/38724#issuecomment-1817307831
Hi @cboettig, thanks for writing this up. PyArrow behaves the same way, so I expect this applies to the C++ implementation as well.

Generally, I think what you're asking for is more likely to match what users expect than the current behavior. In my mind, a Dataset has one Schema, and it shouldn't matter where you root your Dataset instance. Making the user jump through extra hoops to re-add the dropped partitions with `add_filename` is cumbersome.

That said, I think this would be a breaking change, so some discussion from other devs would be good here, particularly about whether this change would be a good new default, whether we might retain the current behavior behind a flag, and whether it makes sense to make all the C++ implementations consistent.
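
To illustrate the behavior under discussion, here is a minimal sketch of hive-style partition discovery in plain Python. It is a simplified model, not Arrow's actual code: the function `hive_partition_fields`, its `partition_base` parameter, and the example path are all hypothetical names chosen for illustration. It assumes partition fields are read only from `key=value` directory segments below the directory the dataset is rooted at, which is why rooting at a subdirectory drops the fields above it.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def hive_partition_fields(
    file_path: str, root: str, partition_base: Optional[str] = None
) -> Dict[str, str]:
    """Collect key=value directory segments of file_path as partition fields.

    Segments are parsed relative to `partition_base` when it is given,
    otherwise relative to `root` (modelling the current behavior).
    `partition_base` is a hypothetical parameter for this sketch only.
    """
    base = PurePosixPath(partition_base if partition_base is not None else root)
    relative = PurePosixPath(file_path).relative_to(base)
    fields = {}
    # Only directory segments are considered, so skip the file name itself.
    for segment in relative.parent.parts:
        key, sep, value = segment.partition("=")
        if sep:  # ignore directories that are not key=value pairs
            fields[key] = value
    return fields


# Hypothetical file inside a dataset partitioned by year and month.
path = "data/year=2023/month=11/part-0.parquet"

# Rooted at the top of the dataset: both partition fields are found.
at_top = hive_partition_fields(path, "data")

# Rooted one level down: the year segment is above the root and is dropped.
at_sub = hive_partition_fields(path, "data/year=2023")

# Parsing against the top-level directory recovers the same fields
# no matter where the dataset is rooted.
at_sub_with_base = hive_partition_fields(
    path, "data/year=2023", partition_base="data"
)
```

In this model, `at_top` contains both `year` and `month`, `at_sub` contains only `month`, and `at_sub_with_base` matches `at_top`. The last case is one way to picture the "one Schema regardless of root" behavior requested in the issue; whether that becomes a default or an opt-in flag is the open question above.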
