jorisvandenbossche commented on issue #35301: URL: https://github.com/apache/arrow/issues/35301#issuecomment-1523022843
Given that you want a positional delete, wouldn't this be a "take" operation rather than a "filter"? I know the two are essentially the same (under the hood, filtering an array also does a "take" of the required values), but conceptually they might differ for a Dataset. A filter can be defined with an expression, whereas a "take" always works with actual materialized values. We already have a `Dataset.take()` method that does that. So even if you start with a boolean filter, you should already be able to use `take()` by converting the boolean mask to indices with `pyarrow.compute.indices_nonzero`.

I am not fully sure how `Scanner::TakeRows` works, given that positional indices depend on the order in which the data is scanned. I assume it follows the order of the actual vector of fragments.
