jorisvandenbossche commented on issue #35301:
URL: https://github.com/apache/arrow/issues/35301#issuecomment-1523022843

   Given that you want a positional delete, wouldn't this be a "take" operation 
rather than a "filter"? I know the two are essentially the same (under the hood, 
filtering an array also does a "take" of the required values), but conceptually, 
for a Dataset, they might differ. A filter can be defined with an expression, 
whereas a "take" always uses actual materialized values. And we already have 
a `Dataset.take()` method that does that.
   
   So even if you start with a boolean filter, you should already be able to 
use `take()` by converting the boolean mask to indices with 
`pyarrow.compute.indices_nonzero`. 
   
   I am not fully sure how `Scanner::TakeRows` works, given that positional 
indices depend on the order in which data is scanned. I assume it follows the 
order of the actual vector of fragments.
   
   

