SChakravorti21 commented on issue #34535:
URL: https://github.com/apache/arrow/issues/34535#issuecomment-1974945525

   > If it were to go public, that would be done in another issue/PR pair, so 
for now it's better to focus just on making `ChunkResolver` public.
   
   Gotcha. Yes, I agree, we can focus on just `ChunkResolver` for now!
   
   > [1] Apache Arrow, as a columnar data library, is built to keep most 
computation on top of columnar representation, you're of course allowed to do 
whatever you need to solve your application problem, but Arrow itself exposing 
row-by-row APIs would pass the wrong message.
   
   I agree 100% that we don't want to send the wrong message, and that people 
should always try to frame their logic in terms of vectorized operations. That 
said, there are practical use-cases where there is no way of getting around 
row-major processing of the data.
   
   I've been thinking about this and came up with an alternative approach that 
may be better; I'd be interested to hear your thoughts on it (rough sketch):
   
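   ```cpp
   #include <arrow/api.h>
   
   #include <cstdint>
   #include <memory>
   #include <string_view>
   
   // Placeholder for the application's per-row logic (hypothetical).
   void do_business_logic(int64_t a, std::string_view b);
   
   arrow::Status ProcessRows(const arrow::Table& table) {
     for (auto maybe_batch : arrow::TableBatchReader(table)) {
       ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::RecordBatch> batch, maybe_batch);
   
       // User decides whether they want a checked (dynamic_pointer_cast) or
       // unchecked (static_pointer_cast) cast; a missing column also yields nullptr.
       auto a = std::dynamic_pointer_cast<arrow::Int64Array>(batch->GetColumnByName("a"));
       auto b = std::dynamic_pointer_cast<arrow::StringArray>(batch->GetColumnByName("b"));
       if (a == nullptr || b == nullptr) {
         return arrow::Status::TypeError("expected columns a: int64 and b: utf8");
       }
   
       // Per-row access into contiguous arrays; null slots are not checked here.
       for (int64_t i = 0; i < batch->num_rows(); ++i) {
         do_business_logic(a->Value(i), b->GetView(i));
       }
     }
     return arrow::Status::OK();
   }
   ```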
   
   This is still sequential but (I think) avoids a lot of unnecessary overhead: 
the column lookup and cast happen once per batch, and the inner loop then 
indexes directly into contiguous arrays instead of resolving a chunk for every 
row. In that case, it might be good enough to make 
`ChunkResolver`/`ChunkedArrayResolver` public only for the sake of use-cases 
that require random access into `ChunkedArray`s. We can also document that 
these classes should never be used for pure sequential access, and that 
something like the loop above should be preferred instead. How does that sound?
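   
   For reference, here is a minimal sketch of the kind of lookup a chunk resolver performs, assuming the chunk lengths are known up front. `SimpleChunkResolver` and `Location` are hypothetical names for illustration only, not Arrow's API, and Arrow's own `ChunkResolver` may differ in details. The idea is to store each chunk's starting offset once, then map a logical row index to a (chunk, index-in-chunk) pair with a binary search.
   
   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstdint>
   #include <vector>
   
   // Hypothetical result type: where a logical row lives in a chunked sequence.
   struct Location {
     int64_t chunk_index;
     int64_t index_in_chunk;
   };
   
   // Hypothetical resolver (not Arrow's API): offsets_[k] is the first logical
   // index of chunk k, and offsets_.back() is the total length.
   class SimpleChunkResolver {
    public:
     explicit SimpleChunkResolver(const std::vector<int64_t>& chunk_lengths) {
       offsets_.reserve(chunk_lengths.size() + 1);
       int64_t total = 0;
       offsets_.push_back(total);
       for (int64_t len : chunk_lengths) {
         total += len;
         offsets_.push_back(total);
       }
     }
   
     // O(log num_chunks) per lookup. Empty chunks are skipped naturally because
     // upper_bound finds the last chunk whose start offset is <= index.
     Location Resolve(int64_t index) const {
       assert(index >= 0 && index < offsets_.back());
       auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
       int64_t chunk = static_cast<int64_t>(it - offsets_.begin()) - 1;
       return {chunk, index - offsets_[chunk]};
     }
   
    private:
     std::vector<int64_t> offsets_;
   };
   ```
   
   With chunk lengths `{3, 0, 2}`, `Resolve(4)` lands in chunk 2 at offset 1. Each call pays a binary search, which is cheap for occasional random lookups but is pure overhead in a sequential scan, where the batch-by-batch loop already knows which chunk it is in.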

