sonomatechDS commented on issue #15056:
URL: https://github.com/apache/arrow/issues/15056#issuecomment-1380796248

   @tnederlof that's a great thought! I had previously considered this option, 
but our full dataset takes a while (~8-10 s) to load via arrow::open_dataset() 
(possibly because of its size/number of partitions?). It would decrease 
usability if a user had to wait each time an input was changed. Am I 
understanding correctly that this would be the case?
   
   However, this idea could be viable if we abandon loading the full dataset 
each time and instead use the inputs to build a more specific URI, i.e. one 
pointing at a subset of the dataset (e.g. calling open_dataset() on 
's3://bucket/sitecode=xxxx/parameter=xxxx/...', or even just read_csv_arrow() 
on a URI pointing to the desired csv).
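   
   A rough sketch of the narrower-URI idea, untested against our bucket. The 
helper names build_subset_uri() and open_subset() are hypothetical, the 
bucket name is a placeholder, and I'm assuming the files are csv and that 
the partition keys are laid out as in the example path above:

```r
# Hypothetical helper: build a URI for a single partition of the dataset.
# The sitecode/parameter layout is an assumption taken from the example path.
build_subset_uri <- function(bucket, sitecode, parameter) {
  sprintf("s3://%s/sitecode=%s/parameter=%s", bucket, sitecode, parameter)
}

# Open only the selected partition instead of the whole dataset.
# Not called here: it needs the arrow package and access to the bucket.
# format = "csv" is an assumption about how the files are stored.
open_subset <- function(bucket, sitecode, parameter, format = "csv") {
  uri <- build_subset_uri(bucket, sitecode, parameter)
  arrow::open_dataset(uri, format = format)
}
```

   One thing to check: when a dataset is opened below the partition 
directories, I'd expect the sitecode and parameter values not to come back 
as columns, so they may need to be added from the inputs afterwards.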
   
   I'll do some testing on this today, and please let me know if you foresee 
any issues in the meantime.
   
   I appreciate your time and input!

