sonomatechDS commented on issue #15056: URL: https://github.com/apache/arrow/issues/15056#issuecomment-1380796248
@tnederlof that's a great thought! I had previously considered this option, but our full dataset takes a while (~8-10 s) to load via `arrow::open_dataset()`, possibly because of its size or number of partitions. It would hurt usability if a user had to wait that long each time an input changed. Am I understanding correctly that this would be the case?

However, this idea could be viable if we stop loading the full dataset each time and instead use the inputs to build a more specific URI that points at a subset of the dataset, e.g. calling `open_dataset()` on `'s3://bucket/sitecode=xxxx/parameter=xxxx/...'`, or even just `read_csv_arrow()` on a URI pointing to the desired CSV.

I'll do some testing on this today; please let me know if you foresee any issues in the meantime. I appreciate your time and input!
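A minimal sketch of the subset-URI idea in R, assuming the `s3://bucket/sitecode=.../parameter=.../` layout described in this comment and that the partitions hold CSV files. `partition_uri()` and `open_subset()` are hypothetical helpers (not part of arrow), and the bucket name and `format = "csv"` are assumptions for illustration.

```r
# Hypothetical helper: build a URI for a single sitecode/parameter
# partition instead of the dataset root. Bucket name and partition
# keys are assumed for illustration.
partition_uri <- function(bucket, sitecode, parameter) {
  sprintf("s3://%s/sitecode=%s/parameter=%s/", bucket, sitecode, parameter)
}

# Hypothetical helper: open only the requested partition, so arrow
# lists and scans just the files under that prefix rather than the
# whole dataset. format = "csv" is an assumption about the file type.
# Because the keys are encoded in the prefix, sitecode and parameter
# may not show up as columns in the returned dataset.
open_subset <- function(bucket, sitecode, parameter) {
  arrow::open_dataset(
    partition_uri(bucket, sitecode, parameter),
    format = "csv"
  )
}
```

In the app, `open_subset()` could be called with the current input values each time they change, so only the matching prefix is opened instead of the full dataset.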