xubo245 opened a new pull request #3617: [CARBONDATA-3695] Integrating deep learning framework PyTorch
URL: https://github.com/apache/carbondata/pull/3617

Apache CarbonData should provide a Python interface that lets the deep learning framework PyTorch read and write data from/to CarbonData.

### Why is this PR needed?
AI model training is becoming increasingly popular. Many AI frameworks currently train on raw data files or row-format data files, which cannot provide the projection, filtering, and fast scan capabilities of a columnar store. If CarbonData supports AI frameworks, it can speed up model training by increasing I/O throughput and give AI developers more flexible training-set selection.

### What changes were proposed in this PR?
https://github.com/apache/carbondata/pull/3479 already provides the basic framework and integrates TensorFlow. This PR integrates PyTorch and provides a new interface for it.

### Does this PR introduce any user interface change?
- Yes

```
def make_data_loader(reader, batch_size=1, collate_fn=decimal_friendly_collate):
    """
    Initializes a data loader object, with a default collate.

    Number of epochs is defined by the configuration of the reader argument.

    :param reader: PyCarbon Reader instance
    :param batch_size: the number of items to return per batch; factored into the len() of this reader
    :param collate_fn: an optional callable to merge a list of samples to form a mini-batch.
    """
```

### Is any new testcase added?
- Yes

pytorch_example_carbon_unified_api.py, and so on.
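To illustrate how the `batch_size` and `collate_fn` arguments of `make_data_loader` interact, here is a minimal standard-library sketch. It is not the PyCarbon implementation: `iter_batches`, `collate_columns`, and the list used in place of a Reader are hypothetical stand-ins, and the sketch assumes only that a reader can be iterated to yield one row at a time.

```python
from decimal import Decimal
from itertools import islice


def collate_columns(samples):
    """Merge a list of row dicts into one dict of column lists (stand-in collate)."""
    return {key: [row[key] for row in samples] for key in samples[0]}


def iter_batches(reader, batch_size=1, collate_fn=collate_columns):
    """Yield mini-batches from any iterable of row dicts.

    Hypothetical helper: it mirrors the argument names of make_data_loader
    but is not the PyCarbon implementation.
    """
    rows = iter(reader)
    while True:
        # Take up to batch_size rows; the last batch may be smaller.
        chunk = list(islice(rows, batch_size))
        if not chunk:
            return
        yield collate_fn(chunk)


# A plain list stands in for a PyCarbon Reader instance (assumption).
fake_reader = [
    {"id": 0, "price": Decimal("1.50")},
    {"id": 1, "price": Decimal("2.25")},
    {"id": 2, "price": Decimal("3.00")},
]

batches = list(iter_batches(fake_reader, batch_size=2))
for batch in batches:
    print(batch)
```

With three rows and `batch_size=2`, the sketch yields one full batch and one partial batch. Passing a different `collate_fn` changes only how the rows of each batch are merged, which is presumably why the PR exposes it as a parameter with a decimal-aware default.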
