[GitHub] [carbondata] xubo245 opened a new pull request #3617: [CARBONDATA-3695] Integrating deep learning framework PyTorch

GitBox Wed, 12 Feb 2020 07:46:20 -0800

xubo245 opened a new pull request #3617: [CARBONDATA-3695] Integrating deep learning framework PyTorch
URL: https://github.com/apache/carbondata/pull/3617
 
 
   Apache CarbonData should provide a Python interface so that the deep learning 
framework PyTorch can read and write data from/to CarbonData.
   
    ### Why is this PR needed?
    
   AI model training is becoming increasingly popular. Many AI frameworks 
currently train on raw data files or row-format data files, which cannot provide 
the projection, filtering, and fast-scan capabilities of a columnar store. If 
CarbonData supports AI frameworks, it can speed up model training by increasing 
I/O throughput and give AI developers more flexible training-set selection.
    
    ### What changes were proposed in this PR?
   
   https://github.com/apache/carbondata/pull/3479 already provides the basic 
framework and integrates TensorFlow. This PR integrates PyTorch and provides a 
new interface for it.
   
    ### Does this PR introduce any user interface change?
    - Yes
   
   ```
   def make_data_loader(reader, batch_size=1, collate_fn=decimal_friendly_collate):
     """
     Initializes a data loader object, with a default collate.
   
     Number of epochs is defined by the configuration of the reader argument.
   
     :param reader: PyCarbon Reader instance
     :param batch_size: the number of items to return per batch; factored into the len() of this reader
     :param collate_fn: an optional callable to merge a list of samples to form a mini-batch.
     """
   ```
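
To illustrate what the `batch_size` and `collate_fn` parameters described above do, here is a minimal, self-contained sketch in plain Python. It is not the PR's implementation: `iter_batches` and `decimal_to_float_collate` are hypothetical names, the reader is replaced by a plain list of row dicts, and converting `Decimal` values to `float` is only an assumption about what a "decimal friendly" collate might do.

```python
from decimal import Decimal
from itertools import islice


def decimal_to_float_collate(samples):
    """Hypothetical collate: merge a list of row dicts into one dict of column lists.

    Decimal values are converted to float (an assumption for this sketch),
    since tensor libraries generally do not accept decimal.Decimal directly.
    """
    batch = {}
    for row in samples:
        for key, value in row.items():
            if isinstance(value, Decimal):
                value = float(value)
            batch.setdefault(key, []).append(value)
    return batch


def iter_batches(reader, batch_size=1, collate_fn=decimal_to_float_collate):
    """Yield collated mini-batches of up to batch_size rows from an iterable reader."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(reader)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield collate_fn(chunk)


# Stand-in for a reader: any iterable of row dicts works in this sketch.
rows = [{"id": i, "price": Decimal("1.5") * i} for i in range(5)]
batches = list(iter_batches(rows, batch_size=2))
```

With five rows and `batch_size=2`, the sketch yields three batches, the last one holding the single leftover row.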
   
    ### Is any new testcase added?
    - Yes
   
   pytorch_example_carbon_unified_api.py, among others.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
