kevingurney opened a new pull request, #37773:
URL: https://github.com/apache/arrow/pull/37773

   ### Rationale for this change
   
   To enable initial CSV I/O support, this PR adds the 
`arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` classes to the 
MATLAB interface. Both classes work with `arrow.tabular.Table` objects.
   
   ### What changes are included in this PR?
   
   1. Added a new `arrow.io.csv.TableReader` class
   2. Added a new `arrow.io.csv.TableWriter` class
   
   **Example**
   ```matlab
   >> matlabTableWrite = array2table(rand(3))
   
   matlabTableWrite =
   
     3×3 table
   
        Var1        Var2       Var3  
       _______    ________    _______
   
       0.91131    0.091595    0.24594
       0.51315     0.27368    0.62119
       0.42942     0.88665    0.49501
   
   >> arrowTableWrite = arrow.table(matlabTableWrite)
   
   arrowTableWrite = 
   
   Var1: double
   Var2: double
   Var3: double
   ----
   Var1:
     [
       [
         0.9113083542736461,
         0.5131490075412158,
         0.42942202968065213
       ]
     ]
   Var2:
     [
       [
         0.09159480217154525,
         0.27367730380496647,
         0.8866478145458545
       ]
     ]
   Var3:
     [
       [
         0.2459443412735529,
         0.6211893868708748,
         0.49500739584280073
       ]
     ]
   
   >> writer = arrow.io.csv.TableWriter("example.csv")
   
   writer = 
   
     TableWriter with properties:
   
       Filename: "example.csv"
   
   >> writer.write(arrowTableWrite)
   
   >> reader = arrow.io.csv.TableReader("example.csv")
   
   reader = 
   
     TableReader with properties:
   
       Filename: "example.csv"
   
   >> arrowTableRead = reader.read()
   
   arrowTableRead = 
   
   Var1: double
   Var2: double
   Var3: double
   ----
   Var1:
     [
       [
         0.9113083542736461,
         0.5131490075412158,
         0.42942202968065213
       ]
     ]
   Var2:
     [
       [
         0.09159480217154525,
         0.27367730380496647,
         0.8866478145458545
       ]
     ]
   Var3:
     [
       [
         0.2459443412735529,
         0.6211893868708748,
         0.49500739584280073
       ]
     ]
   
   >> matlabTableRead = table(arrowTableRead)
   
   matlabTableRead =
   
     3×3 table
   
        Var1        Var2       Var3  
       _______    ________    _______
   
       0.91131    0.091595    0.24594
       0.51315     0.27368    0.62119
       0.42942     0.88665    0.49501
   
   >> isequal(arrowTableRead, arrowTableWrite)
   
   ans =
   
     logical
   
      1
   
   >> isequal(matlabTableRead, matlabTableWrite)
   
   ans =
   
     logical
   
      1
   ```
   
   ### Are these changes tested?
   
   Yes.
   
   1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and 
`test/arrow/io/csv/tError.m`.
   2. Both of these test classes inherit from a `CSVTest` superclass.
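   
   For illustration, a round-trip test along these lines could look like the 
sketch below. It is not the `tRoundTrip.m` added by this PR: the class name 
`tRoundTripSketch` and the test data are hypothetical, and it derives directly 
from `matlab.unittest.TestCase` rather than from the `CSVTest` superclass. It 
uses only the `TableWriter`/`TableReader` calls shown in the example above.
   
   ```matlab
   classdef tRoundTripSketch < matlab.unittest.TestCase
       % Hypothetical sketch of a CSV round-trip test (not part of this PR).
       methods (Test)
           function doubleColumnsRoundTrip(testCase)
               % Non-integer values are used on the assumption that the
               % reader then infers double columns rather than integers.
               matlabTableWrite = array2table([0.5 1.25; 2.75 3.5]);
               arrowTableWrite = arrow.table(matlabTableWrite);
   
               % Write to a unique temporary file and clean it up afterwards.
               filename = string(tempname) + ".csv";
               writer = arrow.io.csv.TableWriter(filename);
               writer.write(arrowTableWrite);
               testCase.addTeardown(@delete, filename);
   
               % Read the file back into an arrow.tabular.Table.
               reader = arrow.io.csv.TableReader(filename);
               arrowTableRead = reader.read();
   
               % Compare at both the Arrow level and the MATLAB level.
               testCase.verifyTrue(isequal(arrowTableRead, arrowTableWrite));
               testCase.verifyEqual(table(arrowTableRead), matlabTableWrite);
           end
       end
   end
   ```
   
   With the class saved as `tRoundTripSketch.m` on the MATLAB path, it can be 
run with `runtests("tRoundTripSketch")`.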
   
   ### Are there any user-facing changes?
   
   Yes.
   
   1. Users can now read and write CSV files using `arrow.io.csv.TableReader` 
and `arrow.io.csv.TableWriter`.
   
   ### Future Directions
   
   1. Expose 
[options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h)
 for controlling CSV reading and writing in MATLAB.
   2. Add more read/write tests for null value handling and other datatypes 
beyond numeric and string values.
   3. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV.
   4. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc.
   
   ### Notes
   
   1. Thank you @sgilmore10 for your help with this pull request!
   2. I chose to add both the `TableReader` and `TableWriter` in one pull 
request because it simplified testing. My apologies for the slightly lengthy 
pull request.

