kazantsev-maksim opened a new pull request, #3044:
URL: https://github.com/apache/datafusion-comet/pull/3044

   ## Which issue does this PR close?
   
   - N/A
   
   ## Rationale for this change
   
   Adds an experimental implementation of native CSV file reading (currently only for the DataSourceV2 path).
   
   Required improvements:
   
   1. Run more benchmark tests
   2. Try to implement the idea from https://github.com/apache/datafusion-comet/issues/882
   3. Test reading files from S3/HDFS (so far tested only on local files)
   
   Results of a simple benchmark test (1 iteration):
   
   Running benchmark: Native csv read: orders table
     Running case: Spark
     Stopped after 1 iterations, 238272 ms
     Running case: Comet
     Stopped after 1 iterations, 55326 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: orders table:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                            238272         238272           0          0,0      113617,0       1,0X
   Comet                                             55327          55327           0          0,0       26381,9       4,3X
   
   Running benchmark: Native csv read: region table
     Running case: Spark
     Stopped after 1 iterations, 36 ms
     Running case: Comet
     Stopped after 1 iterations, 38 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: region table:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                                36             36           0         57,7          17,3       1,0X
   Comet                                                39             39           0         54,0          18,5       0,9X
   
   Running benchmark: Native csv read: nation table
     Running case: Spark
     Stopped after 1 iterations, 35 ms
     Running case: Comet
     Stopped after 1 iterations, 39 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: nation table:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                                36             36           0         59,0          16,9       1,0X
   Comet                                                39             39           0         53,4          18,7       0,9X
   
   Running benchmark: Native csv read: part table
     Running case: Spark
     Stopped after 1 iterations, 28883 ms
     Running case: Comet
     Stopped after 1 iterations, 7870 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: part table:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                             28884          28884           0          0,1       13773,0       1,0X
   Comet                                              7871           7871           0          0,3        3753,2       3,7X
   
   Running benchmark: Native csv read: supplier table
     Running case: Spark
     Stopped after 1 iterations, 1380 ms
     Running case: Comet
     Stopped after 1 iterations, 413 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: supplier table:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                              1381           1381           0          1,5         658,3       1,0X
   Comet                                               414            414           0          5,1         197,2       3,3X
   
   Running benchmark: Native csv read: partsupp table
     Running case: Spark
     Stopped after 1 iterations, 107904 ms
     Running case: Comet
     Stopped after 1 iterations, 26308 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: partsupp table:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                            107905         107905           0          0,0       51452,9       1,0X
   Comet                                             26308          26308           0          0,1       12544,8       4,1X
   
   Running benchmark: Native csv read: customer table
     Running case: Spark
     Stopped after 1 iterations, 22288 ms
     Running case: Comet
     Stopped after 1 iterations, 6089 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: customer table:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                             22289          22289           0          0,1       10628,0       1,0X
   Comet                                              6090           6090           0          0,3        2903,8       3,7X
   
   Running benchmark: Native csv read: lineitem table
     Running case: Spark
     Stopped after 1 iterations, 1331928 ms
     Running case: Comet
     Stopped after 1 iterations, 342292 ms
   
   OpenJDK 64-Bit Server VM 11.0.26+4-LTS on Mac OS X 26.2
   Apple M1 Pro
   Native csv read: lineitem table:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark                                           1331928        1331928           0          0,0      635112,8       1,0X
   Comet                                            342293         342293           0          0,0      163218,0       3,9X
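
The Relative column can be cross-checked from the Best Time values. Below is a minimal sketch, assuming Relative is simply the Spark best time divided by the Comet best time. That formula reproduces the reported figures, but it is an assumption about how the benchmark harness computes the column. The names `BEST_TIME_MS`, `relative_speedup`, and `speedups` are illustrative and not part of this PR.

```python
# Best Time(ms) values copied from the benchmark output above: (Spark, Comet).
BEST_TIME_MS = {
    "orders": (238272, 55327),
    "region": (36, 39),
    "nation": (36, 39),
    "part": (28884, 7871),
    "supplier": (1381, 414),
    "partsupp": (107905, 26308),
    "customer": (22289, 6090),
    "lineitem": (1331928, 342293),
}


def relative_speedup(baseline_ms, candidate_ms):
    """Assumed definition of Relative: baseline time divided by candidate time."""
    return baseline_ms / candidate_ms


def speedups(times=BEST_TIME_MS):
    """Return the per-table speedup of Comet over Spark, rounded to one decimal."""
    return {
        table: round(relative_speedup(spark_ms, comet_ms), 1)
        for table, (spark_ms, comet_ms) in times.items()
    }


if __name__ == "__main__":
    for table, ratio in speedups().items():
        print(f"{table:10s} {ratio:.1f}X")
```

Each table was run for a single iteration, so these ratios are one sample per table. The two smallest tables (region, nation) finish in under 40 ms and come out at about 0.9X, while the larger tables fall between 3.3X and 4.3X.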
   
   ## How are these changes tested?
   
   1. Added a new unit test
   2. Added a simple benchmark test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

