Hello Gopi,

Thank you for starting the benchmark. Would it be possible to plot the log
and add the results to the open pull request, so that we get a better
comparison?

The code seems to be fine. It could be optimized, but I would wait to see
the plots first.

Thanks,
Omar

On 04/01, Gopi Manohar Tatiraju wrote:
> Hey Omar,
>
> Sorry it took longer. I have been running the benchmark code since this
> morning, and it took a lot of time because my system is a bit slow.
> I compared the default Armadillo parser, mlpack's custom parser, and
> rapidcsv.
>
> Can you verify the code I used? I might have done something wrong. It
> took a long time to run, but that may be because my system is not very
> powerful.
> *Link to the repo and log file:*
> https://github.com/heisenbuug/Benchmark-CSV-Parsers
>
> In the meantime, I will also start working on my draft proposal, and
> once we finish this testing we can use the results to decide our plan
> of action. Let me know if you have any suggestions or points for the
> draft proposal.
>
> Thank you,
> Gopi M. Tatiraju
>
> On Thu, Apr 1, 2021 at 5:59 PM Omar Shrit <o...@shrit.me> wrote:
> > Hello Gopi,
> >
> > Would it be possible to benchmark these two and compare them with the
> > existing Boost Spirit implementation? If there is a considerable
> > difference in performance between the two parsers, the obvious choice
> > is the faster one. I know both of them are described as fast ("fast",
> > "rapid"), but I have not seen any benchmark yet showing which one is
> > faster.
> >
> > Let me know what you think. The benchmark will help us make a better
> > choice, since this is an internal (private) API that will not be used
> > by the user directly.
> >
> > These are my thoughts; let me know what you think.
> >
> > Omar.
> >
> > On 04/01, Gopi Manohar Tatiraju wrote:
> > > Hey,
> > >
> > > So, I went through both of the libraries we considered for CSV
> > > parsing and implemented code to load the data from a small example
> > > CSV file into arma::mat. Here is the sample code; let me know what
> > > you think. Am I loading it into arma::mat wrongly?
> > > Is there a more efficient way?
> > >
> > > Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>
> > > // Reads a csv file with 4 numeric columns; the number of rows does
> > > // not need to be known in advance.
> > > io::CSVReader<4> in("llog.csv");
> > > float a, b, c, d;
> > > std::vector<double> values;
> > > while (in.read_row(a, b, c, d))
> > > {
> > >   values.insert(values.end(), { a, b, c, d });
> > > }
> > > // Each csv row becomes one column of a 4 x N matrix; transpose it to
> > > // get one csv row per matrix row, as in the original loop.
> > > arma::mat data = arma::mat(values.data(), 4, values.size() / 4).t();
> > >
> > > rapidcsv <https://github.com/d99kris/rapidcsv>
> > > // For headerless csv files
> > > rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
> > > arma::mat data(doc.GetRowCount(), doc.GetColumnCount());
> > >
> > > for (size_t i = 0; i < doc.GetRowCount(); i++)
> > > {
> > >   const std::vector<double> row = doc.GetRow<double>(i);
> > >   for (size_t j = 0; j < doc.GetColumnCount(); j++)
> > >   {
> > >     data(i, j) = row[j];
> > >   }
> > > }
> > >
> > > After using both, I feel that rapidcsv is easier to grasp and work
> > > with, and it seems more structured.
> > > Let me know your thoughts. Also, if loading as in the example above
> > > is fine, this can be converted into a function that acts as basic
> > > CSV loading into arma::mat, right?
> > >
> > > Thank you,
> > > Gopi
> > >
> > > On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <o...@shrit.me> wrote:
> > > > Hey Gopi,
> > > >
> > > > On 03/29, Gopi Manohar Tatiraju wrote:
> > > > > Hey,
> > > > >
> > > > > I agree. After going through both candidates a bit, I can see we
> > > > > can offload a lot of work by using a well-implemented existing
> > > > > parser. I think I should start by comparing the two libraries to
> > > > > decide which one to use. I will use the same benchmark strategy
> > > > > that was discussed in the issue. Does that sound good?
> > > >
> > > > Sounds good to me.
> > > >
> > > > > I also think I can work on replacing Boost Spirit in GSoC, then.
> > > > > This will be a start to the data frame idea.
> > > > > Even if we have time left after this, I can start work on the
> > > > > data frame as well. Would that be worth considering?
> > > >
> > > > Yes, of course.
> > > >
> > > > > Thanks,
> > > > > Gopi
> > > > >
> > > > > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <o...@shrit.me> wrote:
> > > > > > Hey Gopi,
> > > > > >
> > > > > > I totally agree with Ryan: using an existing parser will
> > > > > > accelerate the project and allow us to move forward with the
> > > > > > dataframe class. I also believe that replacing Boost Spirit with
> > > > > > an existing parser will take a considerable part of the summer.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Omar
> > > > > >
> > > > > > On 03/29, Ryan Curtin wrote:
> > > > > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju wrote:
> > > > > > > > I would love to hear your thoughts on whether to go with an
> > > > > > > > already implemented parser or build a new one. Also, if we
> > > > > > > > are planning to build a data frame here, then an in-house
> > > > > > > > parser might be better, since we would be able to design it
> > > > > > > > to give maximum support to the new data frame we are
> > > > > > > > planning to build.
> > > > > > >
> > > > > > > Hey Gopi,
> > > > > > >
> > > > > > > Honestly I think it's best to use another package. Not only
> > > > > > > will this free up time to actually work on the dataframe
> > > > > > > class, but also it means we are not responsible for
> > > > > > > maintenance of the CSV parser. There are lots of little
> > > > > > > complexities and edge cases in parsing (not to mention
> > > > > > > efficiency!)
> > > > > > > and so we can probably get a lot more bang for our buck here
> > > > > > > by using an implementation from someone who has already put
> > > > > > > down the time to consider all those details.
> > > > > > >
> > > > > > > Hope this is helpful. :)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Ryan
> > > > > > >
> > > > > > > --
> > > > > > > Ryan Curtin    | "Kill them, Machine... kill them all."
> > > > > > > r...@ratml.org |   - Dino Velvet
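On Gopi's question about a more efficient way to fill `arma::mat`: Armadillo stores matrices in column-major order, so the `GetRow()` loop in the rapidcsv snippet writes each row's values `n_rows` elements apart in memory, whereas copying whole columns (rapidcsv also provides `GetColumn<T>()`) writes contiguously. The sketch below illustrates that layout using only the standard library. `ColumnMajorMatrix` is a hypothetical stand-in for `arma::mat`, and `FillFromRows()`, `FillFromColumns()` and `MedianMillis()` are hypothetical helpers, not mlpack, Armadillo, or rapidcsv API. Whether the fill order matters next to the parsing cost is an assumption that only the benchmark can confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arma::mat: like Armadillo, it stores elements
// column by column, so element (i, j) lives at mem[i + j * nRows].
struct ColumnMajorMatrix
{
  std::size_t nRows = 0;
  std::size_t nCols = 0;
  std::vector<double> mem;

  double& operator()(std::size_t i, std::size_t j)
  {
    return mem[i + j * nRows];
  }
};

// Hypothetical helper mirroring the GetRow() loop: consecutive writes for
// one row land nRows elements apart. Assumes every row has nCols values.
ColumnMajorMatrix FillFromRows(const std::vector<std::vector<double>>& rows)
{
  ColumnMajorMatrix m;
  m.nRows = rows.size();
  m.nCols = rows.empty() ? 0 : rows[0].size();
  m.mem.assign(m.nRows * m.nCols, 0.0);
  for (std::size_t i = 0; i < m.nRows; ++i)
  {
    assert(rows[i].size() == m.nCols);
    for (std::size_t j = 0; j < m.nCols; ++j)
      m(i, j) = rows[i][j];
  }
  return m;
}

// Hypothetical helper mirroring a GetColumn()-based loop: each column is
// appended as one contiguous block. Assumes all columns have equal length.
ColumnMajorMatrix FillFromColumns(
    const std::vector<std::vector<double>>& cols)
{
  ColumnMajorMatrix m;
  m.nCols = cols.size();
  m.nRows = cols.empty() ? 0 : cols[0].size();
  m.mem.reserve(m.nRows * m.nCols);
  for (const std::vector<double>& col : cols)
  {
    assert(col.size() == m.nRows);
    m.mem.insert(m.mem.end(), col.begin(), col.end());
  }
  return m;
}

// Hypothetical timing helper: runs f() `repetitions` times and returns the
// median wall-clock time in milliseconds (the upper median for even counts).
template<typename F>
double MedianMillis(F&& f, std::size_t repetitions)
{
  if (repetitions == 0)
    return 0.0;

  std::vector<double> times;
  times.reserve(repetitions);
  for (std::size_t r = 0; r < repetitions; ++r)
  {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    times.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  std::sort(times.begin(), times.end());
  return times[times.size() / 2];
}
```

With the real libraries, the column-wise variant would presumably be a loop over `j` containing `data.col(j) = arma::conv_to<arma::vec>::from(doc.GetColumn<double>(j));`, which has not been benchmarked here. `MedianMillis()` reports the median rather than the mean so that a single slow run (for example, the first read on a cold file cache) does not dominate the comparison.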
_______________________________________________
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack