Hey,

I went through both of the libraries we considered as CSV parsers. I implemented code to load the data from a small example CSV file into arma::mat. Here is the sample code; let me know what you think. Am I loading it into arma::mat the wrong way? Is there a more efficient way?

Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>

    io::CSVReader<4> in("llog.csv");
    float a, b, c, d;
    int row = 0;
    arma::mat data(20, 4);
    while(in.read_row(a, b, c, d)){
        data(row, 0) = a;
        data(row, 1) = b;
        data(row, 2) = c;
        data(row, 3) = d;
        row++;
    }

rapidcsv <https://github.com/d99kris/rapidcsv>

    // For headerless csv files
    rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
    arma::mat data(doc.GetRowCount(), doc.GetColumnCount(), arma::fill::ones);
    std::vector<float> col;
    for(int i = 0; i < doc.GetRowCount(); i++) {
        col = doc.GetRow<float>(i);
        for(int j = 0; j < doc.GetColumnCount(); j++) {
            data(i, j) = col[j];
        }
    }

After using both, I feel like `rapidcsv` is easier to grasp and work with, and it seems more structured. Let me know your thoughts.

Also, if loading as in the examples above is fine, this can be converted into a function that acts as a basic CSV file loader into arma::mat, right?

Thank You,
Gopi

On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <[email protected]> wrote:

> Hey Gopi
>
> On 03/29, Gopi Manohar Tatiraju wrote:
> > Hey,
> >
> > I agree, after going a bit through both the candidates I can see we can
> > unload a lot of work by using a well-implemented existing parser.
> > I think I should start by comparing both the mentioned libraries to decide
> > which one to use. I will use the same benchmark strategy that
> > was discussed in the issue. Does that sound good?
>
> Sounds good to me.
>
> > And also I think I can work on replacing boost spirits in GSoC then. This
> > will be a start to the data frame idea. Even if we are left with time
> > after this, I can start the work on the data frame as well. Is it
> > considerable?
>
> Yes of course.
>
> > Thanks,
> > Gopi
> >
> >
> > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <[email protected]> wrote:
> >
> > > Hey Gopi,
> > >
> > > I totally agree with Ryan, using existing parser will accelerate the
> > > project and allow to move forward with the dataframe class. Also, I
> > > do believe that replacing boost Spirit with an existing parser will take
> > > a considerable amount of the summer.
> > >
> > > Thanks,
> > >
> > > Omar
> > >
> > > On 03/29, Ryan Curtin wrote:
> > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju wrote:
> > > > > Would love to hear your thoughts on whether to go with an already
> > > > > implemented parser or build a new one. Also if we are planning to
> > > > > build a data frame here then maybe going with an in-house parser
> > > > > would be better as we will have the ability to design it in such a
> > > > > way that it can extend maximum support to the new data frame
> > > > > which we are planning to build ahead.
> > > >
> > > > Hey Gopi,
> > > >
> > > > Honestly I think it's best to use another package. Not only will this
> > > > free up time to actually work on the dataframe class, but also it means
> > > > we are not responsible for maintenance of the CSV parser. There are
> > > > lots of little complexities and edge cases in parsing (not to mention
> > > > efficiency!) and so we can probably get a lot more bang for our buck
> > > > here by using an implementation from someone who has already put down
> > > > the time to consider all those details.
> > > >
> > > > Hope this is helpful. :)
> > > >
> > > > Thanks,
> > > >
> > > > Ryan
> > > >
> > > > --
> > > > Ryan Curtin | "Kill them, Machine... kill them all."
> > > > [email protected] | - Dino Velvet
> > > >
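
P.S. On the question of turning the loading loop into a function: below is a minimal sketch of what such a helper could look like. It is not mlpack, rapidcsv, or fast-cpp-csv-parser code. `ColMajorMatrix` and `LoadNumericCsv` are hypothetical names, the sketch uses only the standard library so it compiles without either parser, and it assumes a headerless, comma-separated, all-numeric file with no quoted fields. It sizes the result from the data instead of hard-coding 20 x 4, parses into double (the element type of arma::mat) rather than float, and stores values column-major, which is the layout Armadillo uses.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for illustration: a dense matrix stored column-major,
// the same memory layout Armadillo uses for arma::mat.
struct ColMajorMatrix {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<double> values;  // element (r, c) is values[c * n_rows + r]

    double at(std::size_t r, std::size_t c) const {
        return values[c * n_rows + r];
    }
};

// Hypothetical helper: read a headerless, comma-separated, all-numeric stream.
// Assumption: no quoted or escaped fields; a parser library handles those.
ColMajorMatrix LoadNumericCsv(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF input
        if (line.empty()) continue;                                 // blank line

        std::vector<double> row;
        std::stringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ',')) {
            row.push_back(std::stod(field));  // throws if the field is not numeric
        }
        if (!rows.empty() && row.size() != rows.front().size()) {
            throw std::runtime_error("CSV rows have differing column counts");
        }
        rows.push_back(std::move(row));
    }

    // Dimensions come from the data, so nothing is hard-coded.
    ColMajorMatrix m;
    m.n_rows = rows.size();
    m.n_cols = rows.empty() ? 0 : rows.front().size();
    m.values.resize(m.n_rows * m.n_cols);
    for (std::size_t r = 0; r < m.n_rows; ++r) {
        for (std::size_t c = 0; c < m.n_cols; ++c) {
            m.values[c * m.n_rows + r] = rows[r][c];
        }
    }
    return m;
}
```

If Armadillo is available, the buffer can then be copied into a matrix with the auxiliary-memory constructor, e.g. `arma::mat data(m.values.data(), m.n_rows, m.n_cols);`. In a real loader the line splitting would be delegated to whichever parser library we pick; only the sizing and the copy into the matrix would stay in our code.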
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
