Great, I cannot wait to see the results. I have not had a chance to look at the API of fast-cpp-csv-parser yet; there are no examples or clear documentation, so it requires reading the code directly. I will let you know if I find anything.
Best,

Omar

On 04/02, Gopi Manohar Tatiraju wrote:
> Hey,
>
> So I was not able to work out fast csv, but I edited the existing code to
> read the whole data column-wise. Each column is returned to us as a
> std::vector, which I then convert to an arma::vec and, at the end,
> insert as a column into an arma::mat.
> Suggested code changes:
>
> arma::fmat mat(doc.GetRowCount(), doc.GetColumnCount());
> std::vector<float> column;
> for(int i = 0; i < doc.GetColumnCount(); i++)
> {
>   column = doc.GetColumn<float>(i);
>   arma::fvec column_vector(column);
>   mat.col(i) = column_vector;
> }
>
> I am running the benchmark code, it's going to take some time, so I will
> upload once the code finishes compiling.
> Also, any idea regarding the other parser would help.
>
> Thanks,
> Gopi
>
> On Fri, Apr 2, 2021 at 12:47 AM Gopi Manohar Tatiraju <deathcod...@gmail.com>
> wrote:
>
> > Hey,
> >
> > Was working on it.
> > Here's the link:
> > https://github.com/heisenbuug/Benchmark-CSV-Parsers/blob/main/csvparser_log_check.ipynb
> >
> > Thanks,
> > Gopi
> >
> > On Fri, Apr 2, 2021 at 12:28 AM Omar Shrit <o...@shrit.me> wrote:
> >
> >> Hello Gopi,
> >>
> >> Thank you for starting the benchmark. Would it be possible to plot the
> >> log and add the results to the open pull request to get a better
> >> comparison?
> >>
> >> The code seems to be fine. It can be optimized, but I would wait to see
> >> the plots.
> >>
> >> Thanks,
> >>
> >> Omar
> >>
> >> On 04/01, Gopi Manohar Tatiraju wrote:
> >> > Hey Omar,
> >> >
> >> > Sorry, it took longer. I have been running the benchmark code since this
> >> > morning and it took a lot of time as my system is a bit slow.
> >> > I compared the default armadillo parser, mlpack's custom parser, and
> >> > rapidcsv.
> >> >
> >> > Can you verify the code I used? I might have done something wrong and it
> >> > took a lot of time to run this code, but that is maybe due to the fact
> >> > that my system is not that powerful.
> >> >
> >> > *Link to the repo and log file:*
> >> > https://github.com/heisenbuug/Benchmark-CSV-Parsers
> >> >
> >> > In the meantime, I will also start working on my draft proposal a bit,
> >> > and once we do this testing we can use those results to decide our
> >> > plan of action. Let me know if you have any suggestions or points for
> >> > the draft proposal.
> >> >
> >> > Thank you,
> >> > Gopi M. Tatiraju
> >> >
> >> >
> >> > On Thu, Apr 1, 2021 at 5:59 PM Omar Shrit <o...@shrit.me> wrote:
> >> >
> >> > > Hello Gopi,
> >> > >
> >> > > Would it be possible to benchmark these two and compare them with
> >> > > the already existing Boost Spirit parser? If there is a considerable
> >> > > difference in performance between these two parsers, then the obvious
> >> > > choice will be the faster one. I know that both of them are called
> >> > > (fast, rapid), but I have not seen any benchmark yet to know which one
> >> > > is faster.
> >> > >
> >> > > Let me know what you think. The benchmark will help us make a better
> >> > > choice, since this is the internal (private) API and will not be used
> >> > > by the user directly.
> >> > >
> >> > > These are my thoughts, let me know what you think.
> >> > >
> >> > > Omar.
> >> > >
> >> > > On 04/01, Gopi Manohar Tatiraju wrote:
> >> > > > Hey,
> >> > > >
> >> > > > So, I went through both the libraries we considered for `csv parsers`.
> >> > > > I implemented code to load the data from a small example `csv` file
> >> > > > into arma::mat. Here is the sample code, let me know what you think.
> >> > > > Am I loading into arma::mat the wrong way? Is there a more efficient
> >> > > > way?
> >> > > >
> >> > > > Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>
> >> > > > io::CSVReader<4> in("llog.csv");
> >> > > > float a, b, c, d;
> >> > > > int row = 0;
> >> > > > arma::mat data(20, 4);
> >> > > >
> >> > > > while(in.read_row(a, b, c, d)){
> >> > > >   data(row, 0) = a;
> >> > > >   data(row, 1) = b;
> >> > > >   data(row, 2) = c;
> >> > > >   data(row, 3) = d;
> >> > > >   row++;
> >> > > > }
> >> > > >
> >> > > > Rapid.csv <https://github.com/d99kris/rapidcsv>
> >> > > > // For headerless csv files
> >> > > > rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
> >> > > > arma::mat data(doc.GetRowCount(), doc.GetColumnCount(),
> >> > > >     arma::fill::ones);
> >> > > >
> >> > > > std::vector<float> col;
> >> > > > for(int i = 0; i < doc.GetRowCount(); i++)
> >> > > > {
> >> > > >   col = doc.GetRow<float>(i);
> >> > > >   for(int j = 0; j < doc.GetColumnCount(); j++)
> >> > > >   {
> >> > > >     data(i, j) = col[j];
> >> > > >   }
> >> > > > }
> >> > > >
> >> > > > After using both, I feel like `rapid.csv` is easier to grasp and work
> >> > > > on, and it seems more structured.
> >> > > > Let me know your thoughts. Also, if loading like the above example is
> >> > > > fine, this can be converted into a function that can act as basic csv
> >> > > > file loading into arma::mat, right?
> >> > > >
> >> > > > Thank You,
> >> > > > Gopi
> >> > > >
> >> > > > On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <o...@shrit.me> wrote:
> >> > > >
> >> > > > > Hey Gopi
> >> > > > >
> >> > > > > On 03/29, Gopi Manohar Tatiraju wrote:
> >> > > > > > Hey,
> >> > > > > >
> >> > > > > > I agree. After going a bit through both the candidates, I can see
> >> > > > > > we can offload a lot of work by using a well-implemented existing
> >> > > > > > parser.
> >> > > > > > I think I should start by comparing both the mentioned libraries to
> >> > > > > > decide which one to use.
> >> > > > > > I will use the same benchmark strategy that was discussed in the
> >> > > > > > issue. Does that sound good?
> >> > > > >
> >> > > > > Sounds good to me.
> >> > > > >
> >> > > > > > And also I think I can work on replacing Boost Spirit in GSoC then.
> >> > > > > > This will be a start to the data frame idea. If we are left with
> >> > > > > > time after this, I can start the work on the data frame as well.
> >> > > > > > Can that be considered?
> >> > > > >
> >> > > > > Yes of course.
> >> > > > >
> >> > > > > > Thanks,
> >> > > > > > Gopi
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <o...@shrit.me> wrote:
> >> > > > > >
> >> > > > > > > Hey Gopi,
> >> > > > > > >
> >> > > > > > > I totally agree with Ryan. Using an existing parser will accelerate
> >> > > > > > > the project and allow us to move forward with the dataframe class.
> >> > > > > > > Also, I do believe that replacing Boost Spirit with an existing
> >> > > > > > > parser will take a considerable amount of the summer.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Omar
> >> > > > > > >
> >> > > > > > > On 03/29, Ryan Curtin wrote:
> >> > > > > > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju
> >> > > > > > > > wrote:
> >> > > > > > > > > Would love to hear your thoughts on whether to go with an already
> >> > > > > > > > > implemented parser or build a new one. Also, if we are planning to
> >> > > > > > > > > build a data frame here, then maybe going with an in-house parser
> >> > > > > > > > > would be better, as we will have the ability to design it in such a
> >> > > > > > > > > way that it can extend maximum support to the new data frame
> >> > > > > > > > > which we are planning to build ahead.
> >> > > > > > > >
> >> > > > > > > > Hey Gopi,
> >> > > > > > > >
> >> > > > > > > > Honestly I think it's best to use another package. Not only will
> >> > > > > > > > this free up time to actually work on the dataframe class, but also
> >> > > > > > > > it means we are not responsible for maintenance of the CSV parser.
> >> > > > > > > > There are lots of little complexities and edge cases in parsing (not
> >> > > > > > > > to mention efficiency!) and so we can probably get a lot more bang
> >> > > > > > > > for our buck here by using an implementation from someone who has
> >> > > > > > > > already put down the time to consider all those details.
> >> > > > > > > >
> >> > > > > > > > Hope this is helpful. :)
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > >
> >> > > > > > > > Ryan
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Ryan Curtin | "Kill them, Machine... kill them all."
> >> > > > > > > > r...@ratml.org | - Dino Velvet
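
A note on the column-wise copy discussed in the thread: Armadillo stores arma::mat elements in column-major order, so filling a matrix one column at a time writes to contiguous memory. Below is a minimal sketch of that layout using only the standard library. The names ColumnMajor and LoadCsvText are hypothetical and are not part of rapidcsv, fast-cpp-csv-parser, Armadillo, or mlpack. The parsing assumes plain numeric, comma-separated fields with no header, quoting, or empty cells.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: a dense matrix stored column by column,
// which is the same memory order Armadillo uses for arma::mat.
struct ColumnMajor
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> values;  // element (r, c) lives at values[c * rows + r]

  float at(std::size_t r, std::size_t c) const { return values[c * rows + r]; }
};

// Parse headerless, comma-separated numeric text into a ColumnMajor matrix.
// Assumption: no quoting, escaping, or empty cells (not a full CSV parser).
ColumnMajor LoadCsvText(const std::string& text)
{
  std::vector<std::vector<float>> parsedRows;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue;  // skip blank lines

    std::vector<float> row;
    std::istringstream cells(line);
    std::string cell;
    while (std::getline(cells, cell, ','))
      row.push_back(std::stof(cell));

    // Every row must have as many fields as the first one.
    if (!parsedRows.empty() && row.size() != parsedRows.front().size())
      throw std::runtime_error("ragged row in CSV input");
    parsedRows.push_back(row);
  }

  ColumnMajor out;
  out.rows = parsedRows.size();
  out.cols = parsedRows.empty() ? 0 : parsedRows.front().size();
  out.values.reserve(out.rows * out.cols);

  // Outer loop over columns so that writes to out.values are contiguous.
  for (std::size_t c = 0; c < out.cols; ++c)
    for (std::size_t r = 0; r < out.rows; ++r)
      out.values.push_back(parsedRows[r][c]);

  assert(out.values.size() == out.rows * out.cols);
  return out;
}
```

Calling LoadCsvText on a two-line input such as "1,2,3\n4,5,6\n" yields a 2 x 3 matrix whose buffer holds the first column, then the second, then the third. Whether the column-wise variant is actually faster than the row-wise one for a given parser is a question for the benchmark, not something this sketch shows.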
_______________________________________________
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack