Hey Omar,

I did figure it out at last. It took some time, though, and I am not sure if this is the only method. There may be more efficient ways, but I think we can work with this for now.
Check the code — I am reading it *row-wise* with fast CSV.
*Code file*: https://github.com/heisenbuug/Benchmark-CSV-Parsers/blob/main/benchmark.cpp
See the last part, i.e. loading using fast CSV. Let me know if you see any necessary changes.

Thanks,
Gopi

On Fri, Apr 2, 2021 at 10:55 PM Omar Shrit <o...@shrit.me> wrote:
> Great, cannot wait to see the results.
>
> I did not have a chance to look at the API of fast CSV; there are
> no examples or clear docs, so it requires reading the code directly.
> I will let you know if I have anything.
>
> Best,
>
> Omar
>
> On 04/02, Gopi Manohar Tatiraju wrote:
> > Hey,
> >
> > So I was not able to work out fast CSV, but I edited the existing code to
> > read the whole data column-wise. Each column is returned to us as a
> > std::vector, which I then convert to an arma::vec and, at the end,
> > insert the column into the arma::mat.
> > Suggested code changes:
> >
> > > arma::fmat mat(doc.GetRowCount(), doc.GetColumnCount());
> > > std::vector<float> column;
> > > for (int i = 0; i < doc.GetColumnCount(); i++)
> > > {
> > >   column = doc.GetColumn<float>(i);
> > >   arma::fvec column_vector(column);
> > >   mat.col(i) = column_vector;
> > > }
> >
> > I am running the benchmark code; it's going to take some time, so I will
> > upload the results once the code finishes running.
> > Also, any idea regarding the other parser would help.
> >
> > Thanks,
> > Gopi
> >
> > On Fri, Apr 2, 2021 at 12:47 AM Gopi Manohar Tatiraju <deathcod...@gmail.com> wrote:
> >
> > > Hey,
> > >
> > > Was working on it.
> > > Here's the link:
> > > https://github.com/heisenbuug/Benchmark-CSV-Parsers/blob/main/csvparser_log_check.ipynb
> > >
> > > Thanks,
> > > Gopi
> > >
> > > On Fri, Apr 2, 2021 at 12:28 AM Omar Shrit <o...@shrit.me> wrote:
> > >
> > >> Hello Gopi,
> > >>
> > >> Thank you for starting the benchmark. Would it be possible to plot the
> > >> log and add the results to the open pull request to get a better
> > >> comparison?
> > >>
> > >> The code seems to be fine; it can be optimized, but I would wait to see
> > >> the plots.
> > >>
> > >> Thanks,
> > >>
> > >> Omar
> > >>
> > >> On 04/01, Gopi Manohar Tatiraju wrote:
> > >> > Hey Omar,
> > >> >
> > >> > Sorry, it took longer. I have been running the benchmark code since this
> > >> > morning, and it took a lot of time as my system is a bit slow.
> > >> > I compared the default Armadillo parser, mlpack's custom parser, and
> > >> > rapidcsv.
> > >> >
> > >> > Can you verify the code I used? I might have done something wrong, and it
> > >> > took a lot of time to run this code, but that may be due to the fact that
> > >> > my system is not that powerful.
> > >> > *Link to the repo and log file:*
> > >> > https://github.com/heisenbuug/Benchmark-CSV-Parsers
> > >> >
> > >> > In the meantime, I will also start working on my draft proposal a bit,
> > >> > and once we finish this testing we can use the results to decide our
> > >> > plan of action. Let me know if you have any suggestions or points for
> > >> > the draft proposal.
> > >> >
> > >> > Thank you,
> > >> > Gopi M. Tatiraju
> > >> >
> > >> >
> > >> > On Thu, Apr 1, 2021 at 5:59 PM Omar Shrit <o...@shrit.me> wrote:
> > >> >
> > >> > > Hello Gopi,
> > >> > >
> > >> > > Would it be possible to do some benchmarks for these two and compare
> > >> > > them with the already existing Boost Spirit parser? If there is a
> > >> > > considerable difference in performance between these two parsers,
> > >> > > then the obvious choice will be the faster one. I know that both of
> > >> > > them are called (fast, rapid), but I have not seen any benchmark yet
> > >> > > to know which one is actually faster.
> > >> > >
> > >> > > Let me know what you think; the benchmark will help us make a better
> > >> > > choice, since this is the internal (private) API and will not be used
> > >> > > by the user directly.
> > >> > >
> > >> > > These are my thoughts, let me know what you think.
> > >> > >
> > >> > > Omar.
> > >> > >
> > >> > > On 04/01, Gopi Manohar Tatiraju wrote:
> > >> > > > Hey,
> > >> > > >
> > >> > > > So, I went through both the libraries we considered for `csv parsers`.
> > >> > > > I implemented code to load the data from a small example `csv` file
> > >> > > > into an arma::mat; here is the sample code, let me know what you think.
> > >> > > > Am I loading into the arma::mat wrong? Can there be any other, more
> > >> > > > efficient way?
> > >> > > >
> > >> > > > Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>
> > >> > > > io::CSVReader<4> in("llog.csv");
> > >> > > > float a, b, c, d;
> > >> > > > int row = 0;
> > >> > > > arma::mat data(20, 4);
> > >> > > >
> > >> > > > while (in.read_row(a, b, c, d))
> > >> > > > {
> > >> > > >   data(row, 0) = a;
> > >> > > >   data(row, 1) = b;
> > >> > > >   data(row, 2) = c;
> > >> > > >   data(row, 3) = d;
> > >> > > >   row++;
> > >> > > > }
> > >> > > >
> > >> > > > Rapid.csv <https://github.com/d99kris/rapidcsv>
> > >> > > > // For headerless csv files
> > >> > > > rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
> > >> > > > arma::mat data(doc.GetRowCount(), doc.GetColumnCount(), arma::fill::ones);
> > >> > > >
> > >> > > > std::vector<float> col;
> > >> > > > for (int i = 0; i < doc.GetRowCount(); i++)
> > >> > > > {
> > >> > > >   col = doc.GetRow<float>(i);
> > >> > > >   for (int j = 0; j < doc.GetColumnCount(); j++)
> > >> > > >   {
> > >> > > >     data(i, j) = col[j];
> > >> > > >   }
> > >> > > > }
> > >> > > >
> > >> > > > After using both, I feel like `rapid.csv` is easier to grasp and work
> > >> > > > on, and it seemed more structured.
> > >> > > > Let me know your thoughts.
> > >> > > > Also, if loading like the above example is fine, this can be
> > >> > > > converted into a function that can act as basic csv file loading
> > >> > > > into an arma::mat, right?
> > >> > > >
> > >> > > > Thank You,
> > >> > > > Gopi
> > >> > > >
> > >> > > > On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <o...@shrit.me> wrote:
> > >> > > >
> > >> > > > > Hey Gopi
> > >> > > > >
> > >> > > > > On 03/29, Gopi Manohar Tatiraju wrote:
> > >> > > > > > Hey,
> > >> > > > > >
> > >> > > > > > I agree; after going a bit through both the candidates, I can
> > >> > > > > > see we can offload a lot of work by using a well-implemented
> > >> > > > > > existing parser.
> > >> > > > > > I think I should start by comparing both the mentioned libraries
> > >> > > > > > to decide which one to use. I will use the same benchmark strategy
> > >> > > > > > that was discussed in the issue. Does that sound good?
> > >> > > > >
> > >> > > > > Sounds good to me.
> > >> > > > >
> > >> > > > > > And also I think I can work on replacing Boost Spirit in GSoC
> > >> > > > > > then. This will be a start to the data frame idea. Even if we
> > >> > > > > > are left with time after this, I can start the work on the data
> > >> > > > > > frame as well. Is that reasonable?
> > >> > > > >
> > >> > > > > Yes, of course.
> > >> > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Gopi
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <o...@shrit.me> wrote:
> > >> > > > > >
> > >> > > > > > > Hey Gopi,
> > >> > > > > > >
> > >> > > > > > > I totally agree with Ryan: using an existing parser will
> > >> > > > > > > accelerate the project and allow us to move forward with the
> > >> > > > > > > dataframe class.
> > >> > > > > > > Also, I do believe that replacing Boost Spirit with an
> > >> > > > > > > existing parser will take a considerable amount of the summer.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >
> > >> > > > > > > Omar
> > >> > > > > > >
> > >> > > > > > > On 03/29, Ryan Curtin wrote:
> > >> > > > > > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju wrote:
> > >> > > > > > > > > Would love to hear your thoughts on whether to go with an
> > >> > > > > > > > > already implemented parser or build a new one. Also, if we
> > >> > > > > > > > > are planning to build a data frame here, then maybe going
> > >> > > > > > > > > with an in-house parser would be better, as we will have
> > >> > > > > > > > > the ability to design it in such a way that it can extend
> > >> > > > > > > > > maximum support to the new data frame which we are
> > >> > > > > > > > > planning to build ahead.
> > >> > > > > > > >
> > >> > > > > > > > Hey Gopi,
> > >> > > > > > > >
> > >> > > > > > > > Honestly, I think it's best to use another package. Not only
> > >> > > > > > > > will this free up time to actually work on the dataframe
> > >> > > > > > > > class, but also it means we are not responsible for
> > >> > > > > > > > maintenance of the CSV parser. There are lots of little
> > >> > > > > > > > complexities and edge cases in parsing (not to mention
> > >> > > > > > > > efficiency!) and so we can probably get a lot more bang for
> > >> > > > > > > > our buck here by using an implementation from someone who
> > >> > > > > > > > has already put down the time to consider all those details.
> > >> > > > > > > >
> > >> > > > > > > > Hope this is helpful.
> > >> > > > > > > > :)
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > >
> > >> > > > > > > > Ryan
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Ryan Curtin | "Kill them, Machine... kill them all."
> > >> > > > > > > > r...@ratml.org | - Dino Velvet
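The column-major conversion being discussed above can be sanity-checked without either library. Below is a minimal, standard-library-only sketch of the same idea — parse a headerless CSV row by row, but store the values column-major, which is how arma::mat lays out its elements internally (so filling column by column, as in the GetColumn approach, matches Armadillo's memory order). `LoadCsvColumnMajor` is a hypothetical helper for illustration, not part of rapidcsv, fast-cpp-csv-parser, or mlpack:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse headerless CSV text into a column-major float buffer.
// Element (i, j) of the logical matrix lives at data[j * nRows + i],
// mirroring arma::mat's internal layout.
std::vector<float> LoadCsvColumnMajor(const std::string& csvText,
                                      size_t& nRows, size_t& nCols)
{
  std::vector<std::vector<float>> rows;
  std::istringstream lines(csvText);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue;
    std::vector<float> row;
    std::istringstream fields(line);
    std::string field;
    while (std::getline(fields, field, ','))
      row.push_back(std::stof(field));
    rows.push_back(std::move(row));
  }

  nRows = rows.size();
  nCols = rows.empty() ? 0 : rows[0].size();

  // Transpose row-wise parse results into column-major storage.
  std::vector<float> data(nRows * nCols);
  for (size_t i = 0; i < nRows; ++i)
    for (size_t j = 0; j < nCols; ++j)
      data[j * nRows + i] = rows[i][j];
  return data;
}
```

With the real libraries, the same buffer could be handed to Armadillo via the `arma::mat(ptr, n_rows, n_cols)` constructor without any further element shuffling, which is why the column-wise loop is likely cheaper than assigning element (i, j) one at a time.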
_______________________________________________
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack