Hello Gopi,

Thank you for starting the benchmark. Would it be possible to plot the
log and add the results to the open pull request so we can get a better
comparison?

The code seems fine. It can be optimized, but I would wait to see
the plots first.

Thanks,

Omar

On 04/01, Gopi Manohar Tatiraju wrote:
> Hey Omar,
> 
> Sorry this took longer than expected. I have been running the benchmark
> code since this morning, and it took a long time because my system is a
> bit slow.
> I compared the default armadillo parser, mlpack's custom parser, and
> rapidcsv.
> 
> Can you verify the code I used? I might have done something wrong, since
> it took a long time to run, but that may just be because my system is
> not that powerful.
> *Link to the repo and log file:*
> https://github.com/heisenbuug/Benchmark-CSV-Parsers
> 
> In the meantime, I will also start working on my draft proposal a bit, and
> once we do this testing we can use those results to decide our
> plan of action. Let me know if you have any suggestions or points for the
> draft proposal.
> 
> Thank you,
> Gopi M. Tatiraju
> 
> 
> On Thu, Apr 1, 2021 at 5:59 PM Omar Shrit <o...@shrit.me> wrote:
> 
> > Hello Gopi,
> >
> > Would it be possible to benchmark these two and compare them with the
> > existing Boost Spirit parser? If there is a considerable difference
> > in performance between these two parsers, then the obvious choice will
> > be the faster one. I know that both of them are called (fast, rapid)
> > but I have not seen any benchmark yet that shows which one is faster.
> >
> > The benchmark will help us make a better choice, since this is an
> > internal (private) API and will not be used by the user directly.
> >
> > These are my thoughts; let me know what you think.
> >
> > Omar.
> >
> > On 04/01, Gopi Manohar Tatiraju wrote:
> > > Hey,
> > >
> > > So, I went through both of the libraries we considered for `csv parsers`.
> > > I implemented code to load the data from a small example `csv` file
> > > into arma::mat. Here is the sample code; let me know what you think.
> > > Am I loading into arma::mat the wrong way? Is there a more efficient
> > > way?
> > >
> > > Fast CSV Parser <https://github.com/ben-strasser/fast-cpp-csv-parser>
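> > > // Assumes #include <armadillo> and #include "csv.h" (fast-cpp-csv-parser).
> > > io::CSVReader<4> in("llog.csv");
> > > double a, b, c, d;  // arma::mat stores doubles
> > > arma::uword row = 0;
> > > arma::mat data(20, 4);  // sized for the small example file
> > >
> > > // Stop at the end of the file or when the matrix is full.
> > > while (row < data.n_rows && in.read_row(a, b, c, d))
> > > {
> > >   data(row, 0) = a;
> > >   data(row, 1) = b;
> > >   data(row, 2) = c;
> > >   data(row, 3) = d;
> > >   ++row;
> > > }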
> > >
> > > Rapid.csv <https://github.com/d99kris/rapidcsv>
> > > // For headerless csv files
> > > // Assumes #include <armadillo> and #include "rapidcsv.h".
> > > rapidcsv::Document doc("llog.csv", rapidcsv::LabelParams(-1, -1));
> > > const size_t rows = doc.GetRowCount();
> > > const size_t cols = doc.GetColumnCount();
> > > arma::mat data(rows, cols);  // every element is overwritten below
> > >
> > > for (size_t i = 0; i < rows; ++i)
> > > {
> > >   // Parse one row as doubles, the element type of arma::mat.
> > >   const std::vector<double> rowValues = doc.GetRow<double>(i);
> > >   for (size_t j = 0; j < cols; ++j)
> > >   {
> > >     data(i, j) = rowValues[j];
> > >   }
> > > }
> > >
> > > After using both, I feel like `rapid.csv` is easier to grasp and work
> > > with, and it seems more structured.
> > > Let me know your thoughts. Also, if loading like the above example is
> > > fine, this can be converted into a function that acts as basic csv file
> > > loading into arma::mat, right?
> > >
> > > Thank You,
> > > Gopi
> > >
> > > On Mon, Mar 29, 2021 at 8:28 PM Omar Shrit <o...@shrit.me> wrote:
> > >
> > > > Hey Gopi
> > > >
> > > > On 03/29, Gopi Manohar Tatiraju wrote:
> > > > > Hey,
> > > > >
> > > > > I agree. After going through both candidates a bit, I can see we can
> > > > > offload a lot of work by using a well-implemented existing parser.
> > > > > I think I should start by comparing both of the mentioned libraries to
> > > > > decide which one to use. I will use the same benchmark strategy that
> > > > > was discussed in the issue. Does that sound good?
> > > >
> > > > Sounds good to me.
> > > >
> > > > > And also I think I can work on replacing Boost Spirit in GSoC then.
> > > > > This will be a start to the data frame idea. If there is time left
> > > > > after this, I can start the work on the data frame as well. Does that
> > > > > sound reasonable?
> > > >
> > > > Yes of course.
> > > >
> > > > > Thanks,
> > > > > Gopi
> > > > >
> > > > >
> > > > > On Mon, Mar 29, 2021 at 7:33 PM Omar Shrit <o...@shrit.me> wrote:
> > > > >
> > > > > > Hey Gopi,
> > > > > >
> > > > > > I totally agree with Ryan: using an existing parser will accelerate
> > > > > > the project and allow us to move forward with the dataframe class.
> > > > > > Also, I do believe that replacing Boost Spirit with an existing
> > > > > > parser will take a considerable amount of the summer.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Omar
> > > > > >
> > > > > > On 03/29, Ryan Curtin wrote:
> > > > > > > On Mon, Mar 29, 2021 at 04:17:35PM +0530, Gopi Manohar Tatiraju wrote:
> > > > > > > > Would love to hear your thoughts on whether to go with an already
> > > > > > > > implemented parser or build a new one. Also if we are planning to
> > > > > > > > build a data frame here then maybe going with an in-house parser
> > > > > > > > would be better as we will have the ability to design it in such a
> > > > > > > > way that it can extend maximum support to the new data frame
> > > > > > > > which we are planning to build ahead.
> > > > > > >
> > > > > > > Hey Gopi,
> > > > > > >
> > > > > > > Honestly I think it's best to use another package.  Not only will
> > > > > > > this free up time to actually work on the dataframe class, but also
> > > > > > > it means we are not responsible for maintenance of the CSV parser.
> > > > > > > There are lots of little complexities and edge cases in parsing (not
> > > > > > > to mention efficiency!) and so we can probably get a lot more bang
> > > > > > > for our buck here by using an implementation from someone who has
> > > > > > > already put down the time to consider all those details.
> > > > > > >
> > > > > > > Hope this is helpful. :)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Ryan
> > > > > > >
> > > > > > > --
> > > > > > > Ryan Curtin    | "Kill them, Machine... kill them all."
> > > > > > > r...@ratml.org |   - Dino Velvet
> > > > > >
> > > >
> >

_______________________________________________
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
