We can continue this discussion if you'd like on GitHub: https://github.com/factor/factor/issues/2377
On Sun, Nov 22, 2020 at 9:34 PM Alexander Ilin <ajs...@yandex.ru> wrote:

> If I remove all => actions, the time goes down to 120 seconds.
>
> 23.11.2020, 00:18, "John Benediktsson" <mrj...@gmail.com>:
>
> > I suspect it’s a lot of “swap prefix >string” type stuff that’s different,
> > but I can help you profile this a little later today or tomorrow.
> >
> > On Nov 22, 2020, at 1:07 PM, Alexander Ilin <ajs...@yandex.ru> wrote:
> >
> > > Hello!
> > >
> > > I put the code into a vocab, restarted the Listener and repeated the test
> > > like so:
> > >
> > > IN: log-db
> > >
> > > EBNF: parse-csv-line [=[
> > >     quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
> > >         => [[ first2 swap prefix [ >string ] map "\"" join ]]
> > >     unquotedColumn = (!("\t") .)*
> > >     column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
> > >     rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
> > > ]=]
> > >
> > > : parse ( file -- ast )
> > >     [ utf8 [
> > >         input-stream get [ parse-csv-line , ] each-stream-line
> > >     ] with-file-reader ] { } make ;
> > >
> > > In Listener:
> > >
> > > USE: log-db
> > > now "file-name.csv" parse now rot time-
> > >
> > > The resulting run time is 180 seconds, which is fewer than 200, but not
> > > that much closer to 2.
> > > Somehow the optimizations don't seem to be helping a lot here.
> > >
> > > 22.11.2020, 22:51, "John Benediktsson" <mrj...@gmail.com>:
> > >
> > > > When you run that in the listener it uses the non optimizing compiler.
> > > >
> > > > You should use the EBNF: word [=[ ... ]=] form and then refer to word for
> > > > it to be a compiled parser.
> > > >
> > > > It’ll be much faster.
> > > >
> > > > Or wrap all that in a : foo ( -- ) ... ;
> > > >
> > > > On Nov 22, 2020, at 11:49 AM, Alexander Ilin <ajs...@yandex.ru> wrote:
> > > >
> > > > > Hello!
> > > > >
> > > > > I've got my first test results, and I'm having some doubts.
> > > > >
> > > > > The following code runs almost 200 seconds on a 20Mb file:
> > > > >
> > > > > "file-name.csv" [ utf8 [ input-stream get
> > > > > [
> > > > >     EBNF[=[
> > > > >     quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
> > > > >         => [[ first2 swap prefix [ >string ] map "\"" join ]]
> > > > >     unquotedColumn = (!("\t") .)*
> > > > >     column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
> > > > >     rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
> > > > >     ]=] ,
> > > > > ] each-stream-line
> > > > > ] with-file-reader ] { } make
> > > > >
> > > > > The following equivalent code using the csv vocab runs about 2 seconds
> > > > > on the same file:
> > > > >
> > > > > "file-name.csv" [ utf8 [ input-stream get CHAR: \t [
> > > > >     [ string>csv [ first , ] unless-empty ] each-stream-line
> > > > > ] with-delimiter ] with-file-reader ] { } make
> > > > >
> > > > > The difference is 100x, and the question is: is the speed difference
> > > > > related to the fact that I'm running the code in the Listener? Could it be
> > > > > that if I put it all into a vocab as opposed to running interactively it
> > > > > would get better optimized and reach the performance of the csv vocab?
> > > > >
> > > > > ---=====---
> > > > > Александр
> > >
> > > ---=====---
> > > Александр
>
> ---=====---
> Александр
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk