We can continue this discussion if you'd like on GitHub:

https://github.com/factor/factor/issues/2377



On Sun, Nov 22, 2020 at 9:34 PM Alexander Ilin <ajs...@yandex.ru> wrote:

> If I remove all => actions, the time goes down to 120 seconds.
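>
> For reference, the variant with all => [[ ... ]] actions removed looks
> roughly like this (a sketch; the word name parse-csv-line-raw is only
> illustrative):
>
> EBNF: parse-csv-line-raw [=[
>    quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
>    unquotedColumn = (!("\t") .)*
>    column = quotedColumn | unquotedColumn
>    rule = column ( "\t"~ column )*
> ]=]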
>
>
> 23.11.2020, 00:18, "John Benediktsson" <mrj...@gmail.com>:
>
> I suspect it’s a lot of the “swap prefix >string” type stuff that makes the
> difference, but I can help you profile this a little later today or tomorrow.
>
>
>
> On Nov 22, 2020, at 1:07 PM, Alexander Ilin <ajs...@yandex.ru> wrote:
>
>
> Hello!
>
>  I put the code into a vocab, restarted the Listener and repeated the test
> like so:
>
> IN: log-db
>
> EBNF: parse-csv-line [=[
>    quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
>        => [[ first2 swap prefix [ >string ] map "\"" join ]]
>    unquotedColumn = (!("\t") .)*
>    column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
>    rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
> ]=]
>
> : parse ( file -- ast )
>    [ utf8 [
>        input-stream get [ parse-csv-line , ] each-stream-line
>    ] with-file-reader ] { } make ;
>
>
>
> In the Listener:
>
> USE: log-db now "file-name.csv" parse now rot time-
>
> The resulting run time is 180 seconds, which is less than 200, but not
> much closer to 2.
> Somehow the optimizations don't seem to be helping a lot here.
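>
> A simpler way to time the same call (a sketch, assuming the tools.time
> vocabulary):
>
> USING: kernel log-db tools.time ;
> [ "file-name.csv" parse drop ] time
>
> That runs the quotation and prints its total running time.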
>
> 22.11.2020, 22:51, "John Benediktsson" <mrj...@gmail.com>:
>
> When you run that in the Listener, it uses the non-optimizing compiler.
>
> You should use the EBNF: word [=[ ... ]=] form and then refer to word, so
> that you get a compiled parser.
>
> It’ll be much faster.
>
> Or wrap all that in a : foo ( -- ) ... ;
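>
> A minimal sketch of the two forms, using a tiny illustrative grammar
> rather than the CSV one from this thread:
>
> USING: peg.ebnf ;
>
> ! Inline form, as when typed directly in the Listener:
> "123" EBNF[=[ rule = [0-9]+ ]=]
>
> ! Named form: defines the word digits, which is a compiled parser:
> EBNF: digits [=[ rule = [0-9]+ ]=]
> "123" digits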
>
>
>
>
>  On Nov 22, 2020, at 11:49 AM, Alexander Ilin <ajs...@yandex.ru> wrote:
>
>  Hello!
>
>   I've got my first test results, and I'm having some doubts.
>
>  The following code takes almost 200 seconds on a 20 MB file:
>
>  "file-name.csv" [ utf8 [ input-stream get
>     [
>         EBNF[=[
>           quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
>             => [[ first2 swap prefix [ >string ] map "\"" join ]]
>           unquotedColumn = (!("\t") .)*
>           column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
>           rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
>         ]=] ,
>     ] each-stream-line
>  ] with-file-reader ] { } make
>
>
>  The following equivalent code using the csv vocab runs in about 2 seconds
> on the same file:
>
>  "file-name.csv" [ utf8 [ input-stream get CHAR: \t [
>     [ string>csv [ first , ] unless-empty ] each-stream-line
>  ] with-delimiter ] with-file-reader ] { } make
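>
>  The same result can probably be had a bit more compactly (a sketch,
> assuming the csv vocab's file>csv word and harvest from sequences):
>
>  USING: csv io.encodings.utf8 sequences ;
>  CHAR: \t [ "file-name.csv" utf8 file>csv ] with-delimiter
>  harvest [ first ] map
>
>  That reads all rows with a tab delimiter, drops empty rows, and keeps
> the first column of each.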
>
>
>   The difference is 100x, and the question is: is the speed difference
> related to the fact that I'm running the code in the Listener? Could it be
> that if I put it all into a vocab, as opposed to running it interactively,
> it would be better optimized and reach the performance of the csv vocab?
>
>  ---=====---
>  Александр
>
>
>
>
>
>
>
>
>
> ---=====---
> Александр
>
>
>
>
>
>
> ---=====---
> Александр
>
>
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk
