If I remove all => actions, the time goes down to 120 seconds.
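
For reference, "removing all => actions" would leave a grammar roughly like the one below. This is only a sketch: the name parse-csv-line-raw is made up for illustration, and the USING: line is an assumption, since the vocab's real USING: line is not shown in this thread.

```factor
! Sketch only: parse-csv-line-raw is a hypothetical name, and the
! USING: line is an assumption (the original vocab's USING: is not shown).
USING: peg.ebnf ;
IN: log-db

! Same rules as parse-csv-line, with every => [[ ... ]] action removed.
EBNF: parse-csv-line-raw [=[
   quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
   unquotedColumn = (!("\t") .)*
   column = ( quotedColumn | unquotedColumn )
   rule = column ( "\t"~ column )*
]=]
```

Without the actions, the parser returns the raw parse tree rather than a sequence of strings, so this variant is only useful for comparing timings.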
 
 
23.11.2020, 00:18, "John Benediktsson" <mrj...@gmail.com>:
I suspect it’s a lot of “swap prefix >string” type stuff that’s different, but I can help you profile this a little later today or tomorrow. 
 
 
On Nov 22, 2020, at 1:07 PM, Alexander Ilin <ajs...@yandex.ru> wrote:
 

Hello!
 
 I put the code into a vocab, restarted the Listener and repeated the test like so:
 
IN: log-db
 
EBNF: parse-csv-line [=[
   quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
       => [[ first2 swap prefix [ >string ] map "\"" join ]]
   unquotedColumn = (!("\t") .)*
   column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
   rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
]=]

: parse ( file -- ast )
   [ utf8 [
       input-stream get [ parse-csv-line , ] each-stream-line
   ] with-file-reader ] { } make ;
 
 
 
In Listener:
 
USE: log-db now "file-name.csv" parse now rot time-
 
The resulting run time is 180 seconds, which is less than 200, but not much closer to 2.
Somehow the optimizations don't seem to be helping a lot here.
 
22.11.2020, 22:51, "John Benediktsson" <mrj...@gmail.com>:

When you run that in the listener, it uses the non-optimizing compiler.

You should use the EBNF: word [=[ ... ]=] form and then refer to word for it to be a compiled parser.

It’ll be much faster.

Or wrap all that in a : foo ( -- ) ... ;


 

 On Nov 22, 2020, at 11:49 AM, Alexander Ilin <ajs...@yandex.ru> wrote:
 
 Hello!
 
  I've got my first test results, and I'm having some doubts.
 
  The following code takes almost 200 seconds to run on a 20 MB file:
 
 "file-name.csv" [ utf8 [ input-stream get
    [
        EBNF[=[
          quotedColumn = "\""~ (!("\"") .)* "\""~ quotedColumn*
            => [[ first2 swap prefix [ >string ] map "\"" join ]]
          unquotedColumn = (!("\t") .)*
          column = ( quotedColumn | unquotedColumn ) => [[ >string ]]
          rule = column ( "\t"~ column )* => [[ first2 swap prefix ]]
        ]=] ,
    ] each-stream-line
 ] with-file-reader ] { } make
 
 
  The following equivalent code using the csv vocab runs in about 2 seconds on the same file:
 
 "file-name.csv" [ utf8 [ input-stream get CHAR: \t [
    [ string>csv [ first , ] unless-empty ] each-stream-line
 ] with-delimiter ] with-file-reader ] { } make
 
 
  The difference is 100x, and the question is: is the speed difference related to the fact that I'm running the code in the Listener? Could it be that if I put it all into a vocab as opposed to running interactively it would get better optimized and reach the performance of the csv vocab?
 
 ---=====---
 Александр
 
 
 
 _______________________________________________
 Factor-talk mailing list
 Factor-talk@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/factor-talk