Looks like something I would do in Datomic: parse the log once, and
concurrently put the data into a Datomic database and then... wait, I just
realized I don't think we can get the results sorted from the database,
because a query returns a set of results and your transaction
ids are certainly not consecutive. I must investigate...
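If the results do come back as an unordered set, one option is to sort them on the client after the query returns. Here is a minimal sketch in plain Clojure. It assumes, hypothetically, that each result is a [transaction-id url] tuple. No Datomic API is used or implied here:

```clojure
;; Hypothetical query result: a set of [transaction-id url] tuples.
;; A Clojure set has no defined iteration order, so sort on the client.
(def results #{[118591810 "/registration/success"]
               [118591777 "/json/engagement"]
               [118591802 "/ws/boxtops/user/"]})

;; sort-by returns a seq ordered by the first element (the transaction id);
;; gaps between ids do not matter for sorting.
(def sorted-results (sort-by first results))
```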



On Sun, Oct 21, 2012 at 3:54 AM, Dave <d...@sevenventures.net> wrote:

> Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
> become using imperative languages all my life.  I'm stumped as to how to
> parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main
> problem is that for a single request a Varnish log file generates multiple
> log lines and each line is interspersed with lines from other threads.
> These log files can be several gigabytes in size (so using a stable sort
> of the entire log by thread id is out of the question).
>
> Below I've included a small example log file and an example output Clojure
> data structure.  Let me thank everyone in advance for any hints / help they
> can provide on this seemingly simple problem.
>
> *Rules of the Varnish Log File*
>
>    - The first number on each line is the thread id (not unique and gets
>    reused frequently)
>    - Each ReqStart marks the start of a request and the last number on
>    the line is the unique transaction id (e.g. 118591777)
>    - ReqEnd denotes the end of the processing of the request by the thread
>    - Each line is written atomically, but many threads write log lines
>    concurrently, so lines from different requests (threads) are interspersed
>    - These log files can be VERY large (10+ Gigabytes in the case of my
>    application) so using a stable sort by thread id or anything that loads the
>    entire file into memory is out of the question.
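If we take the rules above literally, each line can be split into a thread id, a tag and a payload. The transaction id can then be read from a ReqStart line (last number) or a ReqEnd line (first number, as in the example log). This is a minimal sketch. The regex layout (thread id, tag, one-character c/b/- marker, payload) and the names parse-line and transaction-id are my own assumptions, untested against real varnishlog output:

```clojure
(require '[clojure.string :as str])

(defn parse-line
  "Splits one log line into thread id, tag and payload. Assumes the layout
  in the example: thread id, tag, a one-character marker (c, b or -), then
  the payload. Returns nil for lines that do not match."
  [line]
  (when-let [[_ thread tag payload]
             (re-matches #"\s*(\d+)\s+(\S+)\s+\S\s+(.*)" line)]
    {:thread  (Long/parseLong thread)
     :tag     (keyword tag)
     :payload payload}))

(defn transaction-id
  "Last number of a ReqStart payload, first number of a ReqEnd payload,
  nil for any other tag."
  [{:keys [tag payload]}]
  (let [fields (str/split payload #"\s+")]
    (case tag
      :ReqStart (Long/parseLong (last fields))
      :ReqEnd   (Long/parseLong (first fields))
      nil)))
```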
>
>
> *Example Varnish Log file*
>    40 ReqEnd       c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
>    15 ReqStart     c 10.102.41.121 4187 118591777
>    15 RxRequest    c GET
>    15 RxURL        c /json/engagement
>    15 RxHeader     c host: www.example.com
>    30 ReqStart     c 10.102.41.121 3906 118591802
>    15 RxHeader     c Accept: application/json
>    30 RxRequest    c GET
>    30 RxURL        c /ws/boxtops/user/
>    30 RxHeader     c host: www.example.com
>    15 ReqEnd       c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
>    30 RxHeader     c Accept: application/xml
>    30 ReqEnd       c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432
>    15 ReqStart     c 10.102.41.121 4187 118591808
>    15 RxRequest    c GET
>    15 RxURL        c /ws/boxtops/user/
>    30 ReqStart     c 10.102.41.121 3906 118591810
>    15 RxHeader     c host: www.example.com
>    15 RxHeader     c Accept: application/xml
>    30 RxRequest    c GET
>    30 RxURL        c /registration/success
>    30 RxHeader     c host: www.example.com
>    46 TxRequest    - GET
>    30 RxHeader     c Accept: text/html
>    46 TxURL        - /registration/success
>    15 ReqEnd       c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955
>    30 ReqEnd       c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023
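A thread appears to write ReqStart ... ReqEnd for one request before it reuses its id. If that holds, the log above can be processed in one pass. Keep a map from thread id to the request that thread currently has open, and emit a request when its ReqEnd arrives. Memory is then bounded by the number of in-flight requests rather than by the file size. This is a sketch under those assumptions, and the function names are hypothetical. Lines from threads with no open request are simply dropped. That covers thread 46's backend lines and the leading ReqEnd for 118591771:

```clojure
(defn parse-line
  "Thread id, tag and payload of one log line, or nil if it does not match."
  [line]
  (when-let [[_ thread tag payload]
             (re-matches #"\s*(\d+)\s+(\S+)\s+\S\s+(.*)" line)]
    {:thread (Long/parseLong thread) :tag (keyword tag) :payload payload}))

(defn step
  "State is {:open {thread-id request-map}, :done finished-request-or-nil}."
  [{:keys [open]} {:keys [thread tag payload]}]
  (case tag
    ;; A ReqStart opens a fresh request for this thread.
    :ReqStart {:open (assoc open thread {:ReqStart [payload]}) :done nil}
    ;; A ReqEnd closes the thread's request and hands it out as :done.
    :ReqEnd   {:open (dissoc open thread)
               :done (some-> (get open thread) (assoc :ReqEnd [payload]))}
    ;; Any other tag is appended to the thread's open request, if any.
    {:open (if (contains? open thread)
             (update-in open [thread tag] (fnil conj []) payload)
             open)
     :done nil}))

(defn completed-requests
  "Lazy seq of finished request maps, in the order their ReqEnd appears."
  [lines]
  (->> lines
       (keep parse-line)
       (reductions step {:open {} :done nil})
       (keep :done)))
```

For the real file, something like (with-open [rdr (clojure.java.io/reader "varnish.log")] (doseq [req (completed-requests (line-seq rdr))] ...)) should stream it, as long as nothing holds on to the head of the sequence. The file name is a placeholder.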
>
> *Desired Output*
> {
>   118591802
>   { :ReqStart ["10.102.41.121 3906 118591802"]
>     :RxRequest ["GET"]
>     :RxURL ["/ws/boxtops/user/"]
>     :RxHeader ["host: www.example.com" "Accept: application/xml"]
>               or better yet
>     :RxHeader {:host "www.example.com" :Accept "application/xml"}
>     :ReqEnd ["118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"] }
>   118591777
>   { :ReqStart ["10.102.41.121 4187 118591777"]
>     :RxRequest ["GET"]
>     :RxURL ["/json/engagement"]
>     :RxHeader ["host: www.example.com" "Accept: application/json"]
>     :ReqEnd ["118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"] }
>   118591808
>   { :ReqStart ["10.102.41.121 4187 118591808"]
>     :RxRequest ["GET"]
>     :RxURL ["/ws/boxtops/user/"]
>     :RxHeader ["host: www.example.com" "Accept: application/xml"]
>     :ReqEnd ["118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955"] }
>   118591810
>   { :ReqStart ["10.102.41.121 3906 118591810"]
>     :RxRequest ["GET"]
>     :RxURL ["/registration/success"]
>     :RxHeader ["host: www.example.com" "Accept: text/html"]
>     :ReqEnd ["118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023"] }
> }
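This is a hedged sketch covering two things: the "or better yet" header shape, and keying requests by transaction id as in the desired output. The names headers->map and index-by-transaction-id are hypothetical. It assumes each request map already looks like one of the entries above. Collecting a whole 10 GB log into one map would defeat the purpose, so this is meant for a bounded batch of requests:

```clojure
(require '[clojure.string :as str])

(defn headers->map
  "Turns [\"host: www.example.com\" ...] into {:host \"www.example.com\" ...}.
  Splits each header at the first colon only; a repeated header name keeps
  the last value."
  [headers]
  (into {}
        (for [h headers
              :let [[k v] (str/split h #":\s*" 2)]]
          [(keyword k) v])))

(defn index-by-transaction-id
  "Keys request maps by the last number of their ReqStart payload and
  converts :RxHeader to a map when present."
  [requests]
  (into {}
        (for [req requests
              :let [start (first (:ReqStart req))
                    xid   (Long/parseLong (last (str/split start #"\s+")))]]
          [xid (cond-> req
                 (:RxHeader req) (update :RxHeader headers->map))])))
```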
>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en




-- 
I may be wrong or incomplete.
Please express any corrections / additions,
they are encouraged and appreciated.
At least one entity is bound to be transformed if you do ;)
