Clojurists - I'm fairly new to Clojure and didn't realize how broken I've
become using imperative languages all my life.  I'm stumped as to how to
parse a Varnish (www.varnish-cache.org) log file using Clojure.  The main
problem is that Varnish writes multiple log lines for a single request, and
those lines are interleaved with lines from other threads.  These log files
can be several gigabytes in size, so a stable sort of the entire log by
thread id is out of the question.
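Given the size constraint, one common Clojure pattern is to read the file lazily with line-seq and fold over it with reduce, so that only the current line plus whatever state you choose to accumulate needs to be live at any moment. A minimal sketch; the namespace and `fold-lines` are hypothetical names, not library functions:

```clojure
(ns varnish.stream
  (:require [clojure.java.io :as io]))

;; Hypothetical helper: folds f over the lines of a (possibly huge) file.
;; line-seq reads lazily from a BufferedReader and reduce consumes the
;; sequence as it goes, so the whole file is never loaded at once; memory
;; use is driven by the accumulator that f builds up.
(defn fold-lines
  [f init path]
  (with-open [rdr (io/reader path)]
    ;; reduce is eager, so it finishes before with-open closes the reader
    (reduce f init (line-seq rdr))))
```

For example, `(fold-lines (fn [n _line] (inc n)) 0 "varnish.log")` would count the lines of a (hypothetical) varnish.log without holding them all in memory.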

Below I've included a small example log file and an example of the desired
output as a Clojure data structure.  Thanks in advance for any hints or help
you can provide on this seemingly simple problem.

*Rules of the Varnish Log File*

   - The first number on each line is the thread id (not unique; thread ids
   are reused frequently)
   - Each ReqStart marks the start of a request, and the last number on the
   line is the unique transaction id (e.g. 118591777)
   - ReqEnd denotes the end of the processing of the request by the thread;
   the first number after the "c" is the transaction id of that request
   - Each line is written atomically; however, many threads write log lines
   concurrently, so lines from different requests (threads) are interleaved
   - These log files can be VERY large (10+ gigabytes in the case of my
   application), so using a stable sort by thread id or anything that loads
   the entire file into memory is out of the question.
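These rules suggest first parsing each line on its own into a thread id, a tag, a marker, and a payload. A minimal sketch, assuming the column layout seen in the sample log below (thread id, tag, a one-character marker such as "c" or "-", then the payload); the namespace and function names are hypothetical:

```clojure
(ns varnish.parse
  (:require [clojure.string :as str]))

;; Assumed line shape, taken from the sample log:
;;   <thread-id> <Tag> <one-char marker, e.g. c or -> <payload...>
(def line-re #"^\s*(\d+)\s+(\S+)\s+(\S)\s*(.*)$")

(defn parse-line
  "Hypothetical helper: splits one log line into a map, or returns nil
  if the line does not have the assumed shape."
  [line]
  (when-let [[_ tid tag kind data] (re-matches line-re line)]
    {:thread (Long/parseLong tid)   ; thread id, reused over time
     :tag    (keyword tag)          ; e.g. :ReqStart, :RxHeader, :ReqEnd
     :kind   kind                   ; "c" on the request lines in the sample
     :data   (str/trim data)}))     ; payload with trailing whitespace removed

(defn transaction-id
  "Hypothetical helper: for a ReqStart payload, the unique transaction id
  is the last whitespace-separated number."
  [data]
  (Long/parseLong (last (str/split data #"\s+"))))
```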


*Example Varnish Log file*
   40 ReqEnd       c 118591771 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
   15 ReqStart     c 10.102.41.121 4187 118591777
   15 RxRequest    c GET
   15 RxURL        c /json/engagement
   15 RxHeader     c host: www.example.com
   30 ReqStart     c 10.102.41.121 3906 118591802
   15 RxHeader     c Accept: application/json
   30 RxRequest    c GET
   30 RxURL        c /ws/boxtops/user/
   30 RxHeader     c host: www.example.com
   15 ReqEnd       c 118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200
   30 RxHeader     c Accept: application/xml
   30 ReqEnd       c 118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432
   15 ReqStart     c 10.102.41.121 4187 118591808
   15 RxRequest    c GET
   15 RxURL        c /ws/boxtops/user/
   30 ReqStart     c 10.102.41.121 3906 118591810
   15 RxHeader     c host: www.example.com
   15 RxHeader     c Accept: application/xml
   30 RxRequest    c GET
   30 RxURL        c /registration/success
   30 RxHeader     c host: www.example.com
   46 TxRequest    - GET
   30 RxHeader     c Accept: text/html
   46 TxURL        - /registration/success
   15 ReqEnd       c 118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955
   30 ReqEnd       c 118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023
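Because a thread appears to work on one request at a time between its ReqStart and its ReqEnd (an assumption read off the rules above, not verified against Varnish itself), the grouping can be done in a single pass: keep a map from thread id to the request currently open on that thread, and when its ReqEnd arrives, move the finished request into the result, keyed by transaction id. The working state then grows with the number of in-flight requests rather than with the file size, apart from the result map itself. A minimal sketch with hypothetical names; it ignores lines not marked "c" (such as thread 46's TxRequest/TxURL) and lines for threads with no open request (such as thread 40's ReqEnd at the top of the sample):

```clojure
(ns varnish.group
  (:require [clojure.string :as str]))

;; Hypothetical names throughout. Assumed line shape, from the sample:
;;   <thread-id> <Tag> <one-char marker> <payload...>
(def ^:private line-re #"^\s*(\d+)\s+(\S+)\s+(\S)\s*(.*)$")

(defn- parse-line [line]
  (when-let [[_ tid tag kind data] (re-matches line-re line)]
    {:thread (Long/parseLong tid) :tag (keyword tag)
     :kind kind :data (str/trim data)}))

(defn- transaction-id [data]
  ;; last number of a ReqStart payload, e.g. "10.102.41.121 4187 118591777"
  (Long/parseLong (last (str/split data #"\s+"))))

(defn- step
  "Reducer. :open maps thread id -> request in progress on that thread;
  :done maps transaction id -> finished request."
  [state line]
  (let [{:keys [thread tag kind data] :as rec} (parse-line line)]
    (cond
      ;; unparseable line, or not a "c" record: ignore it
      (or (nil? rec) (not= kind "c")) state

      ;; new request on this thread (replaces any stale entry for the thread)
      (= tag :ReqStart)
      (assoc-in state [:open thread]
                {:xid (transaction-id data) :req {:ReqStart [data]}})

      ;; thread has no open request (e.g. the log began mid-request)
      (not (contains? (:open state) thread)) state

      ;; request finished: move it from :open to :done by transaction id
      (= tag :ReqEnd)
      (let [{:keys [xid req]} (get-in state [:open thread])]
        (-> state
            (update :open dissoc thread)
            (assoc-in [:done xid] (assoc req :ReqEnd [data]))))

      ;; any other tag: append the payload to that tag's vector
      :else
      (update-in state [:open thread :req tag] (fnil conj []) data))))

(defn group-requests
  "Folds a sequence of log lines into {transaction-id {:ReqStart [...] ...}}."
  [lines]
  (:done (reduce step {:open {} :done {}} lines)))
```

Fed from a reader, e.g. `(with-open [rdr (clojure.java.io/reader "varnish.log")] (group-requests (line-seq rdr)))` with a hypothetical path, reduce consumes the whole sequence before the reader closes. For a 10+ GB log the :done map would still hold every request, so in practice you would probably hand each finished request to a function (write it out, aggregate it) in the ReqEnd branch instead of accumulating them all.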

*Desired Output*
{ 
  118591802 
  { :ReqStart ["10.102.41.121 3906 118591802"] 
    :RxRequest ["GET"]
    :RxURL ["/ws/boxtops/user/"]
    :RxHeader ["host: www.example.com" "Accept: application/xml"]
    ;; or, better yet:
    ;; :RxHeader {:host "www.example.com" :Accept "application/xml"}
    :ReqEnd ["118591802 1350759611.326084614 1350759611.329720259 0.005002737 0.003598213 0.000037432"] }
  118591777
  { :ReqStart ["10.102.41.121 4187 118591777"]
    :RxRequest ["GET"]
    :RxURL ["/json/engagement"]
    :RxHeader ["host: www.example.com" "Accept: application/json"] 
    :ReqEnd ["118591777 1350759605.775758028 1350759611.249602079 5.866879225 5.473801851 0.000042200"] }
  118591808
  { :ReqStart ["10.102.41.121 4187 118591808"]
    :RxRequest ["GET"]
    :RxURL ["/ws/boxtops/user/"]
    :RxHeader ["host: www.example.com" "Accept: application/xml"] 
    :ReqEnd ["118591808 1350759611.442447424 1350759611.444925785 0.016906023 0.002441406 0.000036955"] }
  118591810
  { :ReqStart ["10.102.41.121 3906 118591810"]
    :RxRequest ["GET"]
    :RxURL ["/registration/success"]
    :RxHeader ["host: www.example.com" "Accept: text/html"]
    :ReqEnd ["118591810 1350759611.521781683 1350759611.525400877 0.098322868 0.003532171 0.000087023"] }
}
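For the "better yet" form of :RxHeader, each header line can be split on its first colon and the name turned into a keyword. A minimal sketch; `headers->map` and its namespace are hypothetical, and it assumes every header line contains a colon:

```clojure
(ns varnish.headers
  (:require [clojure.string :as str]))

(defn headers->map
  "Hypothetical helper: turns [\"host: www.example.com\" ...] into
  {:host \"www.example.com\" ...}. Splits on the first colon only, so
  values containing colons (URLs, times) stay intact. If a header name
  repeats (e.g. several Cookie lines), the last value wins."
  [header-lines]
  (into {}
        (map (fn [h]
               ;; limit 2: split at the first ':' (plus following spaces) only
               (let [[k v] (str/split h #":\s*" 2)]
                 [(keyword k) v])))
        header-lines))
```

Applied to one grouped request, `(update request :RxHeader headers->map)` gives the map form. Header names keep their original case (:host vs :Accept), matching the desired output; since HTTP header names are case-insensitive, you may prefer to lower-case them first.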

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
