I'm considering using MarkLogic as a log file analyzer.
This means storing possibly hundreds of GB of fairly flat data (think log4j,
Apache, and Tomcat output).
Why use ML at all when something like MySQL might be a better fit?
I think the dateTime indexing and free-form text searching would be extremely
valuable.
Also, although the raw data is "flat", the resulting analyzed data may well be
hierarchical (imagine creating a call stack diagram from log files). I think
this is a perfect use of ML and XQuery (but I may be insane).
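
To make the "flat line in, XML document out" idea concrete, here is a minimal
sketch in Python. It assumes a hypothetical log4j layout of the form
"2011-03-01 12:34:56,789 ERROR [com.example.Foo] message"; the element names
(entry, time, level, logger, message) are made up for illustration and are not
anything MarkLogic requires. The timestamp is rewritten in the ISO 8601 form
that xs:dateTime uses, so it could be covered by a dateTime index.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed (hypothetical) log4j layout:
#   2011-03-01 12:34:56,789 ERROR [com.example.Foo] message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) "
    r"(?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def log_line_to_xml(line):
    """Convert one flat log line to a small XML document, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Validate the timestamp and re-emit it in xs:dateTime lexical form.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    entry = ET.Element("entry")
    ET.SubElement(entry, "time").text = (
        ts.strftime("%Y-%m-%dT%H:%M:%S") + "." + m.group("ms")
    )
    ET.SubElement(entry, "level").text = m.group("level")
    ET.SubElement(entry, "logger").text = m.group("logger")
    ET.SubElement(entry, "message").text = m.group("msg")
    return ET.tostring(entry, encoding="unicode")


sample = "2011-03-01 12:34:56,789 ERROR [com.example.Foo] Connection refused"
print(log_line_to_xml(sample))
```

Lines that do not match the assumed layout (stack trace continuation lines,
for example) return None here; a real loader would have to decide whether to
attach them to the previous entry.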

With that in mind, I'm curious how to make this efficient in time and space.
If I create a new forest and database just for this, what minimizes the time to
load and the key space?
My *guess* is to reduce all the database parameters that affect indexing
to the bare minimum, possibly none except for explicit indexes ... or maybe a
simple word index. But I would love opinions.

The data rate I'm looking at is approx. 10GB/day, continuously, and I will
likely need to archive off anything more than a few days old (so 50GB might be
a reasonable max storage).
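
As a back-of-envelope check on those numbers, a small Python sketch follows.
It assumes decimal gigabytes (10^9 bytes) and counts raw log bytes only, so it
says nothing about how much space the indexes would add on disk.

```python
GB = 10**9               # assumption: decimal gigabytes
SECONDS_PER_DAY = 86400


def sustained_rate_bytes_per_sec(bytes_per_day):
    """Average ingest rate needed to keep up with a continuous daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def retention_days(max_storage_bytes, bytes_per_day):
    """How many days of raw data fit in the storage budget."""
    return max_storage_bytes / bytes_per_day


rate = sustained_rate_bytes_per_sec(10 * GB)
days = retention_days(50 * GB, 10 * GB)
print(f"{rate / 1000:.1f} KB/s sustained, {days:.0f} days retained")
```

So 10GB/day works out to roughly 116 KB/s on average, and a 50GB cap
corresponds to about five days of raw data. Peak rates could be well above the
average if the logs are bursty.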

I'd like the data to be fed in real time, without requiring a rack of 100
servers to do it ...

Ideas welcome! (Including "you're insane, just use another tool".)




----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
