I'm considering using MarkLogic as a log file analyzer. This means storing possibly hundreds of GB of fairly flat data (think log4j, Apache, and Tomcat output). Why use ML at all when something like MySQL might be better? I think the dateTime indexing and free-form text searching would be extremely valuable. Also, although the raw data is "flat", the resulting analyzed data may well be hierarchical (imagine creating a call stack diagram from log files). I think this is a perfect use of ML and XQuery (but I may be insane).
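To make the "flat" structure concrete, here is a minimal sketch of turning one log4j-style line into a small XML document whose timestamp is in xs:dateTime lexical form, the kind of value a dateTime index could cover. The line layout, the element names (`entry`, `timestamp`, `level`, `thread`, `logger`, `message`) and the `line_to_xml` helper are all assumptions for illustration, not anything MarkLogic prescribes; Python is used only for the sketch.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed log4j layout (hypothetical, adjust to the real pattern):
#   2010-03-01 12:34:56,789 INFO  [main] com.example.Foo - message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s+-\s+(?P<message>.*)$"
)


def line_to_xml(line):
    """Return one <entry> document as a string, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Parse the timestamp so malformed dates are rejected early.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    entry = ET.Element("entry")
    # xs:dateTime lexical form: YYYY-MM-DDThh:mm:ss.sss (no timezone assumed).
    ET.SubElement(entry, "timestamp").text = (
        ts.strftime("%Y-%m-%dT%H:%M:%S") + "." + m.group("ms")
    )
    for name in ("level", "thread", "logger", "message"):
        ET.SubElement(entry, name).text = m.group(name)
    # ElementTree escapes &, < and > in text content when serializing.
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(line_to_xml("2010-03-01 12:34:56,789 INFO  [main] com.example.Foo - started"))
```

One small document per log line keeps the data flat while leaving the timestamp and free text in separate elements; whether that granularity loads fast enough at the rates discussed here is exactly the open question.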
With that in mind, I'm curious how to make this efficient in time and space. If I create a new forest and database just for this, what minimizes the load time and key space? My *guess* is to reduce all the database parameters affecting indexing to the bare minimum, possibly none except for explicit indexes, or maybe a simple word index. But I would love opinions.
The data rate I'm looking at is approx. 10 GB/day, continuously, and I will likely need to archive off anything more than a few days old (so 50 GB might be a reasonable max storage). I'd like the data to be fed in real time and not require a rack of 100 servers to do it. Ideas welcome! (Including "you're insane, just use another tool".)
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
