I wanted to report back on my success with my "hybrid" solution for
managing millions of "record"-type data items, where each record is small
(roughly 300 bytes). As my previous emails showed, I was having horrible luck
using either the "huge file with millions of fragments" approach or the
"millions of tiny files" approach. I'm sure that if I were running a 64-bit
machine with 12+ GB of memory, those problems would vanish.
But, well, I'm experimenting with what I have at hand, which isn't that.
My latest experiment is to "chunk" my 500 MB file into about 3000 files,
each with 1000 "rows" of data
(FYI, using the "xsplit" command from xmlsh).
I have some other large files with fewer "rows", each about 10x bigger, so I
chunked those to 100 "rows" each.
So the end result is a directory with about 1000 files, each with
100-1000 "rows".
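For anyone who wants to try the same chunking without xmlsh, here is a minimal Python sketch of the idea. It is not how xsplit works internally; it assumes (hypothetically) that the big file is one root element whose children are <row> elements, and the function name split_rows, the output root tag "rows", and the chunk-NNNNN.xml file names are all made up for illustration. It streams the input with iterparse, so it holds roughly one chunk of rows in memory at a time rather than the whole 500 MB file.

```python
import os
import xml.etree.ElementTree as ET


def split_rows(src_path, out_dir, rows_per_file=1000,
               row_tag="row", root_tag="rows"):
    """Split src_path into files holding at most rows_per_file <row_tag>
    elements each, wrapped in a new <root_tag>. Returns the paths written.
    (Hypothetical helper; tag names are assumptions, not xsplit's.)"""
    os.makedirs(out_dir, exist_ok=True)
    written, batch = [], []
    src_root = None

    def flush():
        if not batch:
            return
        chunk = ET.Element(root_tag)
        chunk.extend(batch)
        path = os.path.join(out_dir, f"chunk-{len(written) + 1:05d}.xml")
        ET.ElementTree(chunk).write(path, encoding="utf-8",
                                    xml_declaration=True)
        written.append(path)
        batch.clear()
        # Detach the rows just written from the source tree so memory
        # stays at roughly one chunk instead of the whole input file.
        if src_root is not None:
            src_root.clear()

    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start" and src_root is None:
            src_root = elem  # first start event is the document element
        elif event == "end" and elem.tag == row_tag:
            batch.append(elem)
            if len(batch) >= rows_per_file:
                flush()
    flush()  # write the final, possibly short, chunk
    return written
```

For the files with bigger rows, calling it with rows_per_file=100 gives the 100-row chunks instead.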
I've set the fragmentation rules so each "row" gets a separate fragment.
Now I have no directories with more than a few thousand files, and no file
with more than 1000 fragments.
End result: awesome. It performs extremely well using both XPath and search
expressions.
I can use xdmp:directory("/dir/"), simply //row, or a
combination, and they all perform "instantly".
Where I was previously getting 5+ second delays, I'm now typically getting
0.1 sec, and once cached, 0.01 sec or less.
Even in cases where I have to query across a dozen different data
sets (separate calls to cts:search) on the same page, I'm getting
sub-second response times.
Awesome.
I think this is about as good as it gets.
I haven't tried the idea of a directory hierarchy (and probably won't for now;
I'm tired of rebuilding my databases ... :( ).
That may work just as well for me; I
don't know.
e.g.
/1/100
/1/101
....
/2/200
If I *need* separate files I will try that approach, but for now it's
more manageable, both on the server and on my local workstation where
I'm creating the content, to deal with thousands, instead of millions, of
files.
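The hierarchy example above doesn't spell out how row numbers map to directories; one reading consistent with /1/100, /1/101 and /2/200 is "directory = row number // 100". Here is a minimal Python sketch under that assumption; row_uri, max_files_per_directory and the bucket_size knob are hypothetical helpers, not anything MarkLogic provides. It shows how such a scheme caps the number of documents in any one directory.

```python
from collections import Counter


def row_uri(row_id: int, bucket_size: int = 100, prefix: str = "") -> str:
    """Hypothetical scheme: row N lives in directory N // bucket_size.

    With the default bucket_size of 100 this reproduces the example
    layout: rows 100 and 101 share /1/, and row 200 starts /2/.
    """
    if row_id < 0 or bucket_size <= 0:
        raise ValueError("row_id must be >= 0 and bucket_size > 0")
    return f"{prefix}/{row_id // bucket_size}/{row_id}"


def max_files_per_directory(n_rows: int, bucket_size: int = 100) -> int:
    """Largest number of documents any single directory would hold."""
    per_dir = Counter(
        row_uri(i, bucket_size).rsplit("/", 1)[0] for i in range(n_rows)
    )
    return max(per_dir.values(), default=0)


print(row_uri(100), row_uri(101), row_uri(200))  # /1/100 /1/101 /2/200
```

With bucket_size=1000, a million rows would land in 1000 directories of 1000 documents each, which keeps every directory in the "thousands, not millions" range.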
Thank you all for your patience and help; I appreciate every bit of it!
_David
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected] <mailto:[email protected]>
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general