I'm experimenting with Range Indexes over dateTime and dayTimeDuration values.
I have a large number (millions) of small documents/fragments each with a 
dateTime and/or dayTimeDuration.

Currently these are stored at millisecond accuracy. For most things I don't
need the ms accuracy, but it's useful on occasion. I am wondering whether
this precision has a detrimental effect?

One example is log file entries. There may be hundreds that occur within the
same second. If I make a range index over the dateTime field, each of these
will get a unique value, and if I query the index, say using
cts:element-attribute-values(), pretty much every fragment will have a unique
value (so the number of unique entries in the index is high).

However, if I truncate the dateTime to seconds, there will be vastly fewer
unique values.
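To make the truncation concrete, here is a small sketch in Python (purely
illustrative, not MarkLogic code; in practice this would be done in XQuery
before loading or indexing) showing a millisecond timestamp dropped to whole
seconds:

```python
from datetime import datetime

# Hypothetical example timestamp with millisecond precision
ts = datetime.fromisoformat("2011-03-14T09:26:53.589")

# Truncate to whole-second precision by zeroing the sub-second part
truncated = ts.replace(microsecond=0)

print(truncated.isoformat())  # 2011-03-14T09:26:53
```

Every timestamp within the same second maps to the same truncated value, so
the set of distinct indexed values collapses accordingly.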

I am curious what the effect, if any, of doing this would be. Does the size
or search time of the range indexes depend on the number of unique values, or
more on the number of fragments? I am thinking it would have to depend on
both, as the index needs to map value -> (set of fragments). So what is the
difference if the common case is nearly 1:1 value:fragment vs. 1:many
value:fragment?
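My mental model of the value -> (set of fragments) mapping can be sketched as
a toy dictionary (again just an illustration of the shape of the problem, not
MarkLogic's actual index structure), showing how truncation shrinks the number
of distinct keys while the total number of fragment references stays the same:

```python
from collections import defaultdict

# Hypothetical log timestamps, several within the same second
timestamps = [
    "2011-03-14T09:26:53.100",
    "2011-03-14T09:26:53.200",
    "2011-03-14T09:26:53.300",
    "2011-03-14T09:26:54.000",
]

def build_index(values):
    """Toy range index: maps each value to the set of fragment ids."""
    index = defaultdict(set)
    for frag_id, v in enumerate(values):
        index[v].add(frag_id)
    return index

ms_index = build_index(timestamps)
sec_index = build_index(t[:19] for t in timestamps)  # keep only YYYY-MM-DDThh:mm:ss

print(len(ms_index))   # 4 distinct keys (nearly 1:1 value:fragment)
print(len(sec_index))  # 2 distinct keys (1:many value:fragment)
```

In both cases four fragment ids are referenced; only the number of distinct
keys differs, which is exactly the quantity I am asking about.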

I suspect a similar issue arises with double and geo values as well.
My guess/hope is that it doesn't make much difference ... but I am curious
whether there might be an easy, dramatic savings in time or space by
truncating precision.

-David



----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]<mailto:[email protected]>
812-482-5224

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
