John et al, thanks for playing --

> First:  lets assume multiples of word size only.

Not really nice, although presently built into JudyNL of course.

> This is WLOG due to the Ocaml trick...

WLOG?  Ocaml?

> ...store the length of the string modulo word size at the end of the
> string.

OK, that means on a 4-byte machine, a value of 0-3 at the end of the
string?  How do you distinguish this from any random bits within the
key?

> This preserves lexicographical ordering.

Right.

> > For example a single byte 100 is encoded as
>
>       (100,0,0,1)

OK, somehow the 1 at the end means "only the first byte matters, and my
key/string ends there."

What about a superstring containing (100,0,0,1) at this point and then
continuing?  Maybe that's why you said assume word-chunks only?

> Four bytes:
>
>       (100,101,102,103),(0,0,0,0)
>
> [32 bit machine]

Lost me there, is the last 0 the marker for end of string?

> Or something like this .. :)

Or something.  :-)

> There are two issues, one is how to track the length when recursing
> down the tree.  That's easy in principle, just maintain a depth
> counter.  It's probably hard in practice.

No, as you go down the tree for any reason, you know how deep you are,
if you care, but that's not really the issue.  The crux problem is how
to EFFICIENTLY distinguish any terminating substrings (padded say with
nulls) from continuing superstrings that happen to contain the same
bytes.

> The other problem is even easier.  If there is a string ending in this
> node, then set the low order but of the JudyL value.  Mask it back to
> zero when looking for the next array.  Use NULL if there are no keys
> with that proper prefix (with the low bit set).

Careful with that, look at the header files, there are already bit
patterns stored within Judy node pointers that libJudy itself uses.
This includes a root pointer to a whole JudyL array.

> Hmm ..  this could be done now using JudySL.  Instead of storing an
> integer value, store a pointer to a structure containing a flag and an
> integer value.

Sure, but now you doubled the size of the tree, more or less, or at
least the depth, with intermediate nodes between every JudyL array.
(Well maybe not 2X in full reality since each JudyL array has 1-4?
internal nodes => cache line misses.)

> If the flag is set, the null byte really was the end of the string,
> use the integer value.  If the flag is not set, its not the end of the
> string, cast the integer value to a pointer which is the next JudySL
> array.  Null butes would be very expensive :)

Yeah, plus if you solve this at all, you really want to support
arbitrary string lengths, just like JudySL.

Cheers,
Alan

------------------------------------------------------------------------------
Flow-based real-time traffic analytics software. Cisco certified tool.
Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer
Customize your own dashboards, set traffic alerts and generate reports.
Network behavioral analysis & security monitoring. All-in-one tool.
http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Reply via email to