Re: A universal ADT mapper and sorter?

john skaller Mon, 07 Feb 2011 15:16:07 -0800

On 08/02/2011, at 5:49 AM, Doug Baskins wrote:

> To All:
> 
> Thank you very much to all of you for your comments and suggestions.  I find
> the idea of a good build system very appealing, especially for 2 reasons:
> 1) It is simply not my forte, 2) I think the person that has been doing that
> job has moved out of the HP Lab and I suspect he would like to pass the baton --
> HP does have issues with Version 3 LGPL.  I do not understand much of your
> talk about building methods.  I live on a different planet.


My concept is that build systems are just programs like any other.

This means they need good structure.

And they need to be portable. This is hard since everyone's computer
is different, and even harder because some people develop code on
platform X but run it on platform Y.

The key thing here is for you to decide how to let others help.
Small patches to code can easily be handled by .. small patches.
Reorganising documentation and systematic code editing would be
easier with repository access.

SF+SVN is not a good tool for collaboration: it depends too much on
trust.

Git is a better tool. It's the tool developed by Linus to manage development
of the Linux kernel. Mercurial is a similar tool; it's used by Google.

Doug: you and Alan have to decide what to do here. If you use Git you don't
have to take any risks giving people write access to the repository, but the
downside is you will have to actively review and accept all the changes we
make.

Ideally, it would be better if someone is appointed repository manager
and they set up the system for you and tell YOU what you have to do :)

Erick may be happy to do that, not sure, certainly there's no problem
using the already built Felix version as base.



> 
> 1) An Index of arbitrary size in Bytes, Words, specifiable?

Doesn't JudyHS already have that? .. Ah, no, I see you mean like JudySL:
each key can be a different length.

> 2) An arbitrary length of Value in Bytes, Words?

There is a point where the cost of using a pointer to the heap is insignificant.
Also it depends if the value slot (address) is persistent or not.

If the value slots can move around, many objects, say, C++ objects,
have to be put on the heap, unless you want to upgrade Judy to using
C++ templates so the constructors/destructors can handle moving the
values around.

In C++ speak it is safe to "memcpy" something if it is a POD (Plain
Old Data) type, which is C++ speak for "old fashioned C data structure" :)

On bytes/words: whatever is easier for you to implement. There are alignment
issues here: a 4 byte object may have to have an address which is a multiple
of 4.
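
A minimal sketch of how a client can sidestep the alignment problem when values live at arbitrary byte offsets: copy through memcpy rather than casting the address to a wider pointer type. The helper names load_u32 and store_u32 are hypothetical and not part of Judy.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value stored at any byte offset in buf.
 * memcpy makes no alignment assumption, whereas dereferencing a
 * misaligned uint32_t pointer is undefined behaviour in C. */
static uint32_t load_u32(const unsigned char *buf, size_t offset)
{
    uint32_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Store a 32-bit value at any byte offset in buf, again via memcpy. */
static void store_u32(unsigned char *buf, size_t offset, uint32_t v)
{
    memcpy(buf + offset, &v, sizeof v);
}
```

The value is kept in native byte order here; only the alignment requirement is removed.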

> 4) Instead of returning a pointer to Value, the pointer would be to a struct
>     -- containing the Value(s) and possibly the length of the Value area.
>     This means that every Index could possibly have a different size Value
>     area.  This leaves the possibility of a 0 Value area, with improved speed
>     compared with non-zero, to a max of ?? Bits/Bytes/Words?


> 4) The length of Value area would have to be specified in Create (static) or
>     Insert (dynamic)?

Typically, string keys can be variable length.

Values are usually either static length, or they're a Union (with a
discriminant).

Ideally a C union value would only store the actual component, rather than
allocate the maximum size, however this is VERY HARD to do right, because of
alignment issues. It can't be done at all in either C99 or C++98 (without
external configuration data).
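
To make the cost concrete, here is a small sketch of a union with a discriminant. The type and function names (value_kind, payload, value, wasted_bytes_for_byte_value) are made up for illustration. The union is always as large as its biggest member, so a one-byte value still pays for the double.

```c
#include <stddef.h>

/* Hypothetical discriminant: says which union member is currently live. */
enum value_kind { VALUE_BYTE, VALUE_DOUBLE };

/* The union is sized (and aligned) for its largest member. */
union payload {
    unsigned char b;   /* needs 1 byte */
    double d;          /* typically needs 8 bytes */
};

struct value {
    enum value_kind kind;
    union payload u;
};

/* Bytes of union storage left unused when only the byte member is live. */
static size_t wasted_bytes_for_byte_value(void)
{
    return sizeof(union payload) - sizeof(unsigned char);
}
```

Storing only the live member would recover those bytes, but then each stored value would have a different size and alignment requirement, which is the hard part described above.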

> 5) The sort method (Dictionary (lexicographical order) or Binary) would have
>      to be specified during JudyXXCreate()

This is getting harder. If you have a comparator function, make sure it has an
extra void* parameter for client data:

        int compare (void *client_data,  void *x, void *y)

to allow "generic" routines to be written.
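
As a sketch of why the extra parameter matters, the generic sort below knows nothing about the element type or the ordering; both come from the comparator and its client data. Everything here (compare_fn, sort_with_context, compare_ints, and the const qualifiers on x and y) is an illustrative choice, not an existing Judy interface.

```c
#include <stddef.h>

/* Comparator shape with a leading client-data pointer. */
typedef int (*compare_fn)(void *client_data, const void *x, const void *y);

/* Swap two elements of `size` bytes, one byte at a time. */
static void swap_bytes(unsigned char *p, unsigned char *q, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char t = p[i];
        p[i] = q[i];
        q[i] = t;
    }
}

/* Generic insertion sort: each element is moved left by adjacent swaps
 * until the comparator says it is in place. */
static void sort_with_context(void *base, size_t n, size_t size,
                              compare_fn cmp, void *client_data)
{
    unsigned char *a = base;
    for (size_t i = 1; i < n; i++) {
        for (size_t j = i;
             j > 0 && cmp(client_data, a + (j - 1) * size, a + j * size) > 0;
             j--) {
            swap_bytes(a + (j - 1) * size, a + j * size, size);
        }
    }
}

/* Example comparator: client_data points to an int holding +1 for
 * ascending order or -1 for descending order. */
static int compare_ints(void *client_data, const void *x, const void *y)
{
    int direction = *(const int *)client_data;
    int a = *(const int *)x;
    int b = *(const int *)y;
    return direction * ((a > b) - (a < b));
}
```

The same compare_ints serves both orderings, chosen at run time through client_data, with no global variable needed.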

> 6) Endianness compensation parameter at XXCreate() if the Index is passed in
>     array elements of 1,2,4,8 byte elements.
>     I.E.  suppose the Index is 10 bytes, but passed as  "uint32_t Indx[20];",
>     and passed like "JudyXXIns(PArray, Indx, 10)", or
>     "JudyXXIns(PArray, Indx, 4, 10);"  The byte stream looks different with
>     Little and Big Endian machines.   Judy needs to know how many bytes to swap.
>     The Values would stay native Endianness.  Also if the Index is Left or
>     Right justified?

One way around this is NOT to do it at all. You can specify the order, and
leave it up to the client to meet your requirements.

There is a well known canonical ordering: Internet (network) byte order, which
is big endian.
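
A sketch of what "leave it up to the client" could look like: the client writes each integer most significant byte first (network byte order) before handing the bytes over as a key. With that layout, plain byte-wise comparison agrees with numeric order on every machine. The function names are hypothetical.

```c
#include <stdint.h>

/* Write v into out[0..3], most significant byte first. */
static void u32_to_key_bytes(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Recover the integer from bytes written by u32_to_key_bytes. */
static uint32_t u32_from_key_bytes(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the conversion uses shifts rather than a pointer cast, the same source gives the same byte stream on little and big endian hosts.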

There are also various encodings. One popular one is 7 bits per byte with the
high bit set to zero on all bytes except the last one.

This is just like me saying: for JudyHS/JudySL I'm annoyed at not allowing 0 in
the byte stream. But the fact is I can convert any string with 0 in it to a
UTF-8 style encoding (one that, like "modified UTF-8", writes 0 as the two
bytes 0xC0 0x80) to get rid of the 0.
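
A sketch of that conversion, with one caveat: standard UTF-8 encodes code point 0 as a single zero byte, so the sketch uses the two-byte form 0xC0 0x80 for it, as Java's "modified UTF-8" does. Each input byte is treated as a code point in the range 0..255. The function name encode_no_zero is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Encode n arbitrary bytes into a zero-free byte string.
 * Bytes 0x01..0x7F are copied as they are; 0x00 and 0x80..0xFF become
 * two bytes in the UTF-8 two-byte pattern 110xxxxx 10xxxxxx.
 * out must have room for 2*n + 1 bytes; a terminating 0 is appended.
 * Returns the encoded length, not counting the terminator. */
static size_t encode_no_zero(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c >= 0x01 && c <= 0x7F) {
            out[k++] = c;
        } else {
            out[k++] = (unsigned char)(0xC0 | (c >> 6));
            out[k++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[k] = 0;
    return k;
}
```

The result can be passed anywhere a C string is expected, at the cost of up to doubling the key length.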

IMHO: you should pick the encoding that is easiest to implement and make the
client comply. If you start trying to guess all the possibilities here and
handling them all in Judy, you will get continuous bugs and complaints that
you're not handling X, Y and Z :)

EG: the key is a struct .. whoops, structs can have PADDING. Now it is quite
hard to specify the "byte order" because now you have to leave some bytes out.
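
A sketch of the padding problem and one way a client could avoid it: serialise the struct field by field into a key buffer, so padding bytes (whose contents are unspecified) never reach the key. The struct point_key and the function pack_point_key are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: on typical ABIs the compiler inserts 3 padding bytes
 * after tag so that id is 4-byte aligned, making sizeof 8 rather than 5. */
struct point_key {
    uint8_t  tag;
    uint32_t id;
};

/* Pack the fields into exactly 5 key bytes: tag first, then id with its
 * most significant byte first. Padding is skipped entirely.
 * Returns the number of key bytes written. */
static size_t pack_point_key(const struct point_key *k, unsigned char out[5])
{
    out[0] = k->tag;
    out[1] = (unsigned char)(k->id >> 24);
    out[2] = (unsigned char)(k->id >> 16);
    out[3] = (unsigned char)(k->id >> 8);
    out[4] = (unsigned char)k->id;
    return 5;
}
```

Passing the raw struct bytes instead would mix the padding into the key, so two equal keys could compare as different.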


> 7) The Semantics of XXIns() would need to know what to do if the size of Value
>     area is different than one that already exists.  Possibilities are stay
>     same or grow?

Ouch. Interesting problem.

> 8) Error returns from passive (non-array altering) calls would return the
>      same as a 0 population array (no error codes -- no passed error parameter)
> 9) Leading zero deletion when in Binary sort mode -- perhaps specifiable at
>      XXCreate()
> 10) I think the speed would be similar to JudyL + one Cache-fill for Value
>       areas less than a cache-line in size (64 bytes).

64 bytes is a reasonable maximum for a value. After that go back to 8 byte
pointer to heap.

For keys it is probably different.

> 11) In the case of 0 length Value area, leave the return pointer to what?

NULL.

> 12) I am sure I have forgotten something.

Of course, you were so involved in writing this email you forgot
to put the coffee on and the garbage out .. the cat is dying of hunger
and the dog ran off with your wife :)


--
john skaller
[email protected]





_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel
