from:"Jean\-Claude Wippler"

Re: [Metakit] c4_Property shouldn't require a virtual destructor?

2002-12-23 Thread Jean-Claude Wippler

Vittorio Digilio wrote:


Unfortunately it lacks (almost completely :-) ) a full C++ 
documentation (anybody, as a long-term user wrote down something and 
is willing to share, thanks :-) ), so I started experimenting and 
inspecting the C++ src code.

Does http://www.equi4.com/metakit/api/hierarchy.html help?


I noticed that c4_Property, though being the base class for the other 
properties, provides a non-virtual destructor.
[...]

In this scenario deleting the heap-allocated derived class shouldn't 
call the base class c4_Property::~c4_Property() destructor and the 
reference wouldn't be released.
 
I mean :
 
c4_Property *pMyInt=new c4_IntProp("age");
//
//
delete pMyInt; // c4_Property::~c4_Property() shouldn't be called
// and Refs(-1) isn't called either
 
Perhaps I'm missing here something really big and the destructor 
should be non-virtual ?!

You mean should be virtual?

I don't know, but properties are not intended for the heap.  Why not 
simply:
	c4_IntProp pMyInt ("age");

-jcw

___
metakit mailing list  -  [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit

Re: [Metakit] mmap

2002-12-23 Thread Jean-Claude Wippler

Bruno Blondeau wrote:


Could someone tell me how to force changes to the disk when mmap is 
being used by a Metakit database?

MK uses mmap in readonly mode.  Changes written to file during a commit 
are written to the underlying file.  The implementation for all I/O is 
concentrated in the c4_Strategy class, with c4_FileStrategy as the 
standard implementation of it.

-jcw


Re: [Metakit] small borland change for metakit-2.4.8

2002-12-23 Thread Jean-Claude Wippler

Simon Cusack wrote:


The new 2.4.8 is great, I had to make a small change for borland to 
compile it.

In src\univ.cpp I had to change line 22 from:

#if !q4_MSVC && !q4_WATC && !(q4_MWCW && defined(_WIN32))

to :

#if !q4_BORC && !q4_MSVC && !q4_WATC && !(q4_MWCW && defined(_WIN32))

to build for borland builder 5 and 6.

Ok, thx - added to CVS.

-jcw


Re: [Metakit] Case Sensitive Find/Select/Locate

2002-12-26 Thread Jean-Claude Wippler

Jeffrey Kay wrote:


How hard would it be to add another field type, say 's' (lowercase s)
for supporting case sensitive strings?  I've sort of run into a brick
wall with this -- I have some tables that the case-insensitive search 
is fine, but others where I really need the case sensitivity.

If it were trivial, it would have been in there...

But you're right, this needs to be addressed.  There may be ways to get 
us there without globals (I agree, app-wide modality would be a bad 
choice).

One idea is to direct comparisons through the c4_Strategy class.  This 
is per-storage (but one could play tricks, and make the comparison code 
do different things only for some specified properties).  This should 
be doable with little risk and impact on the rest of MK, nor does it 
have to cost us in performance IMO.

Another option would be to add a comparison member (or function 
pointer) to each property object (c4_Property, and all its derived 
classes).  Again, no performance cost of substance IMO, but I'm not 
sure how far the effects of this would reach.

Encoding sort choices in type case (s vs S) would not be my 
favorite, because representation and use are really two different 
issues.  Even using special property names (name_nc:S vs name:S) 
would be preferable from my perspective, because it keeps this aspect 
out of the MK core.  If comparisons are moved to the strategy class, 
then one could implement this on top of MK - it may well become a 
default, but at least it would become overridable (for those who need 
to maintain 100% compatibility).

There is indeed no way to go from a view to its parent.  This is 
unfortunate, but impossible to alter in the current design (unattached 
subviews can be referenced from multiple items).

I wouldn't mind contributing the code if you can point me in the right
direction or give me a couple of hints about how you think this can be
accomplished.


Let me think about this.

The other reason to push forward on some sort of custom sorting, is 
that we really need to get Unicode-aware sorting worked out, which can 
be considered another custom sort order (probably the default one, 
one day).

-jcw


[Metakit] libtool

2003-01-22 Thread Jean-Claude Wippler

I have a question...

For some reason I do not quite understand, MK builds shared libs with 
ld.  This completes and works as expected with C++ programs, but it 
causes runtime errors when loaded from a C main (for example Mk4tcl.so 
loaded from tclsh).  It may also break down even with C++ in Unix 
systems which do not support shared library back-linking.

The libtool 1.4.1 info docs sound ominous:

Writing libraries for C++
=========================

   Creating libraries of C++ code should be a fairly straightforward
process, because its object files differ from C ones in only three ways:

  1. Because of name mangling, C++ libraries are only usable by the C++
 compiler that created them.  This decision was made by the
 designers of C++ in order to protect users from conflicting
 implementations of features such as constructors, exception
 handling, and RTTI.

  2. On some systems, the C++ compiler must take special actions for the
 dynamic linker to run dynamic (i.e., run-time) initializers.  This
 means that we should not call `ld' directly to link such
 libraries, and we should use the C++ compiler instead.

  3. C++ compilers will link some Standard C++ library in by default,
 but libtool does not know which are these libraries, so it cannot
 even run the inter-library dependence analyzer to check how to
 link it in.  Therefore, running `ld' to link a C++ program or
 library is deemed to fail.  However, running the C++ compiler
 directly may lead to problems related with inter-library
 dependencies.

   The conclusion is that libtool is not ready for general use for C++
libraries.  You should avoid any global or static variable
initializations that would cause an "initializer element is not
constant" error if you compiled them with a standard C compiler.

   There are other ways of working around this problem, but they are
beyond the scope of this manual.

   Furthermore, you'd better find out, at configure time, what are the
C++ Standard libraries that the C++ compiler will link in by default,
and explicitly list them in the link command line.  Hopefully, in the
future, libtool will be able to do this job by itself.



My question is: would anyone have a suggestion how to deal with this in 
the most portable manner?  I tend to use MK mostly in static-linked 
form, but evidently it would be nice to make this work in the most 
general way possible.  The current CVS sources have a -lstdc++ added 
to LDFLAGS, which solves it for Linux, but generates the following 
output on MacOS X (both are gcc 3.1/3.2):

*** Warning: This library needs some functionality provided by -lstdc++.
*** I have the capability to make that library automatically link in 
when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have.
*** The inter-library dependencies that have been dropped here will be
*** automatically added whenever a program is linked with this library
*** or is declared to -dlopen it.
g++ -dynamiclib -flat_namespace -undefined suppress -o 
.libs/libmk4tcl.dylib  mk4tcl.lo mk4too.lo column.lo custom.lo 
derived.lo fileio.lo field.lo format.lo handler.lo persist.lo remap.lo 
std.lo store.lo string.lo table.lo univ.lo view.lo viewx.lo  -lc 
-install_name  /usr/local/lib/libmk4tcl.dylib

The resulting library does load in tclsh.

Should the conclusion be to throw out libtool altogether?  Frankly, I 
wouldn't mind one bit...

-jcw


Re: [Metakit] Question on Views

2003-01-22 Thread Jean-Claude Wippler

Barbara Menzel wrote:


We are using MetaKit with Visual C++.  Often, we find there is a need 
to initialize or perform some initial action on a view within a new 
class.  I've tried passing the view into the object via the constructor 
and get several compile errors, primarily, c4_view is not a recognized 
type.

That's a typo, I assume: c4_View, not c4_view, right?


However, passing a view as a parameter in a member function, there are 
no errors and everything works fine.  The view can even be updated 
within the member function and returned with the updates included.  
Has anyone tried this or something similar with any success?

Could you post a brief extract of the code you would like to get 
working?  I'm having trouble understanding exactly what part is not 
doing what you expect.

-jcw


Re: [Metakit] Q: rowids?

2003-02-06 Thread Jean-Claude Wippler

Gordon McMillan wrote:


Why sort it? Scan on open for the maxid, and
maintain that in memory. It's lower overhead, and
doesn't interfere with whatever ordering the app
might want to see or maintain.


Worth repeating, because it highlights a fundamental aspect of MK's 
column-wise data storage model.  You can open a datafile of any size, 
point at a view with 10s of thousands of rows of any complexity, and 
still do the above scan-on-open with no other overhead than one read of 
a few Kb off the disk.  This is so efficient, that any other approach 
is a waste of effort in most cases.

Column-wise data storage means a scan over one property is f a s t.

It would be even faster if MK had C-coded loop aggregate functions such 
as max, but hey - there need to be some goodies saved up for later :)

One last point on this.  This sort of tight max-scan loop takes maximum 
advantage of CPU caches (even more so if coded in C).  On modern CPU's, 
that equates to warp drive.
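
To make the scan-on-open idea concrete, here is a minimal sketch in plain Python.  It does not use the Metakit API: the column-wise layout is modelled as one list per property, and the names columns, scan_max_id and RowIdAllocator are made up for illustration.

```python
# Toy model of column-wise storage: one contiguous list per property.
# This is not the Metakit API; all names here are illustrative only.
columns = {
    "id":   [3, 7, 1, 12, 5],
    "name": ["c", "g", "a", "l", "e"],
}


def scan_max_id(cols, prop="id"):
    """Scan a single column for its largest value (0 if the view is empty)."""
    return max(cols[prop], default=0)


class RowIdAllocator:
    """Keeps the highest row id in memory after a single scan on open."""

    def __init__(self, cols):
        self.cols = cols
        self.max_id = scan_max_id(cols)  # touches only the "id" column

    def add(self, name):
        """Append a row with the next id, without re-scanning or sorting."""
        self.max_id += 1
        self.cols["id"].append(self.max_id)
        self.cols["name"].append(name)
        return self.max_id


alloc = RowIdAllocator(columns)
new_id = alloc.add("m")
```

The scan reads only the id column, and the row order of the view is left untouched, which is the point made above.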

-jcw


Re: [Metakit] Tk with tclkit problem?

2003-02-19 Thread Jean-Claude Wippler

Lok Yek Soon wrote:


I encounter the following problem when testing Tk with tclkit under 
Linux (eg. ./tclkit hello.tcl)

Error message as follows:
==
invalid command name "wm"
    while executing
"wm title . Hello"
    (file "hello.tcl" line 2)


Add the following line before calling wm:
	package require Tk

Please post tclkit-related Q's to the starkit mailing list instead:
	http://www.equi4.com/mailman/listinfo/starkit

-jcw


[Metakit] Metakit 2.4.9 released

2003-02-20 Thread Jean-Claude Wippler

There is a new release of Metakit.  It's mostly a bug fix release, plus 
some smaller changes to extend the Tcl binding a bit more on the OO 
side.

Extract from changelog at http://www.equi4.com/metakit/CHANGES:

2003-02-19  ### MK 2.4.9
2003-02-18  Fix bug in blocked view delete and hash byteorder
2003-02-17  Configure tweaks for hpux/ia64
2003-02-14  Bug found in blocked viewer modification
2003-02-14  Some changes to OO interface in Tcl
2003-02-14  Enable stdio buffering
2003-02-07  Tweaks to restore broken MK ports
2003-02-07  Changed code to avoid compiler warning
2003-02-02  Work around optimizer bug in gcc 3.2.1
2003-01-24  Fixed cleanup order bug in Mk4tcl
2003-01-22  Add missing -lstdc++
2003-01-19  Tweak to temp object use
2003-01-17  Add synonym for mk4tcl info command
2003-01-16  Allow access to root view in Mk4tcl
2003-01-15  Use strdup
2003-01-10  Build improvements, Mk4py long and Mac improvements
2003-01-09  String compare tweak, Mac Carbon runtime mmap code
2002-12-23  Tweak for Borland builder 5 & 6
2002-12-09  Fixed bug in selection view change propagation
2002-12-02  Fixed bug in MK old-file format conversion
2002-11-24  Fixed Mk4tcl threaded build
2002-11-22  Configure tweak for HPUX/Itanium
2002-11-16  Tweaks to compile on Mac
2002-11-04  Fixed typo in Makefile
2002-11-03  ### MK 2.4.8

The homepage is, as before, http://www.equi4.com/metakit - FYI, Andreas 
Kupries has documented the 2.3/2.4 file format, see doc section.

FYI also, there is now a bug tracking system at 
http://www.equi4.com/bugs

The 2.4.9 release passes all its 140+ tests on Windows, Linux, and 
MacOS X - but there probably remain some portability issues in the 
makefiles and headers.

Metakit is Open Source Software, and will always remain so.  If you 
would like to support this, please share bug details, tricks/insights, 
and porting tips - it's a very effective way to help take it yet 
further.

Happy programming :)

-jcw


Re: [Metakit] MetaKit java bindings?

2003-04-02 Thread Jean-Claude Wippler

Michael Scharf wrote:

I switched my project from delphi/python to java. And I still believe
that MetaKit is the best database for my application. Now I wonder
if anybody has written some JNI code to access MetaKit from java.

Whee... a trans-lingual MK aficionado!  :)

It depends on what you want to do.  An experiment which was done about 
a year ago (by Christian Tismer), was to generate a *fixed* binding, 
given an existing database structure.  Changes to the schema mean you 
have to regenerate and recompile the wrapper.  The reason to mention 
this, is that the binding is C, not C++ (though inside, is of course 
still a C++ core for now).  That means SWIG should have no trouble at 
all wrapping it to various languages, including Java.  This was in fact 
one reason for doing it.

As far as I can remember it was definitely functional code, though not 
exposing most of the MK view operators, just basic access/modify 
functionality for views and rows.  The project was shelved, to await 
better focus and actual need.  In case you're interested - it's all 
available in a CVS project on equi4.com (follow same checkout 
instructions as metakit, but use metable as module name).

It would be fantastic if you can make such a binding work for Java, in 
some form or other (metable is just food for thought, there are of 
course many more ways to go about this).  It also matches my conviction 
that data storage has longer lifetimes than languages, i.e. over time I 
expect an increasing interest to bind to a second language (as teams 
and projects evolve).

It's been some time since I mentioned it, so let me re-iterate that I 
continue to be interested in making more language bindings happen, also 
to Perl, Ruby, etc.  My main hurdle is not knowing enough about each of 
the respective languages to be able to do things in a natural way 
myself.  But I'll definitely do my best to help where I can.

-jcw


Re: [Metakit] Redundant data

2003-05-29 Thread Jean-Claude Wippler

Angus Lord wrote:

I'm using metakit partly as structured storage for my app. Setting up 
the database is fine, however if I modify one of the entries, then I 
get a copy of the data stored in the file (only once, I can modify it 
many times). This is fine for small bits of data, but if I am storing 
and modifying files (a few kb) then I am potentially wasting a lot of 
space. Is there any way to turn this feature off?

You're seeing the consequences of stable storage - the mechanism 
which ensures commit/rollback robustness.

Comes with Metakit, which is a database manager that will continue to 
function with a consistent dataset regardless of aborts and pulling 
plugs at the most awkward times.

You can compress, by saving to a new file and switching over to it (see 
SaveTo).

The space is not wasted or lost.  It gets re-used later.

-jcw


Re: [Metakit] Installing Mk4tcl on Red Hat Linux 8.0

2003-05-30 Thread Jean-Claude Wippler

John Fletcher wrote:

Set $auto_path accordingly.

What does this do?

Standard Tcl - please google for it or look at man pages.

Is there any way in which the installation can do the building of the
package index file?  If not, then the demos and tests, which use
package require will all fail, unless instructions for how to get
round this are given as well.

Yes, the pkgIndex.tcl file gets created by make install, see the 
makefile for how/where it does that.  The trouble right now is that 
something is broken in the 2.4.9.2 autoconf/libtool config (help!).  I 
probably should not have followed someone's recent advice (in private 
email) to update to a newer autoconf/libtool combo :(

Alternatively, there could be some tests which use load instead.

$ cd ../tcl/test
$ tclsh all.tcl
Processing 9 scripts...
  mk1basic
  mk2chan
  mk3struct
  mk4commit
  mk5object
  mk6fixed
  mk7limit
  mk8fail
  mk9crash
Passed 33 tests
$

Incidentally, I did a build of 2.4.9.2 and I noticed that the
libmk4tcl.so and Mk4tcl.so files are not the same size, so it is not
just a case of renaming, as was said in the readme for 2.4.7.  I
eventually found the built libraries in the hidden folder builds/.lib

$ ls -l .libs/libmk4tcl.so
-rwxr-xr-x  1 jcw  users 1189052 May 29 16:09 .libs/libmk4tcl.so
$ ls -l Mk4tcl.so
-rwxr-xr-x  1 jcw  users  340468 May 29 16:09 Mk4tcl.so
$ cp -a .libs/libmk4tcl.so blah
$ strip blah
$ ls -l blah
-rwxr-xr-x  1 jcw  users  340468 May 29 16:25 blah
$

-jcw


Re: [Metakit] corrupt database

2003-06-26 Thread Jean-Claude Wippler

Guenther Fischer wrote:

On Tue, 24 Jun 2003, Jacob Levy wrote:
You (and we) need a little more information:
[...]
It is a starpack with the latest version of tclkit - the windows 
version is built on linux. tclkit is the build from equi4.com.

For issues regarding starpacks and tclkit, it is probably more 
effective to post to the starkit mailing list, see:
	http://www.equi4.com/mailman/listinfo/starkit

The error comes only with this one DB created with my application (a 
wine database program). There are many other users - I never saw it 
before. I have used tclkit/metakit for some years for this project 
(free software). I think there are some bad data in the database (disk 
error or whatever) and this data are needed for indexing or so.

The one unexplained problem on Windows, and it might even be a 
regression from previous releases, is a reported corruption when the 
datafile is on a file server.  So this should definitely be something 
to find out.

Every other case I know of was caused by opening more than once.

The bad news: datafile corruption in MK tends to damage real bad.  It 
usually does not damage records but entire *columns*.  So when trying 
to extract data, your best bet is to try to not extract all properties.

Quick things to try:
- does sdx mkinfo datafile give meaningful details?
- also see the mk2tcl starkit on http://mini.net/sdarchive/
- sometimes readkit.tcl can read what MK itself cannot

-jcw


Re: [Metakit] Using head of file

2003-09-02 Thread Jean-Claude Wippler

David Van Maren wrote:

I noticed that constructing a (modifiable) storage from an existing 
non-metakit file succeeds, and allows metakit information to be 
written to it with no errors.  Examining the file afterwards, I've 
seen that the metakit information is simply appended to the original 
file data, leaving it intact.

I looked at some of the format documentation, and it indicated that 
there is both a header and a footer used by metakit.  I'm guessing 
that the above behavior was by design, in order to allow users to use 
the head of the file for non-metakit information (such as their own 
magic number or similar information).  But that's just a guess, so 
I've got a few questions:

1. Is this behavior intentional?

Yes.  It's used in the Tcl scripting language to implement Starkits and 
Starpacks: scripts and executables which can be launched and contain a 
MK datafile, piggy-back style.  In the case of Starkits, the header is 
a regular Tcl script (Tcl stops reading at a CTRL/Z in the file).

2. If not, shouldn't metakit fail construction of a Storage from an 
existing non-metakit file?

I agree that this append behavior can confuse things.  One way to check 
is to open read-only, and check the description string of the storage 
contents.  It should come out as empty.

3a. If it is intentional, does metakit guarantee that it will leave 
the head of the file unchanged, even through a commit() which changes 
the metakit contents?

Yes.

3b. Does metakit care if I subsequently modify the head of the file 
(after closing the Storage associated with it)?

No.  No need to close - MK ignores header data.

3c. Does metakit care if the head of the file is grown or shrunk so 
long as the Storage is closed?

No.  The tail markers use relative sizes - here, closed is essential.

We're wanting to mark each storage file with our own special magic 
number, and this looks like a very easy way to do it, if metakit 
supports it.
Yep.
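
A minimal sketch of the magic-number idea, in plain Python without the Metakit bindings.  The MAGIC value, the file name and both helper functions are hypothetical; the append step only stands in for what a storage commit would do after the header, relying on the answers above that MK leaves the head of the file alone.

```python
import os
import tempfile

MAGIC = b"MYAPP\x01\n"  # hypothetical application-specific magic number


def write_head(path, magic=MAGIC):
    """Create the file with only the application's own header bytes."""
    with open(path, "wb") as f:
        f.write(magic)


def has_magic(path, magic=MAGIC):
    """Check the head of the file without reading anything after it."""
    with open(path, "rb") as f:
        return f.read(len(magic)) == magic


path = os.path.join(tempfile.mkdtemp(), "data.mk")
write_head(path)

# Stand-in for a storage commit: data is appended after the header.
with open(path, "ab") as f:
    f.write(b"...appended datafile contents...")

head_ok = has_magic(path)
```

In a real application the storage would be opened on the same file after write_head; that step is left out here because it needs the actual bindings.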

-jcw


Re: [Metakit] How long to wait before commiting?

2003-09-03 Thread Jean-Claude Wippler

Brian Kelley wrote:

I am loading a database with a bunch of new data that the user is 
allowed to validate.

I have been using commit() and rollback() for these operations because 
it's easy :)  The question I have is, what are the ramifications of 
loading a lot of data without committing?  Memory?  Speed?  Inquiring 
minds want to know.

Memory usage grows until the commit, as more and more view changes are 
buffered.

For classical data entry, i.e. typing, I would assume that speed is 
never an issue, nor are the amounts of data, i.e. memory usage...

-jcw


Re: [Metakit] Do loaded rows hang around in memory?

2003-09-15 Thread Jean-Claude Wippler

Erik Hermansen wrote:

[trouble]
The only pattern I can see is that the corruption usually occurs when 
the user exits the application.

If you have AutoCommit() enabled, then that may be related - it'll 
commit before exit.

Can you copy away the file

The commit buffers are also memory-mapped files?  In which case they 
couldn't be corrupted by stray writes from my application code, right?

No - commit buffers are allocated.  They, and administrative info, can 
still be corrupted.  To rule this out requires running in a separate 
address space, i.e. process.

Is there the possibility of exiting before writes performed in a 
commit are finished?  The question sounds dumb to me, but I am 
grasping at straws because I've already tried so many things.

If writes do not finish, the last step of commit is not done, i.e. the 
file will not be adjusted to use the new state.  Any premature 
exit/end/crash leaves original state intact.
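
The "last step" principle can be sketched as follows.  This is a toy append-only file, not MK's real file format: the MARK bytes, the length field and both functions are assumptions made for illustration.  New data is written first, and only the final marker makes it the current state.

```python
import os
import struct
import tempfile

MARK = b"CMT!"  # hypothetical commit marker, not MK's real tail marker


def commit(path, payload, crash_before_marker=False):
    """Append payload, then a marker recording the payload length."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()
        if crash_before_marker:
            return  # simulate exiting before the final step
        f.write(MARK + struct.pack(">I", len(payload)))


def last_committed(path):
    """Return the payload of the most recent complete commit, or None."""
    with open(path, "rb") as f:
        data = f.read()
    pos = data.rfind(MARK)
    while pos >= 0:
        if pos + 8 <= len(data):  # marker plus its 4-byte length field
            (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
            if size <= pos:
                return data[pos - size:pos]
        pos = data.rfind(MARK, 0, pos)  # fall back to an earlier marker
    return None


path = os.path.join(tempfile.mkdtemp(), "toy.dat")
commit(path, b"state-1")
before = last_committed(path)
commit(path, b"state-2-partial", crash_before_marker=True)
after_crash = last_committed(path)
commit(path, b"state-3")
after_retry = last_committed(path)
```

The interrupted write leaves extra bytes in the file, but a reader still sees the previous state until a later commit completes.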

I'm saying this under the assumption that there is no bug in MK.  If 
there is, I hope we can find it and resolve it ASAP!

The only other hint I have is that the bug was never reported until I 
split my database into three separate storage files.

Three different files, c4_Strategy objects, and c4_Storage objects?  
Should be no problem.

There is no longer 100% consistency between the three, i.e. you may see 
one commit succeed and another fail (e.g. disk full).  But none of this 
can damage datafiles IMO.

Another programmer is playing around with the order we perform commits 
and delete storage objects during the application exit, but each tweak 
takes about a week to confirm whether it did anything or not.  This is 
the final release-stopping bug after a year of development.

Is it an idea to record & play back the changes to force the problem to 
the surface?  A drastic method would be to instrument all modifying 
calls to write out a set of instructions, perhaps as a Python or Tcl 
script.
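
One possible shape for such instrumentation, sketched in plain Python.  The Recorder class is hypothetical, and a plain list stands in for a view; the wrapper logs every method call (not only modifying ones) as a line of Python, and the log can then be replayed against a fresh object.

```python
class Recorder:
    """Wraps an object and logs every method call as a line of Python."""

    def __init__(self, target, name, log):
        self._target = target
        self._name = name
        self._log = log

    def __getattr__(self, attr):
        # Only reached for names not defined on Recorder itself.
        method = getattr(self._target, attr)

        def wrapper(*args, **kwargs):
            parts = [repr(a) for a in args]
            parts += [f"{k}={v!r}" for k, v in kwargs.items()]
            self._log.append(f"{self._name}.{attr}({', '.join(parts)})")
            return method(*args, **kwargs)

        return wrapper


log = []
rows = Recorder([], "rows", log)  # a plain list stands in for a view
rows.append({"id": 1})
rows.append({"id": 2})
rows.pop(0)

# Replay the recorded script against a fresh object.
replayed = {"rows": []}
for line in log:
    exec(line, replayed)
```

This only works when the arguments have a repr() that evaluates back to the same value, which holds for numbers, strings and simple containers.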

Are you using threads?  Are you 100% confident of the stability of the 
compiler and runtime library?  I'm old enough to know not to point 
fingers at anyone but myself, but it doesn't hurt to rule out the 
obvious one more time...

-jcw


Re: [Metakit] hash table size in database

2003-09-16 Thread Jean-Claude Wippler

Kristian G. Kvilekval wrote:

I am using a hash view on a table of about 4K elements.

Today I examined the database with the dump utility
and noticed that the hash view sometimes has 4K elements and
other times it has 8K..
What determines the size of the hash table?

It's a power of two, and it's always larger than the number of data 
rows.  I've written a bit more about hashed and blocked views on this 
new page:
	http://www.equi4.com/mkmapping.html

-jcw


Re: [Metakit] hash table size in database

2003-09-16 Thread Jean-Claude Wippler

Kristian G. Kvilekval wrote:

Hmm... that's exactly what prompted the question.  I have a
database with 4030 entries, but one machine it generates
a hash with 4096 and the other with 8192.. Is it checking
whether the database fits in memory?
Just infinitely curious:


Machine 1:   512KB ram
Database Sz: 673725
 mk4dump ~/.zinf/db/metadb | fgrep VIEW
 VIEW 1 rows = dbview:V dbview_H1:V
   VIEW  4030 rows = url:S type:S title:S artist:S album:S genre:S
comment:S track:S year:S
   VIEW  4097 rows = _H:I _R:I
--
Machine 2:   2GB ram
Database sz : 689683
 VIEW 1 rows = dbview:V dbview_H1:V
   VIEW  4030 rows = url:S type:S title:S artist:S album:S genre:S
comment:S track:S year:S
   VIEW  8193 rows = _H:I _R:I

No, fits in memory is not considered.

What does matter is the order of adds/deletes.  Space is reclaimed and 
re-used, when fill drops too low - there is hysteresis, i.e. the same 
number of rows can have a different hash table size depending on how 
entries were added and deleted.

Same build order, different platform?  (if so, there could be a bug)
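
The hysteresis effect can be shown with a toy model that tracks only the slot count.  The grow and shrink thresholds below are assumptions chosen for illustration, not values taken from the MK sources; the only properties kept from the description above are that the size is a power of two and stays larger than the row count.

```python
class ToyHashSize:
    """Tracks only the slot count of a power-of-two hash table."""

    def __init__(self):
        self.rows = 0
        self.slots = 1

    def _grow_if_needed(self):
        # Grow until the table is strictly larger than the row count.
        while self.slots <= self.rows:
            self.slots *= 2

    def _shrink_if_needed(self):
        # Shrink only when fill drops below 1/4 (assumed threshold), and
        # never so far that the table stops exceeding the row count.
        while self.slots // 2 > self.rows and self.rows * 4 < self.slots:
            self.slots //= 2

    def add(self, n=1):
        self.rows += n
        self._grow_if_needed()

    def delete(self, n=1):
        self.rows -= n
        self._shrink_if_needed()


a = ToyHashSize()
a.add(4030)          # built up directly to 4030 rows

b = ToyHashSize()
b.add(5000)          # grew past 4096 rows at some point...
b.delete(970)        # ...then came back down to 4030 rows
```

Both tables end with 4030 rows, but a has 4096 slots and b keeps 8192, because the fill of b never dropped low enough to trigger a shrink.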

-jcw


Re: [Metakit] Mk4py behavioral questions [was Re: Mk4Py bug?]

2003-09-17 Thread Jean-Claude Wippler

Gordon McMillan wrote:

On 17 Sep 2003 at 1:25, Nicholas Riley wrote:

[...]

I'm working under the assumption that, given a database, you'd
prefer it to generate an error rather than discard data.

Not at all. When working with GUI forms, I often use a form dict
which may well have extra state I don't want persisted.

Temporary properties, is what I usually call 'em...

Yes, they are in fact extremely useful - you can have a view with props 
a, b, c - then open it with getas a,b,d, then have c still linger 
around, copy c's to d's, say to convert, then commit.  The result is a 
view with a,b,d.  Properties which are restructured away like this, and 
properties not in the getas are temp props - they disappear on commit 
(and rollback).

I think you can even add a prop, and do a getas after-the-fact to make 
it persist.

Also, properties which do not persist offer a way to cache additional 
info for each row.
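
The a,b,c to a,b,d conversion described above, as a toy model in plain Python.  ToyView is a made-up class, not Mk4py: rows are dicts, and commit simply drops every property that is not in the current structure.

```python
class ToyView:
    """Rows are dicts; `persistent` lists the properties kept on commit."""

    def __init__(self, persistent, rows=()):
        self.persistent = list(persistent)
        self.rows = [dict(r) for r in rows]

    def getas(self, persistent):
        # Restructure: change which properties persist.  Values of the
        # dropped properties linger in memory as temp props for now.
        self.persistent = list(persistent)
        return self

    def commit(self):
        # Only persistent properties survive; temp props disappear.
        self.rows = [
            {p: r.get(p) for p in self.persistent} for r in self.rows
        ]


view = ToyView(["a", "b", "c"], [
    {"a": 1, "b": 2, "c": "3"},
    {"a": 4, "b": 5, "c": "6"},
])

view.getas(["a", "b", "d"])        # c is now a temp prop
for row in view.rows:
    row["d"] = int(row["c"])       # convert c's to d's while c lingers
view.commit()                      # c is gone, d persists
```

The same model covers the caching use: any key added to a row that is not in the persistent list is available until the next commit and then dropped.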

-jcw


Re: [Metakit] beginners hash view

2003-09-18 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

I built a view of 50k records that I need to access. With normal view 
it is too slow (180 ms), so I try with hash view that I never used, 
and the result is even slower !(250 ms) !
(I've just read the new page http://www.equi4.com/mkmapping.html)

Here is what I've done :

  c4_View   view=db.GetAs("table[key:S,val:S]"),selection;
  c4_View   viewsec=db.GetAs("sec[_H:I,_R:I]");
  c4_View   viewhash=view.Hash(viewsec,1);
  c4_Row    row,searchrow;
  c4_StringProp val("val");
  c4_StringProp key("key");
  for (idx=0;idx<TOTAL;idx++)
  {
    sprintf(st1,"%d%d%d%d",idx,idx,idx,idx);
    sprintf(st2,"%d%d%d%d",idx,idx,idx,idx);
    key(row)=st1;
    val(row)=st2;
    view.Add(row);

No!  Do not touch view when there's a hash mapping on top.  Use 
viewhash:
  viewhash.Add(row)

  }
  db.Commit();
  key(searchrow)=;
  selection=viewhash.Select(searchrow);
what's wrong ?? is there any sample code ?

What are you measuring - total time of the above code?  Yes, that will 
be slower, it is now also setting up hashes along the way.  But the 
select should be instant.

One way to avoid the above confusion is to rewrite your code a bit 
more and use:
	view=view.Hash(viewsec,1);
at the top, replacing the viewhash declaration.

In other words, hide the original view once you've set up a hash on it.
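
Why adding through the underlying view goes wrong can be seen in a toy model.  ToyHashedView below is an illustrative stand-in, not the Mk4py or C++ API: it wraps a plain list and keeps a key-to-index map that is only updated by its own add method.

```python
class ToyHashedView:
    """Wraps a list of rows and keeps a key -> row index map in step."""

    def __init__(self, base, key):
        self.base = base
        self.key = key
        self.index = {row[key]: i for i, row in enumerate(base)}

    def add(self, row):
        """Append a row and record its position in the map."""
        self.index[row[self.key]] = len(self.base)
        self.base.append(row)

    def find(self, value):
        """Return the row index for value, or -1 if the map has no entry."""
        return self.index.get(value, -1)


view = []
viewhash = ToyHashedView(view, "key")

viewhash.add({"key": "1111", "val": "one"})   # goes through the mapping
view.append({"key": "2222", "val": "two"})    # bypasses the mapping
```

The second row is in the data, but the mapping never saw it, so a lookup through viewhash cannot find it.  Rebinding the name view to the hashed wrapper removes the chance of making that mistake.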

-jcw


Re: [Metakit] beginners hash view

2003-09-18 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

Thanks for your quick answer.
I did try to use only viewhash, for adding. But it did not change.

Weird... try using Find() instead of Select()?

-jcw


[Metakit] Trends (was: Re: beginners hash view)

2003-09-18 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

  c4_View   view=db.GetAs("table[key:S,val:S]"),selection;
  c4_View   viewsec=db.GetAs("sec[_H:I,_R:I]");
  c4_View   viewhash=view.Hash(viewsec,1);
[...]
what's wrong ?? is there any sample code ?

To follow up on this - the demo/ and examples/ subdirs in the MK source 
distribution have some sample code, for C++, Python, and Tcl.

For C++, there are 140+ little self-contained tests in the tests/ 
regression test suite.  They may not be perfect examples, but they are 
very small and self-contained, and definitely a good spot to look for 
uses of all of the different view operators.

I don't want to discourage people.  On the contrary.  But I'm juggling 
time between a number of activities.  I've been working on some Really 
Exciting Technology To Take Metakit Way Further (TM) for some time now. 
 So my efforts to help and support and improve docs are going to be 
limited, while maintaining my long-term commitment to help resolve and 
fix bugs.

As you may have seen, the www.equi4.com website has recently gotten an 
overhaul, in an attempt to make things easier to find.  I've started 
writing up some more pages in response to questions on this list.  And 
I've just finished a basic utility to display low-level stats and 
verify free-space integrity of MK datafiles, see 
http://www.equi4.com/mkstats.html

A number of people have sent a donation lately (thank you!), and Apple 
Computer has recently rewarded the fact that MK is doing well for them 
in every release of MacOSX all the way to Panther by donating a 17" 
Powerbook (whee!), so I can't even start to tell you how motivated I am 
to take the revolution of Metakit further.  I'm saying this to let you 
know that although a mailing list like this is usually about Q's and 
problems, there really are many things going pretty well these days.  
The one constraint seems to be my time (and a more fanatic focus).

If you want to help, consider writing a small piece about some aspect 
of Metakit - as a good deed on some rainy day, perhaps.  You can either 
do it all yourself, put it on your website and announce it so I can 
point to it, or enter it as a page in the MK wiki at 
http://www.equi4.com/metakit/wiki.cgi - or you can email me and I'll go 
out of my way to set up a new page and integrate it with what's on the 
website already (with full credits and acknowledgement).

The other way to help, and it's really encouraging to see it happen 
more and more, is to participate and help out with questions on this 
mailing list.

Happy coding, may Metakit serve everyone really well :)
-jcw

Re: [Metakit] beginners hash view

2003-09-18 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

It worked fine with Find() ;)

Then it comes two questions :

1) Is it normal that select does not use hash ?
2) If I do a SelectRange(), will it use hash ?
It seems you've just answered your own Q's.  Think about it: hashing is 
by value, not sorted.  So select, which uses selectrange to optimize, 
will not be able to use it - I haven't checked the source, but when you 
do I'm pretty sure that's what you'll find out.
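A minimal sketch of why that is, using plain Python containers rather than Metakit itself (all names below are illustrative, not part of Mk4py): a hash answers exact-value lookups, while a range query needs the keys in sorted order, which a hash does not provide.

```python
import bisect

rows = [("carol", 35), ("alice", 30), ("bob", 25), ("dave", 40)]

# Hash-style access: exact-match lookup by key value.
by_name = {name: age for name, age in rows}

def find_exact(name):
    """Exact lookup; this is the kind of query a hash can answer."""
    return by_name.get(name)

# Range-style access: needs the keys in sorted order.
sorted_names = sorted(name for name, _ in rows)

def select_range(lo, hi):
    """Return all names with lo <= name <= hi, found by binary search."""
    start = bisect.bisect_left(sorted_names, lo)
    stop = bisect.bisect_right(sorted_names, hi)
    return sorted_names[start:stop]

print(find_exact("bob"))
print(select_range("b", "d"))
```

The dict has no notion of "everything between b and d"; only the sorted list does.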

-jcw


Re: [Metakit] Ordered and Hash Views together!

2003-09-18 Thread Jean-Claude Wippler

Brian Kelley wrote:

I have a database table
table[id:I,name:S]
that I would like to find quickly using either id or name.  Is it 
possible to have two hash views simultaneously on this table?
This Q was bound to come up one day ;)

vw = st.getas("table[id:I,name:S]")
dvw = vw.project(vw.name, vw.id)
_hashview = st.getas("table_hash[_H:I,_R:I]")
_hashview2 = st.getas("table2_hash[_H:I,_R:I]")
vw2 = vw.hash(_hashview, 1)
dvw2 = dvw.hash(_hashview2, 1)
I use vw2 to quickly find the id properties and dvw2 to quickly find 
the name property.  It appears to work (the full test is below) 
which is pretty amazing.  I can add data to vw2 and dvw2 is 
automagically updated.

Is this safe and proper?
I am not sure.  I suspect dvw2 being updated is not quite right - try 
changing an existing item in vw2 so its key stays the same, but the 
name is altered... (my hunch is that dvw2 triggers a full rehash upon 
seeing a size change in its underlying data view)

The trick with hash views is that they must see all changes, so they 
can update the secondary info, while essentially passing through the 
request to the underlying data view.

You might have to change the above so one hash is built on top of the 
other:

vw = st.getas("table[id:I,name:S]")
vw2 = vw.hash(_hashview, 1)
dvw = vw2.project(vw.name, vw.id)
dvw2 = dvw.hash(_hashview2, 1)
And then always make changes *only* through dvw2, which will cascade 
changes through vw2.

Do tread with caution - these are cases where I have not done any 
testing at all... there is definitely considerable room for 
optimization (internally) in this sort of stacked use which is *not* 
being done at all in the current implementation.
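The "must see all changes" point can be illustrated with a toy model in plain Python. This is only a sketch of the general idea of secondary indexes, not of how Metakit's hash views are implemented; the class and method names are made up for the example.

```python
class IndexedTable:
    """Toy table with two secondary indexes kept in step with the rows."""

    def __init__(self):
        self.rows = []      # underlying data: list of [id, name]
        self.by_id = {}     # secondary info: id -> row position
        self.by_name = {}   # secondary info: name -> row position

    def append(self, id_, name):
        self.rows.append([id_, name])
        pos = len(self.rows) - 1
        self.by_id[id_] = pos
        self.by_name[name] = pos

    def rename(self, id_, new_name):
        # The change goes through the wrapper, so both indexes see it.
        pos = self.by_id[id_]
        old_name = self.rows[pos][1]
        del self.by_name[old_name]
        self.rows[pos][1] = new_name
        self.by_name[new_name] = pos

t = IndexedTable()
t.append(1, "ann")
t.append(2, "bob")
t.rename(2, "rob")
print(t.by_name)                 # the index follows the change

t.rows[0][1] = "zoe"             # change made behind the wrapper's back
print("zoe" in t.by_name)        # the index never saw it
```

After the last assignment the name index still maps "ann" to a row that now says "zoe", which is the kind of inconsistency that routing every change through the outermost layer avoids.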

-jcw


Re: [Metakit] indexed view

2003-09-19 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

While looking at indexed view in view.cpp, I read the header of hash 
view, and noticed a small mistake :
[...]
 *  c4_View datah = storage.GetAs("people_H1[name:S,age:I]");
[...]
while the text above speaks about the secondary view [_H:I,_R:I]
(Must be an old version)
Good catch, will be fixed in next checkin.  Thx.

About indexed view, it is written :
 * This view is modifiable.  Careful: when a row is changed in such a 
way
 * that its key is the same as in another row, that other row will be
 * deleted from the view.

So this supposes that multiple key is supported for adding, but not 
for updating. Is that true ?
The catch is that modifying a key property is not unlike deleting a row 
and adding another one.  That leads to some hairy details which I have 
not even thought through well enough...

By the way, there is no c++ example of indexed view. I cant find what 
to put in arguments const c4_View map_ and  const c4_View props_
I'm not sure.  Indexed views are experimental at best, right now.  I'm 
not too pleased with what's there, and would suggest staying away from 
it - it's not ready for real use IMO.

Even ordered views have some blind spots - it's all due to an 
unfinished design of how implicit key ordering and explicit row# 
indexing should work together.  And there's things like allowing 
duplicate keys or not, and most important of all: all ordering is going 
to be somewhat limited until MK supports custom comparisons (for UTF-8, 
case-sensitivity on/off, reverse ordering, etc).

You're reaching limits of the current implementation.  Some of this is 
simply unfinished, but some of it hinges on deeper issues which are 
taking a lot more time to understand and resolve than I originally 
assumed.  The view model is very generic, the inherent ordering side 
of things needs more details about views to be managed before all 
operators can be done properly.

Note that you can always take over, and derive a new custom viewer 
class for a certain purpose.  What a custom viewer lets you do is 
intercept all access and changes, and manage extra details in secondary 
view, etc.  I should probably document more of that generic (and *very* 
powerful) mechanism, rather than try to list all rough spots in indexed 
viewers and such.  Custom viewers are the foundation for all the newer 
view operators in MK, including hashes, blocking, joins, groupby, remap.

-jcw


[Metakit] Bugs, gaps, and suggestions

2003-09-19 Thread Jean-Claude Wippler

There's a new page on the website which I'm going to use to collect 
issues which do not fit the bug tracking system well enough, as well as 
more open-ended ideas and suggestions for extending and improving 
Metakit.

The new feature/to-do page is at
http://www.equi4.com/mktodo.html
The bug tracking system is as before at
http://www.equi4.com/bugs
The list is fresh and totally incomplete.  Please send suggestions and 
reminders.  I'd like to prevent good ones from falling through the 
cracks, and also maintain a good overview so important things can be 
done first.

-jcw


Re: [Metakit] metakit for python 2.3 on freebsd

2003-09-20 Thread Jean-Claude Wippler

PieterB wrote:

I'm trying to install metakit with python 2.3 on freebsd 5.1.  When
I run make METAKIT_WITH_PYTHON=yes from /usr/ports/databases/metakit
(and changing python2.2 to python2.3 in the Makefile).
Not sure how the FreeBSD ports system is setup with MK, so I can't 
comment on it.  In general, make's for the python side of MK could use 
some tweaks, judging from a couple of recent posts.

Here's perhaps an option for you: if you checkout from CVS, you'll find 
a new distutils solution added by Gordon McMillan:
	cd python && python setup.py build   (or install)

Now that distutils is supported, being the Python way nowadays, 
should we perhaps move away from makes and all the autoconf/libtool 
complexities?  When in Rome, etc...

-jcw


[Metakit] gcc 3.3 on mac

2003-09-20 Thread Jean-Claude Wippler

FYI, the MK build problems with gcc 3.3 on OS X are resolved by getting 
the latest gcc update from Apple (Aug 2003):

This one is bonkers:

$ gcc --version
gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1435)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is 
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

This one is sane:

$ gcc --version
gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1493)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is 
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

No changes to MK.  Builds without warnings.  Test suite passes cleanly.

-jcw

PS.  Unrelated, but FYI: I've regenerated autoconf 2.57 / libtool 
1.4.3, see CVS.


[Metakit] test - please ignore

2003-09-25 Thread Jean-Claude Wippler

(I'm fiddling with Mailman mailing list settings)

-jcw


Re: [Metakit] S vs B datatypes in Python

2003-09-25 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

Is there in fact, for Python, a difference between using S and using B?
There are some semantic differences between C/C++, Python, and Tcl - so 
there are always going to be some slight impedance mismatches between 
them for Metakit.

In C, S-properties are zero-terminated strings, while B's are (sized) 
byte buffers.

In Python and Tcl, the distinction is far smaller.  The sort order of 
S's is done with stricmp (case insensitive textual comparison), while 
it is memcmp for B's.

If your data may contain null bytes, you must use B's.

If your data is text, use S's to get a decent sort order (Unicode and 
UTF-8 issues will also play a role for S's one day).
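To get a feel for the two sort orders, here is a small sketch in plain Python. It only mimics memcmp-style and stricmp-style comparison on ASCII data; it does not call Metakit, and the sample names are made up.

```python
names = ["bob", "Alice", "carol", "Bob"]

# Byte-wise comparison (memcmp-like): uppercase ASCII sorts before lowercase.
as_bytes = sorted(n.encode("ascii") for n in names)

# Case-insensitive comparison (stricmp-like): compare lowercased text.
as_text = sorted(names, key=str.lower)

print(as_bytes)
print(as_text)

# A byte buffer can hold a null byte; a zero-terminated C string would end there.
blob = b"ab\x00cd"
print(len(blob), blob.split(b"\x00")[0])
```

The byte-wise order puts "Bob" before "bob", while the case-insensitive order treats them as equal and keeps them next to each other.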

-jcw


Re: [Metakit] Mk4py storage lifetime

2003-09-26 Thread Jean-Claude Wippler

Nicholas Riley wrote:

I'm almost finished with my Mk4py work, finally.  Down from about 20
items on my to-do list to two, at least. :-)
Wow, great!

Anyway, one more question.  Should this work?

metakit.storage().getas('blah[x:S,y:I]').structure()
[]

Because this does:

s = metakit.storage()
s.getas('blah[x:S,y:I]').structure()
[Property('S', 'x'), Property('I', 'y')]
This is correct behavior in the current design.  Storages are not kept 
open by views.  Their cleanup causes all views associated with them to 
become empty.

Should access to orphaned views throw an exception?
That would indeed be another way to treat this (and probably useful to 
help detect incorrect use), but it would require some redesign of the 
C++ core.

-jcw


Re: [Metakit] Mk4py changes

2003-09-27 Thread Jean-Claude Wippler

Nicholas,

Thanks for this great contribution.

I'll go through this to understand all the changes.  In the meantime, 
I'm attaching a diff with the latest code in CVS, it wasn't so hard 
after all, and after editing out the differences reported for 
auto-generated stuff such as configure, it becomes a lot easier to read 
through all your changes.

[...]
That's it.  I hope these changes will make it easier for people new to
Mk4py to get up and running, instead of being mired in compilation and
usage problems.
Thanks again - I *very* much share your concerns and appreciate your 
efforts to improve that side of things.  Deployment hassles are the 
worst time-wasters ever IMO, it's worth spending all our time on (but 
only a few people, that is), to get better out-of-the-box solutions!

-jcw



njrdiffs.out.gz
Description: GNU Zip compressed data

Re: [Metakit] find select and search, help once again please

2003-09-29 Thread Jean-Claude Wippler

Riccardo Cohen wrote:

Search()

= many errors like :
error search for key '836 my key 836' found idx 15000
this key is in the table, I see it with kitviewer, 3 records have this 
key.

= when found, the value is sometimes not the good one :
idx=498, key='195 my key 195', foundidx=5850, val='value for key 1950 
[dum=5850]', avg=0.188377

it should be the value for key 195, not for key 1950 !

= If I increase the TOTAL to 1 instead of 5000, then every search 
is found with no error !

= the search is very quick, but does not provide the result !

what does it do exactly ???
Binary search.  It can only work if the view is sorted on the key, and 
the key is the first property.
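A minimal sketch of that failure mode, using Python's bisect module as the binary search (this is not Metakit's Search(), and the keys are made up to resemble the ones above): on unsorted data the search can miss a key that is present.

```python
import bisect

def search(keys, key):
    """Binary search: index of key, or -1 if the search does not land on it."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1

unsorted_keys = ["836 my key", "195 my key", "1950 my key", "42 my key"]
sorted_keys = sorted(unsorted_keys)

print(search(unsorted_keys, "195 my key"))   # key is present, yet not found
print(search(sorted_keys, "195 my key"))     # found once the data is sorted
```

Note that these keys are strings, so "sorted" here means sorted as text, where "1950 my key" comes after "195 my key" and before "42 my key".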

Find()
==
result ok, but quite slow.
If I hash my table, the result is very quick, but I cant have multiple 
key ! (which is a problem for me)
You could consider grouping first on the (non-unique) key, and then 
hashing the resulting view/subview structure?
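The idea, sketched with plain Python containers (a dict stands in for the hash, a list for the subview; none of this is Mk4py code): after grouping, each distinct key occurs once, so a hash lookup works and returns all rows for that key.

```python
from collections import defaultdict

rows = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d"), ("k1", "e")]

# Group first: one entry per distinct key, holding all rows with that key.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

def find_all(key):
    """Hash lookup on the unique group key; returns every value for it."""
    return groups.get(key, [])

print(find_all("k1"))
```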

I dont need the performance of a Cray II running an Oracle Server, but 
62ms per selection is too much for me (it takes 6 seconds for 100 
searches !). Could anybody help me please ?
Have you tried a plain brute force loop?  If it still isn't near the 
performance you'd expect, please post a (short) code example.  There 
are a few things to avoid unnecessary copying.

I don't think 62ms to go through 10k rows or so is accurate, there must 
be something else going on...

-jcw


Re: [Metakit] Mysteriously growing Metakit db

2003-09-30 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

[...]
I am now fairly confident that my database will not grow without bound,
consuming life as we know it on the east coast.
Relieved :)

But what is commit-extend mode anyway?
I've dug up information about this in the Metakit wiki and used it as 
basis for a new page on the website, see the last item of 
http://www.equi4.com/mkdocs.html

-jcw


Re: [Metakit] Mk4Py for python2.3 on Windows

2003-09-30 Thread Jean-Claude Wippler

Thorsten Henninger wrote:

I am trying to compile the Mk4py.dll (python bindings) for Python2.3 
on Windows, but I did not succeed.  I almost got it with the mingw 
cross compiler on windows, but this one does not work as well!
There was an issue with 64 bit ints (PWONumber.h and PyRowRef.cpp, 
addressed by njr's patch), but after that change - and the inevitable 
adjustment of python22-python23 in the MSVC6 build project - it builds 
ok with MSVC6.

Mk4py.dll /pub/mk/mk-2.4.9-windows/Mk4py.dll
I've renamed the previous build to Mk4py22.dll, and have uploaded a new 
Mk4py.dll - enjoy :)

-jcw


Re: [Metakit] Mk4py changes

2003-09-30 Thread Jean-Claude Wippler

Mikhail,

did you change any files outside of mk4py distribution?
FYI, see 
http://trixie.triqs.com/pipermail/metakit/2003-September/001409.html - 
I posted a patch, i.e. all the differences in one file.  You can see 
exactly what Nicholas did.

-jcw


Re: [Metakit] BSDDB vs. Metakit performance?

2003-10-10 Thread Jean-Claude Wippler

Brian Kelley wrote:

Let's do the test:
2.9143433 seconds to iterate bsddb3
1.8621608 seconds to iterate metakit
So metakit is approximately 30% faster for linear access.  Both are 
pretty good though.
As you know, statistics can be made to come out any way you like, i.e. 
your above figures could also be summarized as: bsddb3 needs 56% more 
time than MK.
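The arithmetic behind the two readings, worked out from the timings quoted above (only the variable names are mine): the percentage depends on which of the two numbers is taken as the baseline.

```python
bsddb_secs = 2.9143433
metakit_secs = 1.8621608

# How much more time bsddb3 needs, relative to Metakit's time.
more_time_pct = (bsddb_secs / metakit_secs - 1) * 100

# How much less time Metakit needs, relative to bsddb3's time.
less_time_pct = (1 - metakit_secs / bsddb_secs) * 100

print(f"bsddb3 needs {more_time_pct:.1f}% more time")
print(f"Metakit needs {less_time_pct:.1f}% less time")
```

Same measurements, roughly 56% one way and roughly 36% the other.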

[jyl]
You didnt say which OS you used. Do you happen to know if Metakit uses
memory mapped files on your OS? If it does, that's why loading is 
slower
-- Metakit has to obtain committed address space pages from the OS to 
map
all that data into the process's address space.
Keep in mind that this is a one-time mmap().  Pointer access and page 
faults do the rest.  OS'es are pretty good at that, their entire 
code-loading and I/O designs are based on it, usually.  Modern OS'es 
detect sequential accesses even and start pre-fetching.

I just know from experience that bsddb scales up to gigabyte files and 
metakit claims to have good performance to the several hundred 
megabyte region.  I haven't found much of a problem in practice, some 
of my metakit files are 600MB+
The hard limit is memory mapped address space, i.e. well under 2 Gb on 
32-bit machines, in practice.

I'd like to point out that storing blobs in MK is actually very 
efficient.  Surprising as it may sound, above a certain size and number 
of rows, storing N items in a view will do just about the same as BDB 
does.  Keep in mind that MK is columnar - the presence of a column, no 
matter how big or complex, does not affect traversal of the others.  If 
it's all opaque binary data, and a substantial percentage is empty, 
then using a separate view should work out better.
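As a rough picture of what "columnar" means here, a sketch in plain Python (a simplified mental model only, not Metakit's actual storage layout): with one sequence per property, walking the ids never touches the blob data.

```python
# Row-wise layout: each record carries every field, including the big blob.
row_store = [
    {"id": 1, "blob": b"x" * 1000},
    {"id": 2, "blob": b"y" * 1000},
    {"id": 3, "blob": b""},
]

# Column-wise layout: one sequence per property.
column_store = {
    "id": [1, 2, 3],
    "blob": [b"x" * 1000, b"y" * 1000, b""],
}

# Traversing ids in the columnar layout only reads the "id" sequence.
total = sum(column_store["id"])
print(total)
```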

Personally, I think that unless file size is too constraining, you 
should just add the extra property to your view and let MK's 
adaptiveness figure out how to store things for that particular column 
(what MK does is switch to non-columnar storage if #bytes or #rows 
grows too far - there's a heuristic involved to find a decent 
trade-off).

If you really want to get to the bottom of this, you should compare the 
mix of having data in MK for traversal, and either items in BDB for 
large storage or adding a single view with one S or B property, and 
storing items in that view.  The logic of this should be similar.  I 
suspect that MK will come out at least as fast (you're retrieving the 
N'th item from a view, vs. BDB doing an extra - albeit simple - hash 
lookup).  You'd gain single-file convenience, and less installation 
dependencies.  But you'll need to stay under say 1 to 1.5 Gb.

Note that there is a downside in the current implementation: MK 
determines free space from a full traversal, so having millions of 
pieces in the file will slow it down as it starts preparing for a 
commit.  The file format has some unused features to largely avoid such 
traversals, but MK 2.4.9.2 does not yet take advantage of that.  It 
will definitely have to, before we can grow it to the terabyte range in 
64-bit architectures.

-jcw


[Metakit] compression and encryption

2003-10-10 Thread Jean-Claude Wippler

Two options not implemented in MK 2.4.9.2 are compression and 
encryption.

Due to the column-wise design of MK, this may actually have substantial 
consequences.  The idea is that in a datafile with, say, layout 
names[first:S,last:S,phones[type:S,number:S]] it would be possible to 
designate some properties as being compressed, others as encrypted, and 
yet others as both.

The compression would take place in a column-wise manner, i.e. all 
values of the designated property in all rows would be compressed.  If 
there is major redundancy/repetition, then the storage size would be 
greatly reduced.  On first access, such columns would be uncompressed 
(taking up some memory), and on commit, the data would be saved 
compressed again.

For compression, it would seem that zlib is the de-facto standard to 
use.

For encryption, a similar effect would be seen: on file, the entire 
column becomes encrypted, again for a specific property of all rows in 
that view.  A complicating factor would be that encryption needs to be 
customizable, so in this case a callback through the c4_Strategy class 
seems the right way to do it.  Perhaps some basic encryption such as 
David Wheeler's TEA could be included as default.

When combining compression and encryption, compression would have to be 
done first, to have any effect.  The encrypted result of that would be 
stored on file.  On reading, the data must first be decrypted, then 
decompressed.
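A small sketch of why the order matters, in plain Python. It uses zlib for compression and, as a stand-in for a real cipher, a toy XOR stream built from SHA-256; that toy cipher is an assumption made for the example, is not secure, and is not TEA or anything Metakit provides.

```python
import hashlib
import zlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. For illustration only, not secure."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

column = b"Amsterdam\n" * 1000      # a highly repetitive column
key = b"demo key"

# Writing: compress first, then encrypt.
stored = xor_crypt(zlib.compress(column), key)

# Reading: decrypt first, then decompress.
restored = zlib.decompress(xor_crypt(stored, key))

# The wrong order: encrypted data looks random, so zlib cannot shrink it.
wrong_order = zlib.compress(xor_crypt(column, key))

print(len(column), len(stored), len(wrong_order))
```

Compress-then-encrypt stores a tiny fraction of the original size, while encrypt-then-compress ends up about as large as the input.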

This also requires a change to somehow specify the details in the 
description string, or in some other way.  This will require more 
thought.

I'm bringing this up because I am regularly compressing data before 
storing it (and Starkits in Tcl do it all the time), and because I 
think there is value in getting the encryption covered, especially 
since it could be done on a per-property basis.  Encryption could be 
useful to lock up applications deployed as starkits/starpacks.

The file format has hooks to allow this sort of thing, although such 
files will not be readable by current MK releases (they would not know 
how to skip over the extra admin info).

Apart from yes please, or no thanks, do you consider this a valuable 
option?  Would you need it and use it right away?  Desperate enough to 
fund it?  (I had to ask...)  Any ideas about how to encrypt?  Or maybe 
only do compress?  Are there any implications / trade-offs I'm 
forgetting about?

-jcw, with a small marketing hat on...


Re: [Metakit] compression and encryption

2003-10-16 Thread Jean-Claude Wippler

Andreas Muegge wrote:

the implementation of encryption is something I would like to see
Ok, thanks for letting me know (surprising: just two responses so far).

I am not sure about compression. If we talk about normal strings I 
guess
you must try it out. Decompression is usually rather fast and waiting 2
seconds at program start shouldn't bother the user. Would it still be
possible to run Metakit on a readonly medium?
Yes.  Decompression would be an in-memory thing.

For big data (1k and more) I have serious doubts. You would have to
decompress several MBytes before the first access is allowed [...]
No, the decompression would happen per column, on first access, and you 
can pick per column which one is stored compressed and which one isn't. 
 For large strings, compression would not be per column even, but per 
item.  The switchover point is hard to define, MK uses an adaptive 
heuristic to choose between ways to store strings.

Of course I can only compress each record and not the whole column.
Yes, per-item compression (BTW, it's not record but item, i.e. 
property) is always possible of course, at the MK caller level.

The encryption/compression I'm talking about would be column-wise, i.e. 
very effective with views in which a property has the same value across 
many rows.  For an impression of the effectiveness on your data, create 
a datafile from scratch so it has no free space, and gzip it - the 
results could be normal (i.e. 10..30% reduction) or dramatic (i.e. 90% 
reduction), depending on the nature of your data.  I once converted a 
120 Mb database stored in another format, and ended up with 10 Mb in 
MK, which then went to 900 Kb when gzipped...
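To see the difference between per-item and column-wise compression on repetitive data, here is a quick experiment with zlib in plain Python. The sample values are made up, and joining the items with newlines is just a convenient way to form one stream for the test; it says nothing about how MK would lay out a compressed column.

```python
import zlib

# One property across many rows, with few distinct values.
cities = [b"Amsterdam", b"Utrecht", b"Amsterdam", b"Rotterdam"] * 500

raw_size = sum(len(c) for c in cities)

# Per-item: each value is compressed on its own, so repetition across rows is invisible.
per_item_size = sum(len(zlib.compress(c)) for c in cities)

# Column-wise: all values of the property are compressed as one stream.
column_size = len(zlib.compress(b"\n".join(cities)))

print(raw_size, per_item_size, column_size)
```

For short values like these, per-item compression actually makes things bigger (each item pays the zlib overhead), while the column-wise stream shrinks to a small fraction of the raw size.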

-jcw


Re: [Metakit] KitBinder

2003-10-21 Thread Jean-Claude Wippler

Pascal Baspeyras wrote:

I'm interested in KitBinder features, but I can't find
more than this page:
http://www.equi4.com/metakit/api-old/doc_kbind.html
I easily embed a metakit file into my app's resources,
but I fail to open it from there (Visual C++).
Whoa ... that's 5 year old technology, it's truly ancient!  ;)

With today's MK, all you need is to append the MK datafile to the end 
of your executable.  Then open the executable (read-only) and you'll 
have a datafile... this is cross-platform.

The trick is to find out the path of the file to open.  In Win32, use 
GetModuleFileName.

-jcw


Re: [Metakit] Viewing a MK database

2003-10-31 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

Let's say I have a large Mk database that I want to display in a 
grid-like
format (wxGrid, for example).  What is the best way to approach this so
that the display is very fast?
[...]
But these *represent* other things and what I want to display
is the string representations corresponding to (at least some of) the 
ints.
This requires accessing other databases to retrieve or compute the
appropriate string representation.  It is then this resulting 
string-ized
row that needs to be displayed in the grid.
Quick (not well thought-through) response:

This is where mk.wrap() can probably help.  Define a view which wraps 
the MK view, and produces values in the way you need them.  Then, 
access to items will go through your Python wrapper *on-demand*.

The whole trick of KitView.exe (and I presume Brian's KitViewer) is 
delayed rendering.  Scrolling across an infinite number of rows can 
be instant.  The issue is not doing more work than needed - and if you 
think about it, KitView does exactly the same as what you're after: 
take data out of MK, and present it in transformed way on the screen 
(namely visually).

Back in the times when KitView.exe was built, I had only Borland C++ 
Builder's datagrid which was up to this task.  Many simpler GUI 
approaches ask you to fill a matrix or listbox before using them.  
Nowadays, there are several widgets in Tk (TkTable as well as pure tcl 
ones such as Hugelist) which can play this on-demand delayed-rendering 
trick.  I don't know wxPython (or wxWindows) unfortunately - but it can 
no doubt also do this.

With Mk4py's wrap() you can create MK views which are virtual *and* 
which go through arbitrary Python code on each item access request.  
It's sort of the equivalent of MK's c4_CustomViewer class in C++, which 
is extremely powerful (many view operators are built on it).
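The on-demand idea itself needs no Metakit at all. Below is a sketch in plain Python of a wrapper that turns integer codes into strings only for the rows a grid actually asks for; the class, the lookup table, and the row data are all hypothetical, and this is not the Mk4py wrap() API.

```python
class LazyRows:
    """Sequence-like wrapper that renders a row only when it is asked for."""

    def __init__(self, raw_rows, lookup):
        self.raw_rows = raw_rows    # e.g. rows of integer codes
        self.lookup = lookup        # maps a code to its display string
        self.rendered = 0           # how many rows were actually rendered

    def __len__(self):
        return len(self.raw_rows)

    def __getitem__(self, index):
        self.rendered += 1
        return [self.lookup[code] for code in self.raw_rows[index]]

colors = {0: "red", 1: "green", 2: "blue"}
raw = [(i % 3, (i + 1) % 3) for i in range(100000)]
view = LazyRows(raw, colors)

# A grid showing 20 rows starting at row 5000 only asks for those 20.
visible = [view[i] for i in range(5000, 5020)]
print(visible[0], view.rendered)
```

However many rows the underlying data has, the work done is proportional to what is on screen.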

-jcw


Re: [Metakit] Question on how to remove top level views

2003-11-11 Thread Jean-Claude Wippler

Berk, Murat wrote:

We used to store a view name in one view (instead of marking cell as 
subview) so that we can do getAs(name) and use it.

I do not want to change a lot of things, but when we try to delete the 
rows in the first view, the only thing we can do is to remove all 
elements of the second view but I cannot really delete it.

view1
 name  field1 field2 field3
  name1  data...
  name2  data...
__name1__
  prop1 prop2 prop3
__name2__
  prop1 prop2 prop3
When we delete name1 from the first view, I want to remove the whole 
view
called __name1__. How can I do this..
storage.GetAs("__name1__")

IOW, omit the usual [...] part.  Am I missing something?

Since everything is a view including database itself, in theory this 
should be possible.
Yes.

The one confusing part could be that the view won't go away until 
committed.  All properties, including subviews, including therefore 
also top-level views, stay around after a restructure, and only truly 
vanish on commit (that makes it possible to re-structure, copy over, 
then commit).

-jcw


Re: [Metakit] unique() with count?

2003-11-16 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

It turns out that the unique() view operation is pretty useful to me.
However, what would be even more useful is to have a count of each 
record
which indicates the total number of records which were in its identity
class.
apply(view.counts, view.structure() + ['frequency']) ?
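For reference, the shape of result being asked for (one row per distinct record plus its frequency) looks like this in plain Python. It uses collections.Counter on made-up tuples purely to show the expected output; it is not a replacement for, or a description of, the Mk4py call above.

```python
from collections import Counter

rows = [("ann", 30), ("bob", 25), ("ann", 30), ("cy", 41), ("ann", 30)]

# One output row per distinct input row, with its frequency appended.
counts = Counter(rows)
unique_with_count = [row + (n,) for row, n in sorted(counts.items())]

print(unique_with_count)
```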

-jcw


Re: [Metakit] Passing views in Python

2003-11-20 Thread Jean-Claude Wippler

Brian Kelley wrote:

My guess is the storage is going out of scope and being garbage 
collected and thereby closing your views without you knowing about it. 
 You should keep the storage object in scope for your entire 
application run.

Here is some proof of this:

>>> storage = metakit.storage()
>>> vw = storage.getas("test[I:I]")
>>> vw.append(1)
0
>>> del storage
>>> len(vw)
1
>>> vw[0].I
Traceback (most recent call last):
 File "<stdin>", line 1, in ?
AttributeError: I

I'm of two minds whether I think this is a bug or not.  I think it 
might be onerous for the python wrapper to keep track of all the 
views/subviews and like being created and used.
Spot on, I think.

When a storage goes out of scope, all data becomes unavailable.  Keep 
in mind that MK uses memory-mapped files usually, so access to data 
actually goes *straight* to disk in most cases.  Closing the file 
blocks off that access.

But MK does not control where view objects are used, it merely tracks 
reference counts (in the same way as Python does for its own 
PyObject's).  The C++ view objects themselves are not tracked or 
reachable from the storage object.

So at some point, the only way out I could think of is to make views 
act as being empty.  All rows continue to exist, they just don't have 
any properties anymore.  Which does indeed make them pretty useless, 
but at least it leads to well-defined semantics.

Views which do not come from a storage are unattached views; these are 
therefore autonomous.  That's what you get when you use copies.  The 
flip side is that they eat up memory, and have to be allocated and 
copied in full.

The way to avoid problems in Python, is to store the storage object in 
a variable which is guaranteed to stay around as long as its data needs 
to be accessed.
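A toy model of the lifetime issue, in plain Python. The Storage and View classes below are made up for the example and only imitate the behavior described above (the view does not keep its storage alive); they are not how Mk4py is implemented, and the immediate cleanup shown relies on CPython's reference counting.

```python
import weakref

class Storage:
    """Toy stand-in for a storage; it owns the actual data."""
    def __init__(self):
        self.data = {}

    def getas(self, name):
        self.data.setdefault(name, [])
        return View(self, name)

class View:
    """Toy view: refers to its storage weakly, so it does not keep it alive."""
    def __init__(self, storage, name):
        self._storage = weakref.ref(storage)
        self._name = name

    def rows(self):
        storage = self._storage()
        return storage.data[self._name] if storage is not None else []

def orphaned():
    # The storage is a temporary here and is gone once this expression is done.
    return Storage().getas("blah")

keeper = Storage()              # held in a long-lived variable
good = keeper.getas("blah")
keeper.data["blah"].append("row")

print(good.rows())
print(orphaned().rows())
```

The view obtained through the long-lived variable still sees its data; the one created from a temporary storage comes back empty.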

-jcw


Re: [Metakit] Re-assigning a c4_View - a quickie

2003-11-20 Thread Jean-Claude Wippler

Ian Fairclough wrote:

Just a quick question, if you have the following code :

c4_View viewB = viewA.Duplicate();

and then you want to re-assign viewB i.e.

viewB = viewC.Duplicate();

What should you call prior to re-assigning viewB to ensure that the 
first
viewB is properly destroyed.  For example, would the following do it :

viewB = viewA.Duplicate();
viewB.RemoveAll ();
viewB = viewC.Duplicate();
There are two types of destruction.  If the view is attached to a 
storage, the RemoveAll() will make sure all its rows are deleted (on 
file too, after commit).

If all you care about is memory use and object clean-up, then you don't 
have to do anything.  MK uses a technique called smart pointers in 
C++, which automatically manages all reference counts.  The line
	viewB = viewC.Duplicate();
does a number of things:
	- it creates a new view with copies of what is in viewC
	- it increments the reference count of that new view
	- it decrements the refcount of whatever was in viewB
	- it makes viewB refer to the newly created copy

If viewB previously referred to a copy of viewA, and if no other view 
object refers to it, then the decrement will cause that copy to be 
cleaned up.

This is fully automatic in C++, as long as you stay away from pointers 
to c4_Views.  There is no need whatsoever for these (same for 
c4_Storage, btw), not even performance-wise because c4_View objects are 
very very lightweight objects.  Enjoy the magic of smart pointers!
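Python programmers will recognize the pattern, since CPython manages its own objects by reference counting too. The sketch below is an analogy in plain Python, not Metakit code: the Copy class and duplicate() helper are invented for the example, and the immediate cleanup is CPython behavior.

```python
import weakref

class Copy:
    """Stand-in for a duplicated view's data."""
    def __init__(self, label):
        self.label = label

cleaned = []

def duplicate(label):
    obj = Copy(label)
    # Record the label when the object is reclaimed.
    weakref.finalize(obj, cleaned.append, label)
    return obj

view_b = duplicate("copy of A")
view_b = duplicate("copy of C")   # rebinding drops the last reference to the first copy

print(cleaned)
```

No explicit destroy call is needed: reassigning the variable is what releases the first copy.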

-jcw


Re: [Metakit] 'Blocked' views

2003-11-26 Thread Jean-Claude Wippler

Jacob Levy wrote:

Thanks for the example -- I'm sure I can construct the equivalent C++
code, and if not I'll look through the tests that I'm sure contain some
examples.
Look in examples/.  It's all there, C++ and Tcl.

You mentioned that blocked views are advantageous for when you have 
lots
of small strings. The advantages are better reuse of space and more
compact storage? What other circumstances would benefit from using 
blocked
views? Is there a (significant) performance penalty using blocked 
views?
I'm not going to go into this - your best bet is to measure each case 
yourself.  Look in examples - it has sample code, timing tests, 
scalability tests, etc.

Look also at the link at the bottom of the MK docs page:
	http://www.equi4.com/mkdocs.html
I forgot to mention that before.  While MK may not have stellar 
documentation, we should at least try to make good use of what there 
is... right?

Oh, and look in examples/ in the MK source distribution :)

-jcw


Re: [Metakit] License

2003-12-04 Thread Jean-Claude Wippler

Pat Knight wrote:

The Metakit license says I have to include the copyright notice and 
license text if my product contains substantial parts of the 
software. However, the precompiled DLLs for Windows don't contain the 
required text. Am I allowed to redistribute them, or do I have to 
build my own versions incorporating the text?
Yes - feel free to redistribute them.  To comply with the 
copyright/license, you can include the standard blurb in accompanying 
documentation.  Perhaps also include a link to the MK homepage or 
license page.  That way the origin of the software is clear - which is 
what the MIT license is all about (well, for me anyway).

-jcw

Re: [Metakit] MKStats

2003-12-20 Thread Jean-Claude Wippler

Jeffrey Kay wrote:

Is the source code for the mku portion of the mkstats program available?  I
thought that having the ability to compute the percentage of empty space in
a db would be a helpful function to have in my code, specifically so I could
decide when to compact the data.  It appears that the c4_Strategy class has
a FileSize() function, but that doesn't return the amount of bytes actually
used in a file.  How would I compute that value?

Sorry, I'm afraid not.  Mkstats uses some new code which is part of a 
larger project.  The mku utility is not based on the Metakit C++ 
core, and does a complete traversal on its own of the data on disk to 
generate the usage info.  You can call it as an executable, and parse 
the output, as mkstats does.

-jcw

[Metakit] Metakit list at gmane.org

2003-12-20 Thread Jean-Claude Wippler

If you prefer to read this mailing list over the web, then you may want 
to check out the new archive at 
http://news.gmane.org/gmane.comp.db.metakit/ - it's quite sophisticated 
in its support of keyboard navigation (for javascript-capable 
browsers).  Click on the question mark in the top right corner for 
details.  There's also an NNTP interface.

Thanks to Lars Magne Ingebrigtsen - Mr. Gmane, for making such a 
wonderful resource available, and for importing the entire Metakit 
mailing list archive.

-jcw

PS.  FYI, Starkit list has also been on Gmane for some time now:
http://news.gmane.org/gmane.comp.lang.tcl.starkit/

Re: [Metakit] metakit and UTF-8

2003-12-23 Thread Jean-Claude Wippler

andrian wrote:

I created a metakit database using Mk4tcl 2.4.9.2.
I have saved the script file with the data to populate
the db in UTF-8. However, the stored data appear to
be corrupted.
I understand that according to Metakit's specification
UTF-8 is supported. Furthermore, I use Wikit, where I
have successfully stored UTF-8 data.
What can be the cause of this problem?

UTF-8 can definitely be stored in MK (sorting is another matter).

Can you create a test script which shows the problem?  Without it, I 
have no way of helping or even reproducing the problem, I'm afraid.  
FWIW, I have not heard of a problem with UTF-8 before.

If you use Mk4tcl, then you may want to consider posting to the Starkit 
mailing list, which has many more Tcl subscribers than this list:
	http://www.equi4.com/mailman/listinfo/starkit
Though I read and respond to both, of course :)

-jcw

Re: [Metakit] Closing storages (again)

2004-01-02 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

So now my question is this:  Are weak references to PyStorage
objects unsupported simply because the necessary stuff to
support them was just never added?  (It doesn't *look* as
though it would take much to support this.)  Or is there some
more fundamental reason weak references to PyStorage objects
can't be supported?

Mk4py can still be compiled with Python 1.5.2. This has proved
valuable to a number of people (myself included) whose hosting
providers subscribe to the "if it ain't broke, don't fix it" philosophy.

I don't know much about Python's weak references (simply because they 
were introduced after I was involved with Mk4py).  I'd be willing to 
maintain a dual code-base, provided all differences are dealt with by 
(one, I hope) #define's.

So on my end, it's more a lack-of-time-not-high-enough-priority kind of 
issue than anything else.

On a different, but related, note: I've been making good progress on 
integrating Nicholas Riley's changes to Mk4py.  It now seems to be ok, 
other than that setup.py appears to be hitting distutils buglets with 
Python 2.2.3 (current default on my Gentoo Linux setup).  For that 
combination, the answer will have to be: use make.  Anyone using Mk4py: 
if you could download and verify the latest sources from CVS, that 
would be a big help - it's been holding up a new update of MK way too 
long already...

-jcw

Re: [Metakit] Question

2004-01-02 Thread Jean-Claude Wippler

chris mollis wrote:

I have a question about the best way to validate information on 
reads/writes to the db.  For example, I'd like to make sure that data 
that is written out during a particular commit (by calculating a hash 
of data written, perhaps) can be verified again when the database is 
re-opened at a later date (possibly calculating the hash again and 
then checking this against the previous hash).  What do you recommend 
to be the best way to do something like this?  Should I override 
DataWrite/DataRead methods of c4_FileStrategy to calculate hashes on 
read and write operations?

Good questions.  There are several aspects to consider.  The first one 
is really what sort of validation you are after: if you need to verify 
storage in general, then one could argue that there really is no other 
option than full file checksums, and even then it'll depend on the sort 
of validation as to when and how often you need to do it.  Such 
checksums could be done outside MK, i.e. after commits and before 
opens.
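
A minimal sketch of such an outside-MK checksum, in plain Python.  The 
function name and the choice of SHA-256 are assumptions made for 
illustration only; nothing here is part of Metakit.  The idea is to 
call it after a commit, store the digest somewhere safe, and compare 
again before the next open.

```python
import hashlib

def file_checksum(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so large datafiles need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

This verifies the datafile as a whole, so any change at all (including 
a legitimate commit) changes the digest.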

Another point to be made is that MK is a database: it does not read or 
write all data each time the datafile is used.  By validating writes on 
commit, you'll be checking only what it changed, not the entire 
datafile.  Due to the way data is stored, the data written can be all 
over the datafile, it's not necessarily contiguous (though individual 
columns are).

It gets worse: MK usually loads data by mapping a file into memory.  
That means no read system calls take place at all in most cases: the 
data is mapped to a range of addresses and paged in via O/S page faults 
when accessed, which is a matter of following pointers.
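
To illustrate the point with Python's own mmap module (not with MK 
itself): once a file is mapped, fetching bytes is just slicing memory, 
and there is no per-access read call that a hook could intercept.  The 
helper name below is made up for this sketch.

```python
import mmap

def read_via_mmap(path, offset, length):
    """Return `length` bytes at `offset`, fetched through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing the mapping copies bytes out of mapped memory;
            # the O/S pages the data in on demand.
            return bytes(m[offset:offset + length])
```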

If you really insist on doing this in some sort of fine-grained manner, 
my suggestion would be to use a custom c4_Strategy class as you mention 
yourself, in combination with a *second* MK datafile.  The invariant is 
that MK always writes entire columns - I suspect that it is possible to 
detect the column boundaries written by intercepting DataWrite().  The 
main call comes from column.cpp line 1532.  Or it may be necessary to 
introduce two extra strategy members which get called once in each call 
to c4_Column::SaveNow():
	- strategy_.DataInit(pos_)
	- unmodified while (iter...) loop
	- strategy_.DataDone(_size)
The DataInit would reset a checksum field in the strategy object (and 
remember pos_), the DataWrite calls would incrementally update the 
checksum, and the DataDone call would save a pos,size,check triple in 
the second MK datafile.  It'll take some extra logic to make this work 
across multiple commits, i.e. when space gets re-used, but that ought 
to be doable.  You may want to use hashed views for the secondary MK 
file, to make it snappy.
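
The DataInit / DataWrite / DataDone flow described above can be 
sketched in plain Python as follows.  This is only a model: the class 
and method names are hypothetical, a dict stands in for the second MK 
datafile, and CRC-32 is an arbitrary choice of checksum.

```python
import zlib

class ChecksumRecorder:
    """Hypothetical model of the proposed strategy hooks."""

    def __init__(self):
        self.triples = {}   # pos -> (size, crc); stands in for the 2nd datafile
        self._pos = None
        self._crc = 0
        self._written = 0

    def data_init(self, pos):
        # Start of one column save: remember the position, reset the checksum.
        self._pos = pos
        self._crc = 0
        self._written = 0

    def data_write(self, data):
        # Update the running checksum incrementally with each chunk written.
        self._crc = zlib.crc32(data, self._crc)
        self._written += len(data)

    def data_done(self, size):
        # End of the column save: store the (pos, size, check) triple.
        if size != self._written:
            raise ValueError("size does not match the bytes written")
        self.triples[self._pos] = (size, self._crc)

def verify(recorder, blob):
    """Return the positions whose stored checksum no longer matches `blob`."""
    return [pos for pos, (size, crc) in sorted(recorder.triples.items())
            if zlib.crc32(blob[pos:pos + size]) != crc]
```

Re-using a position in a later commit simply overwrites its triple in 
this model; handling ranges that shrink, grow or overlap is the "extra 
logic" mentioned above and is not covered here.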

The most important problem to deal with is *when* to verify such saved 
checksums.  If it has to be done during access, then I can't think of 
any other way than to disable memory mapping (by overriding 
c4_Strategy::ResetFileMapping with a dummy which does nothing).  That 
makes MK slower and makes it use considerably more temp memory, however 
- so you'll have to think hard whether that is really what you want.

If you just want to checksum occasionally, then you could iterate 
through all triples in the secondary MK file and verify each of the 
ranges.

Another idea would be to save checksums per fixed-size block, say 4 Kb.  
That means DataWrite would track checksums, but it may need to read 
some data off the disk to deal with writes which are not exactly on 
block boundaries.  This needs some thought to optimize, since most 
DataWrite calls will not be aligned nicely.  Then again, DataWrite does 
get called in mostly sequential order, since it writes entire columns 
most of the time.
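
A rough sketch of the per-block variant, again in plain Python with 
made-up names.  A bytearray stands in for the datafile, and re-reading 
a whole block from it models the "read some data off the disk" step 
needed for writes that are not block-aligned.  It is unoptimized on 
purpose.

```python
import zlib

BLOCK = 4096  # fixed block size, 4 Kb as suggested above

def write_with_block_checksums(store, checksums, pos, data):
    """Write `data` at `pos` in `store` and refresh checksums of touched blocks.

    store     -- bytearray standing in for the datafile
    checksums -- dict mapping block index -> CRC-32 of that block
    """
    if not data:
        return
    end = pos + len(data)
    if end > len(store):
        # Grow the stand-in file with zero bytes up to the end of the write.
        store.extend(bytes(end - len(store)))
    store[pos:end] = data
    for block in range(pos // BLOCK, (end - 1) // BLOCK + 1):
        start = block * BLOCK
        # Unaligned writes force a re-read of the full block contents.
        checksums[block] = zlib.crc32(bytes(store[start:start + BLOCK]))

def bad_blocks(store, checksums):
    """Return the indices of blocks whose contents no longer match."""
    return [b for b, crc in sorted(checksums.items())
            if zlib.crc32(bytes(store[b * BLOCK:(b + 1) * BLOCK])) != crc]
```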

-jcw

Re: [Metakit] Potentially Stupid Question

2004-01-06 Thread Jean-Claude Wippler

Brian Kelley wrote:

Based on some previous posts to the metakit news group, I have learned 
that a metakit storage can be about 1.5 Gigabytes in storage before 
performance starts to decline.  I.e. memory mapped access is no longer 
viable.

Good to know.

What happens if you have two storages open?  Can each be 1.5 Gigabytes 
or does memory mapping not really scale this way?

Nope - address space is a per-process limitation.  The real way out is 
64-bit address space machines.  You may be able to squeeze some slack 
with redundancy reduction, compression, etc - but it'll probably be 
hard and may not even offer much payback.   If you have big data items, 
you can put them on file, seek/read as needed, and manage space in MK - 
but that too will take some work.

-jcw

PS.  I disagree with the title!

[Metakit] Metakit 2.4.9.3

2004-01-26 Thread Jean-Claude Wippler

This is to announce a new release of the Metakit embedded 
high-performance database library for C++, Python, and Tcl.

This version consolidates bug fixes over the past 9 months since 
2.4.9.2 came out.

There should be no source code or binary incompatibilities; upgrading 
is recommended (but not urgent).  An extract of the change log is 
appended; full details are available at:
	http://www.equi4.com/pub/mk/metakit-2.4.9.3.kit/CHANGES

For details see the Metakit homepage at:
http://www.equi4.com/metakit.html
Sources and C++/Python/Tcl binaries for Windows, Mac OS X, and Linux 
are here:
	http://www.equi4.com/pub/mk/

Enjoy,
Jean-Claude


2004-01-26  MK 2.4.9.3
2004-01-22  Fixed refcount problem with temp rows in Mk4tcl
2004-01-21  Documentation updates
2004-01-20  Don't trip over duplicate property names
2004-01-18  Fixed rare but very serious subview resizing bug
2004-01-16  Gracefully deal with bad property type specifiers
2004-01-03  Fixed typo in PyView.cpp
2003-12-21  Fixed Mk4too sorting on subview of length 1
2003-12-13  Tweak to avoid two unsigned/signed compiler warnings
2003-12-11  Checked in numerous changes to Mk4py by Nicholas Riley
2003-11-23  Bumped to Python 2.3, doc tweaks, lots of name fixes
2003-10-28  Get rid of --enable-python, check in c22.txt
2003-10-16  Added note to Tcl docs
2003-10-10  Added c22 test
2003-10-01  Fixed bugs in Tcl test suite
2003-09-30  Python 2.3.1 cleanup
2003-09-20  Autoconf and libtool rebuilds
2003-08-26  Documentation fix
2003-07-17  Fixes to Mk4py (Gordon)
2003-07-11  Fix for Linux not finding .lai file
2003-07-01  Fixed Metakit (preferred) vs Metakit (obsolete)
2003-06-06  Fix to Mk4py for case (in)sensitivity.
2003-05-15  Add distutils setup.py script (Gordon).
2003-05-08  Fixed array bound bug when not using mmap-ed files
2003-04-28  Sourceforge
2003-04-25  Autoconf/libtool update
2003-04-22  Fixes to Mk4py (Gordon).
2003-03-16  MK 2.4.9.2

[Metakit] multi-column sorting

2004-02-09 Thread Jean-Claude Wippler

Does anyone know how to do a multi-column sort, using column-wise 
permutations?

I'm looking at ways to optimize sorting, based on the fact that MK has 
a column-wise data organization.  The current sort does row-wise 
comparisons.  Here's what I'm after:

* take the first column, sort it, and produce a permutation vector for 
it
* take the second column, sort it, and ...
* etc...

So you end up with N permutation vectors.  Each sorts only on the 
specified column.  Assume that the sorts are stable, i.e. identical 
entries are kept in input order.  I'm looking for a way to combine these 
permutations so that the result is a permutation which represents full 
sort order of the entire view.  Tried some ideas, but none of them seem 
to be right.

I've googled on the web, but can't find much relevant info (or don't 
grasp the theoretical foundations enough to spot the essential tricks). 
 It would seem related to radix sorting.
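
One way to look at it, sketched in plain Python (this is not MK code, 
and all names are made up): a permutation vector alone does not say 
which entries of a column were equal, so independent permutations lack 
the tie information needed to combine them.  What can be computed per 
column and combined afterwards is a rank vector.  With ranks in hand, 
the full order follows from radix-style passes: stable-sort the 
running permutation by the least significant column first, and by the 
most significant column last.

```python
def sortmap(column):
    """Stable permutation: row indices of `column` in ascending order."""
    return sorted(range(len(column)), key=column.__getitem__)

def ranks(column):
    """Per-row rank within `column`; equal values get equal ranks."""
    order = sortmap(column)
    rank = [0] * len(column)
    current = 0
    for k, row in enumerate(order):
        if k > 0 and column[row] != column[order[k - 1]]:
            current += 1
        rank[row] = current
    return rank

def multi_sort(columns):
    """Return the permutation sorting rows by columns[0], then columns[1], ...

    Each column is processed on its own; only its rank vector is used
    when refining the running permutation.
    """
    perm = list(range(len(columns[0]))) if columns else []
    for column in reversed(columns):        # least significant key first
        rank = ranks(column)
        perm.sort(key=rank.__getitem__)     # list.sort is stable
    return perm
```

For example, with columns [2, 1, 2, 1] and [9, 8, 7, 8] the result is 
the same permutation a row-wise sort on (first, second) would give.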

D'you know what's involved or have tips on what terms to look for?

-jcw

Re: [Metakit] multi-column sorting

2004-02-09 Thread Jean-Claude Wippler

Brian Kelley wrote:

Jean-Claude Wippler wrote:

Does anyone know how to do a multi-column sort, using column-wise 
permutations?

Is this the right approach? Thinking out loud here.

A multi-column sort is really a precedence sort.  You only need to 
sort on a secondary or tertiary key if the primary key has equivalent 
values.

1) sort on next property
2) if any more properties groupby property else goto 4
3) foreach groupby subview go to 1
4) reassemble final indices
Result - stable sort, I think :)

Yes, this is clear - and very much related, but not quite what I mean.

I'm looking for ways to use more efficient algorithms underneath MK, 
i.e. as basis to do the above.  I'm also looking for ways to do things 
lazily - i.e. defer some of the computations.  This could have 
considerable implications when you sort a view and then ask for a slice 
of it, i.e. only display a small section of it.

I have a half-baked python implementation that requires an index 
column (mainly because the groupby method doesn't keep track of the 
row index)

The way this can be done is to add an extra column with row indices 
(sort of like APL's iota) using the pair() operation, and then group. 
 That way the result will carry original row indices with it.
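
A plain-Python sketch of the precedence sort quoted above, with the 
row indices carried along through each grouping step.  Lists stand in 
for views here; the list of row indices plays the role of the extra 
iota column, and none of this uses Mk4py calls.

```python
from itertools import groupby

def precedence_sort(columns):
    """Return row indices in full sort order; columns[0] is the primary key."""

    def sort_rows(rows, depth):
        # Nothing left to compare, or nothing to reorder.
        if depth == len(columns) or len(rows) < 2:
            return rows
        column = columns[depth]
        rows = sorted(rows, key=column.__getitem__)   # 1) sort on next property
        out = []
        # 2) group rows with equal keys, 3) recurse into each group
        for _, group in groupby(rows, key=column.__getitem__):
            out.extend(sort_rows(list(group), depth + 1))
        return out                                    # 4) reassembled indices

    iota = list(range(len(columns[0]))) if columns else []
    return sort_rows(iota, 0)
```

Secondary keys are only looked at inside groups of equal primary keys, 
which is the point of the precedence approach.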

There are more such tricks waiting to be found & exposed.  I'm 
currently trying to better understand what sort of core functions are 
needed to build the rest with.  Hence the Q about per-column sorting 
and trying to find a way to combine permutations.

To give an example - to sort on col 2, 4 reverse, and then 3 could be 
done using something like this:
	m2 = sortmap(col[2])
	m3 = sortmap(col[3])
	m4 = sortmap(col[4])
	result = view.remap(m3.remap(reverse(m4)).remap(m2))
(with partial use, i.e. when fetching only a slice of the result, all 
sorts of neat tricks can be added, leading to behavior which I think 
resembles what you are describing above)

Except that the above permutation stacking is not exactly right... 
alas.

-jcw

Re: [Metakit] Metakit wiki

2004-03-03 Thread Jean-Claude Wippler

John Fletcher wrote:

I cannot find on the metakit home page at 
http://www.equi4.com/metakit.html
any link to the metakit wiki at
http://www.equi4.com/metakit/wiki.cgi/0

It's still there, at http://www.equi4.com/mkmailing.html

I wondered if it was still there, and it is.  For some purposes the 
Email list is better, but for things which develop over a period of 
time the wiki can be a useful reference.

I've been switching to ProjectForum as wiki for some other projects, 
such as
	http://www.equi4.com/forum/rawiki/Home
It offers more protection/authentication options, RSS feeds, CSS 
themes, file attachments, and a lot more (and yes, it's all powered by 
Metakit).  Courtesy of Mark Roseman.

Have been pondering for quite a while whether it would be feasible to 
migrate the MK wiki pages to PF, and revitalize things a bit.  I agree 
that there is value in having information categorized, not just stored 
in a timeline, as it is now:
	http://news.gmane.org/gmane.comp.db.metakit/
	http://www.equi4.com/pipermail/metakit/

If there is sufficient interest to help fill a new area, I'd be happy 
to set it up and participate in getting (some) old info transferred and 
setting up a good structure, perhaps with areas per language.

-jcw

Re: [Metakit] E4graph link

2004-03-03 Thread Jean-Claude Wippler

John Fletcher wrote:

The e4graph link on page http://www.equi4.com/mklinks.html
should be changed to
http://e4graph.sourceforge.net/

Done, thx.

-jcw

Re: [Metakit] data structure question

2004-03-03 Thread Jean-Claude Wippler

Jerry wrote:

  I have a data structure (that works well so far) with three similar
  sub-views that are accessed, set, and summarized at different
  points.  Now I have a requirement to output a summary of all the
  detail with a label that identifies which of the three sets the data
  came from.  The solution I came up with doesn't work, so I am
  thinking out loud to see if someone has an idea that involves the
  least amount of re-coding.  The sub-views are accessed quite a bit
  in normal processing, and the summary only needs to be created once
  or twice a month.
  Simplified data format:
vw = db.getas('main[id:I,fname:S,lname:S,new[key:S,val:I],old[key:S,val:I],adj[key:S,val:I]]')
jdb.dump(vw)
 id  fname   lname  new     old     adjs
 --  ------  -----  ------  ------  ------
  1  first1  last1  0 rows  0 rows  0 rows
  2  bob     last2  2 rows  0 rows  0 rows
  4  first4  last4  2 rows  0 rows  4 rows
  Using a combination of flatten, union, and project I can get REAL
  CLOSE to what I want:
jdb.dump(vw31)
 id  key    val
 --  -----  ---
  2  val10   10
  2  val20   20
  4  val10   10
  4  val10   10
  4  val20   20
  4  val20   20
  4  val30   30
  4  val40   40
 What I need is to know WHICH type of value each row is:
 id  key    val  type
 --  -----  ---  ----
  2  val10   10  new
  2  val20   20  new
  4  val10   10  new
  4  val10   10  adj
  4  val20   20  new
  4  val20   20  adj
  4  val30   30  adj
  4  val40   40  adj
 The main problem being that one cannot successfully add a property to a
 PyROViewer object, which is the result of the union and flatten
 methods.  It seems I either have to:

 1) create a separate temporary view for each type and manually copy
the flattened view into it, creating and setting 'type'
appropriately.  Then union off of these PyView objects.
 2) or modify my system to always write the subview type into the
subview.  This means extra programming and run-time overhead, and
extra strings in tens of thousands of records.
 3) or some other creative idea you suggest.

This is exactly the sort of manipulation I hope to improve further,  
btw.  It's becoming more and more clear that MK needs to offer full  
relational algebra + set operators.

First of all, note that you can add properties on the fly to views,  
they will stay around until commit/rollback/close (but won't be saved  
if they are not part of a getas).

So you could do:
	- for each row in new subview:
		set type property to "new"
	- etc for other subviews
Then you'd be able to see ... ah, wait - I see your point now: Mk4py  
tracks R/O view status and forbids this (even though the C++ core would  
allow it).

Ok, another idea: create a view with N copies of the string "new" (can  
be done virtually with wrap() in Mk4py).  Use pair() to add that view  
next to the subview, i.e. horizontal concatenation.  Does that take  
you closer to a solution?
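
The shape of that idea, modeled with plain Python lists rather than 
Mk4py views (so no wrap() or pair() calls appear, and tag_rows is a 
made-up helper): build a constant column with N copies of the label, 
put it next to the rows, and only then take the union.  The sample 
rows are taken from the dump in the question.

```python
def tag_rows(rows, label):
    """Pair each (id, key, val) row with a constant column holding `label`."""
    constant = [label] * len(rows)            # N copies of the label
    return [row + (tag,) for row, tag in zip(rows, constant)]

new_rows = [(2, "val10", 10), (2, "val20", 20), (4, "val10", 10), (4, "val20", 20)]
adj_rows = [(4, "val10", 10), (4, "val20", 20), (4, "val30", 30), (4, "val40", 40)]

# Concatenate the tagged sets, then order by id and key (stable sort,
# so "new" rows stay ahead of "adj" rows with the same id and key).
summary = sorted(tag_rows(new_rows, "new") + tag_rows(adj_rows, "adj"),
                 key=lambda row: (row[0], row[1]))
```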

-jcw

Re: [Metakit] view() vs. getas() in Python

2004-03-03 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:

Supposedly view() is "The normal way to retrieve an existing view."  But
apparently even if the view doesn't exist, something gets retrieved --
though it isn't of much use.  However, it does appear that you can append
to this non-existent view since

   v = db.view('NonExistentView')
   v.append(foo=1)

*appears* to work.  But then, of course any attempt at referencing v[0].foo
fails (though referencing v[0] does not).  Shouldn't appending to a
non-existent view raise an exception?  If you look at db.properties() there
again *appears* to be a view called 'NonExistentView' there.

How can I tell that I've retrieved a non-existent view as opposed to,
say, a merely empty one?  If I try to use description() on a non-existent
view I get a really ugly Python internal error.

Given this, what is the point of view()?  (No pun intended.)

It's historical: getas used to be very expensive.  And there used to be 
storeas.  Nowadays, like you I tend to use getas all the time.  The 
app essentially says: get me a view of such-and-such shape.  Just do 
it, make it that shape if need be.  Extremely handy for adding 
properties over time.

The future of this is going to be different still, btw.  The plan is to 
treat view structure as a meta view itself.  So you'll have a view, 
where each row describes a column.

I started on that in the current MK design, but it really goes much 
deeper and benefits from a fundamental switch to this approach in the 
core.  That will go as far as making a row add in the meta view be 
equivalent to defining a new column, and so on for renames and deletes.

But why not raise an exception if a bogus view name is given to view()?

Good point.  I think it would indeed help avoid time-wasting surprises.

-jcw

[Metakit] libtool or not?

2004-03-03 Thread Jean-Claude Wippler

If you are building Metakit on anything but the usual quad or so of 
most common platforms, any of C++, Python, or Tcl - could you please 
help decide what to do?



YES OR NO: get rid of libtool in the MK build process?

WHY: less fighting, drop dependency on libtool, which has changed over 
the years.

WHY NOT: may require some work to build for special platforms (AIX? 
HPUX?)

HOW: switch to gcc -shared, with a few refinements to make it work on 
Mac OS X and such.  These refinements can be added to the 
unix/configure.in logic, autoconf has sufficient capability to cover 
most cases, I think.

WINDOWS: no change, when built with MSVC 6 (I just checked in a MSVC 
7.0 version, btw).  No change with mingw either, since -shared does 
the right thing nowadays.

TCL: probably not affected, it has its own configure logic.

PYTHON: probably not affected, it is moving to distutils.



Your votes and opinions please...

-jcw

PS.  In fact, I'd love to throw out all of make and autoconf, if I knew 
how to create an effective distro without them (Python is furthest 
along in that area, clearly, with its distutils).  Make is a brilliant 
concept, but even that makes little sense when it's about deploying and 
compiling a tested distribution - once.  IMO the only strong case for 
autoconf + make nowadays, is that everyone in OSS-land is used to the 
"configure; make; make install" salute.

Re: [Metakit] Metakit auto** problem

2004-03-03 Thread Jean-Claude Wippler

David McNab wrote:

Nicholas Riley wrote:

Try using the distutils build method instead of ./configure
--enable-python - it'll work back to 1.5.2 if you use the latest
distutils (which is also guaranteed to work back to 1.5.2).

Tried that.

With metakit's setup.py, distutils doesn't work for pythons earlier 
than 2.3.

Is there something simple we can do to fix it?

-jcw

[Metakit] libtool has been removed

2004-03-07 Thread Jean-Claude Wippler

I've removed libtool from the Metakit build setup and checked in 
changes to CVS.

The changes are very preliminary - this build is likely to work on fewer 
platforms than before.  Use the 2.4.9.3 distribution if you are not 
prepared to deal with this.

I'll be adjusting this further in the coming weeks based on feedback 
and through tests on the platforms I use myself, and am soliciting 
patches & suggestions on how to further improve things.  
Simplifications would be even better, especially drastic ones!

-jcw

Re: [Metakit] problems with hash view in 2.4.9.3

2004-03-09 Thread Jean-Claude Wippler

Brian Kelley wrote:

I don't know if this helps, but the error seems to be dependent on the 
column name 'email_sender'  This is pretty weird...

s=metakit.storage()
fails = 't[url_hash:I,email_sender:S]'
works = 't[url_hash:I,pizza_sender:S]'
if 1:
    struc = fails
    field = 'email_sender'
else:
    struc = works
    field = 'pizza_sender'
v=s.getas(struc)
hv=s.getas('hv[_H:I,_R:I]')
v=v.hash(hv,1)
new_vals={'url_hash':1,
          'email_sender':'A',
          field:'A',
          't':'A',}
print v.append(new_vals)
print v.append(new_vals)
print v.append(new_vals)
print v.append(new_vals)
print len(v)
metakit.dump(v)

Uh, oh.  Dict is used as sequence.  Key order changes.

To see it, add:

for i in new_vals: print i
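
Iterating over a dict yields its keys, and in Python versions of that 
era the key order was arbitrary, so a dict treated as a sequence is 
not a dependable row.  One way around it, sketched here in plain 
Python with a made-up helper name, is to build the row explicitly in 
the view's property order.  Passing keyword arguments, as in the 
v.append(foo=1) form seen elsewhere in this archive, avoids the 
problem as well.

```python
def row_in_property_order(values, property_names):
    """Turn a dict of values into a list ordered like the view's properties."""
    return [values[name] for name in property_names]

new_vals = {'email_sender': 'A', 'url_hash': 1}

# Order follows the view structure 't[url_hash:I,email_sender:S]',
# not whatever order the dict happens to iterate in.
row = row_in_property_order(new_vals, ['url_hash', 'email_sender'])
```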

-jcw

[Metakit] new Tcl build structure

2004-03-09 Thread Jean-Claude Wippler

While repairing the damage caused by removing libtool from the MK build 
process, I came up with what I think is a better way to deal with all 
the language bindings of Metakit.

Have started implementing it for Tcl / Mk4tcl.  The basic idea is to 
first build the core library in the builds/ directory, as before, 
possibly followed by running the regression test suite.  Once that 
build is done, *leave* the object files there, and go to the respective 
language area to finish the job by building the extension there.  This 
will re-use the object code generated from the initial core build, and 
link it all into the extension.

Reasons for this unusual approach:
 - the core gets built first, and can be independently verified
 - extensions can adopt whatever the norm is for that language
 - no need to bring all the C++ config.h logic into extension builds
 - the result does not need a MK shared lib, since it includes it

I've just checked new files into the tcl/ area of MK's CVS.  It uses 
Tcl's standard TEA and is derived from Tcl's sample extension.  The 
benefit so far is that the extension config logic is truly simple, all 
it does is link in a bunch of extra .o files from ${srcdir}/../builds/.

Had to put CC=g++ into the environment to make TEA work with C++.  
Also had to force using autoconf >= 2.5 on Gentoo (with 
WANT_AUTOCONF_2_5=1, yuck).  The result is a shared lib called 
libMk4tcl2.4.9.3.so, and conveniences such as installing in the right 
place and with a suitably constructed pkgIndex.tcl file.  The basic 
build logic should be:
	cd builds
	../unix/configure
	make
	mkdir tcl
	cd tcl
	../../tcl/configure --with-tcl=...
	make

I hope to help do the same for Python / Mk4py and distutils.

The one issue this approach introduces, is that the core library must 
be built first - with the same settings as the extension (shared vs. 
static, debug vs. non-debug, etc).  It'll take a while to get these 
combinations right, and to document the new approach.

The base configure scripts have not changed yet, but I think the 
libtool removal broke all scripting language bindings anyway.  If you 
can't be bothered with any of this, use the 2.4.9.3 source distribution 
for now.

-jcw

[Metakit] Fwd: [Starkit] Mk4tcl - SegFault when using cursors :-(

2004-03-18 Thread Jean-Claude Wippler

Begin forwarded message:

From: Jean-Claude Wippler [EMAIL PROTECTED]
Date: March 18, 2004 23:54:31 CET
To: starkit list server [EMAIL PROTECTED]
Subject: Re: [Starkit] Mk4tcl - SegFault when using cursors :-(

Christoph Drube wrote:

I have massive problems using MetaKit with Tcl (ActiveTcl 8.4.5).

Sorry to hear that.

# Searching property by property
# Q: Is there a better way to search ?
set hits {}
foreach i $proplist \
{
  set hits [concat $hits [mk::select $v -first 2 -glob $i $s]]
}

mk::select  $v -first 2 -glob $proplist $s

Well, when swapping the rows, this script crashes with seg fault.
I had a look at the row contents and the search results - all is fine, but
after the second or third iteration over nr it always crashes :-/

What am I doing wrong? Have I misunderstood the mk::cursor command or
its use?

Make sure you use Mk4tcl 2.4.9.3 - from the change log:

2004-01-22  Fixed refcount problem with temp rows in Mk4tcl

This was a long-standing bug: mk::row create did not work right
because the tracking of temporary rows was completely messed up.
Added test case for Tcl (mk6.8), fixes FB14, BTS#78, and BTS#29.

It really drives me to despair cause it's not the first seg fault with mk4tcl
- isn't it possible to use property names with blanks?

I'm not sure.  I always avoid blanks in property names.  Tcl has no 
restrictions, but identifiers in C++ and Python are limited in the 
same way.

Christoph (frustrated) :-(

Ouch.

-jcw

_
Starkit mailing list  -  [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/starkit

Re: [Metakit] Re: Maybe problems with Metakit 2.4.9.3

2004-04-24 Thread Jean-Claude Wippler

Yasushi Iwata wrote:

I found another problem. The following code does not work as expected.
[...]
But if you remove ordered(2) from getas(), it works as expected. I
also removed ordered(2) from example code that I posted yesterday, it
worked fine. There must be something wrong with ordered().

Thanks for diagnosing this.

Yes, I suspect ordered() has troubles - perhaps it's with more than 1 
key field.  There are some complex interactions between the view model 
of indexed access, i.e. control over where things go, and ordered - 
which tries to decide on its own where to put things (and hash() has no 
such issues, since it maintains order in MK).

-jcw

[Metakit] Mk4py build test Q

2004-04-24 Thread Jean-Claude Wippler

I have a question for Python experts, w.r.t. distutils:

I'd like to try and get setup.py working on its own.  Here's what I get 
right now (cvs HEAD, build dir wiped):

$ python setup.py build
running build
running build_py
creating ../builds/lib.linux-i686-2.3
copying metakit.py -> ../builds/lib.linux-i686-2.3
running build_ext
running config
gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c
removing: _configtest.c _configtest.i
building 'Mk4py' extension
creating ../builds/temp.linux-i686-2.3
creating ../builds/temp.linux-i686-2.3/scxx
g++ -fno-strict-aliasing -DNDEBUG -fPIC -DHAVE_UNICODEOBJECT_H=1 -Iscxx 
-I../include -I/usr/include/python2.3 -c PyView.cpp -o 
../builds/temp.linux-i686-2.3/PyView.o
[...]
g++ -pthread -shared ../builds/temp.linux-i686-2.3/PyProperty.o 
../builds/temp.linux-i686-2.3/PyRowRef.o 
../builds/temp.linux-i686-2.3/PyStorage.o 
../builds/temp.linux-i686-2.3/PyView.o 
../builds/temp.linux-i686-2.3/scxx/PWOImp.o ../builds/column.o 
../builds/custom.o ../builds/derived.o ../builds/fileio.o 
../builds/field.o ../builds/format.o ../builds/handler.o 
../builds/persist.o ../builds/remap.o ../builds/std.o ../builds/store.o 
../builds/string.o ../builds/table.o ../builds/univ.o ../builds/view.o 
../builds/viewx.o -lstdc++ -o ../builds/lib.linux-i686-2.3/Mk4py.so
g++: ../builds/column.o: No such file or directory
[...]
g++: ../builds/viewx.o: No such file or directory
error: command 'g++' failed with exit status 1
$

Is there a simple way to resolve this?  The workaround is to first do:
	cd ../builds; ../unix/configure; make

The other issue I ran into is testing:

$ python setup.py test
running test
running build
running build_py
running build_ext
running config
gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c
removing: _configtest.c _configtest.i
Traceback (most recent call last):
  File "setup.py", line 184, in ?
    extra_objects=mkobjs,
  File "/usr/lib/python2.3/distutils/core.py", line 149, in setup
    dist.run_commands()
  File "/usr/lib/python2.3/distutils/dist.py", line 907, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.3/distutils/dist.py", line 927, in run_command
    cmd_obj.run()
  File "setup.py", line 133, in run
    import test.regrtest
ImportError: No module named regrtest
$
(Am using Python 2.3.3 on Linux, btw)

It went away when I disabled this line in setup.py:

	#sys.path.insert(0, self.test_dir)

But then it seems to get lost in finding other stuff:

$ python setup.py test
running test
running build
running build_py
running build_ext
running config
gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c
removing: _configtest.c _configtest.i
test_inttypes
test_inttypes skipped -- No module named test_inttypes
1 test skipped:
test_inttypes
1 skip unexpected on linux2:
test_inttypes
$
-jcw


Re: [Metakit] First metakit failure, database grew to 2+ gigabyte s

2004-05-12 Thread Jean-Claude Wippler

Brian Kelley wrote:

Berk, Murat wrote:

 We use 'spans' and remove them in one operation and also do not
 commit anything until we finish a pass over all rows.

 But the main trick is blocked views, which use a smaller footprint on
 commits. Murat

Yeah, I am using blocked views as well, but after checking the code, I
was committing after every delete!  Ouch!  I'm switching over to
deleting spans so it should work a lot better.

The memory usage of individual deletes, especially across blocked 
views, is most probably due to MK allocating 4 Kb buffer chunks in 
every column where a change is made (and sometimes much more to hold 
modified copies of ranges of data).  With blocked views, I suspect that 
memory usage could indeed rise to a multiple of the dataset.  A blocked 
view with say 5 columns and 40000 rows could have 5 x 40 = 200 blocks, 
i.e. 800 Kb of sparsely filled buffers pending until flushed by a 
commit or rollback.
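
The arithmetic above can be checked with a small Python sketch.  The 4 Kb 
chunk size and the 5 x 40 block count come from the text; the figure of 
1000 rows per block is only an assumption chosen to give 40 blocks per 
column, and the function name is made up for illustration.

```python
# Worst-case estimate of buffer memory pending until commit/rollback,
# assuming every block of every column has been touched at least once.
CHUNK_KB = 4            # buffer chunk size, taken from the text
ROWS_PER_BLOCK = 1000   # assumption: yields 40 blocks for 40,000 rows

def pending_buffer_kb(columns, rows, rows_per_block=ROWS_PER_BLOCK):
    """Return the pending buffer size in Kb for a fully touched blocked view."""
    blocks_per_column = -(-rows // rows_per_block)  # ceiling division
    return columns * blocks_per_column * CHUNK_KB

print(pending_buffer_kb(5, 40000))  # 5 columns x 40 blocks x 4 Kb
```

The estimate grows with the number of blocks touched, not with the 
number of bytes actually changed, which is why sparse deletes across a 
blocked view can be so costly.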

The fix for this would be to track the total set of buffers, and start 
coalescing some in-memory data buffers to free some of that up (and to 
do so well before actual commits).

I'm surprised that memory usage stays high across commits though, and 
even more by what looks like a 32-bit sign overflow in file positions 
getting through undetected and messing up a datafile.  The 2 Gb limit 
should lead to commit failures, not file damage!

-jcw


Re: [Metakit] Mk4py build test Q

2004-04-26 Thread Jean-Claude Wippler

Hello Jack,

(thanks for your help on test.py vs. mktest.py)

If you rename test.py to mktest.py you should be able to use both of
them.

I saw the mktest.py rename in CVS, and it almost works for me.
I get the 'freebsd4' suite of tests (on debian linux) which tries
to include a stdlib test module that only applies to freebsd.
I haven't looked at it any closer, but I would guess something
in CVS has a hard definition for freebsd.

I've just checked in some more changes and a few files I missed for 
Mk4py testing.  The tests now seem to work on Linux.  I've not found 
anything specific for FreeBSD so far.  It may be caused by something 
which is Mac OS X specific, which is also *BSD-ish.

-jcw


[Metakit] blocked views

2004-04-28 Thread Jean-Claude Wippler

There is a faster implementation of blocked views in CVS now.  It 
evolved from a change submitted by M. Berk (thank you!) and appears to 
have a considerable effect on performance.  The trick is to cache the 
last used subview.
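
To illustrate the caching idea in general terms, here is a toy Python 
model.  It is not Metakit's implementation: the class, its attribute 
names, and the fixed block size are all invented for this sketch.  Rows 
are split over blocks, and the block that served the previous access is 
remembered so that sequential traversal rarely needs a search.

```python
import bisect

class BlockedList:
    """Toy model of a blocked view with a last-used-block cache."""

    def __init__(self, items, block_size=1000):
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # offsets[k] is the index of the first row *after* block k
        self.offsets = []
        total = 0
        for block in self.blocks:
            total += len(block)
            self.offsets.append(total)
        # cached half-open row range [lo, hi) and the block covering it
        self._lo = self._hi = 0
        self._block = None
        self.lookups = 0  # counts slow-path searches, for illustration

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, row):
        if not 0 <= row < len(self):
            raise IndexError(row)
        if not self._lo <= row < self._hi:
            # slow path: binary search for the block holding this row
            k = bisect.bisect_right(self.offsets, row)
            self._lo = self.offsets[k - 1] if k else 0
            self._hi = self.offsets[k]
            self._block = self.blocks[k]
            self.lookups += 1
        return self._block[row - self._lo]

rows = BlockedList(list(range(5000)), block_size=1000)
total = sum(rows[i] for i in range(len(rows)))  # sequential traversal
print(total, rows.lookups)
```

With sequential access the slow path runs once per block rather than 
once per row; random access defeats the cache, which matches the remark 
that flat views can still be better for hash-style lookups.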

If you use blocked views and check out the latest code from CVS, you 
will see.

If you don't, let me just say that blocked views are now a good option 
for very large views.  Performance benefits are particularly good for 
views with many properties, and when traversing them sequentially.

To switch to using blocked views, change code which looks like:

	vw = store.getas("vw[...]")

to

	vw = store.getas("vw[_B[...]]").blocked()

You'll also need to reload data; this change won't convert it for you.

With a somewhat lower raw access performance, you'll get much more 
scalable views (millions of rows and more), faster commits, and smaller 
datafiles.

There's no need to switch over every view - it's still a trade-off.  If 
your views are rarely modified, or contain no strings, or are always 
accessed in random order (hash maps), then flat is often still better.  
But it's there if you want it.

-jcw


Re: [Metakit] PyDS 0.7.2 database corruption

2004-05-14 Thread Jean-Claude Wippler

Nicholas Riley wrote:
I am not sure whether this is PyDS threading issues or Metakit bugs.
In any case, Metakit should not crash while attempting to read data
from a database!

Agree.  But stray pointer writes can damage things.  I'm not saying 
this is the case here, just pointing out that software bugs in the same 
address space can damage a MK datafile despite its failsafe logic.

If anyone (Jean-Claude?) wants to see one/more of the databases, I can
send them.

Please do.

-jcw

Re: [Metakit] Metakit and Tcl (and maybe others)

2004-05-27 Thread Jean-Claude Wippler

Bob X wrote:
I am using 2.4.9.3 on Windows XP with ActiveTcl.
I am creating a simple ticket tracker and I defined my view:

set view [mk::view layout db.tracker "username:S ticket:S recieved:I 
closed:I problem:S notes:S status:I"]

I then append into the view:

mk::row append $view username "Jeff Walsh" ticket 01081 recieved 
20040419 closed 20040419 problem "Password locked" notes "Reset 
password to Ellipse" status 0

I then get errors:

<error>
expected integer but got "01058" (looks like invalid octal number)
    while executing
"mk::row append $view username "Don Lang" ticket 01058 recieved 
20040409 closed 20040409 problem "Application is hanging" notes 
"Network prob..."
    (file "initial_loader.tcl" line 18)
</error>

Yes, leading zeros bite when treating a Tcl string as an integer.  I'm 
assuming the "ticket:S" is actually a "ticket:I" in the example you 
gave - then it would fail.

The leading zero defaulting to octal mode is a painful idiosyncrasy of 
Tcl, see
	http://mini.net/tcl/498
	http://www.tcl.tk/cgi-bin/tct/tip/114.html

I could change it to a String (works that way) but I would like to 
leave it an Integer. Are the leading zeros causing the problem? I 
have to have those as the program spitting the data out uses those.

You can't have your cake and eat it in cases like these :) - either you 
treat values as integers (which have no knowledge of representation, 
such as leading zeros) or you stick to a string, which is slower and 
takes up more space.

I tend to use either of two tricks for this:
- add 100000 to the value and store that
  (then strip 1st char again on extract to make sure the 0's stay)
- convert to int via ... ticket [scan 01058 %d] ...
  (and convert back as needed with: puts [format %05d $value])
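
For readers on the Python side, the same two tricks can be sketched as 
follows.  This is only an illustration: the function names are invented, 
and the width of 5 digits (hence an offset of 100000) is an assumption 
based on the ticket numbers shown in the example.

```python
WIDTH = 5             # assumed ticket width, e.g. "01058"
OFFSET = 10 ** WIDTH  # one digit more than the widest ticket number

def encode_offset(ticket):
    """Trick 1: add an offset so the stored int starts with a 1."""
    return OFFSET + int(ticket, 10)

def decode_offset(stored):
    """Strip the first character again; the leading zeros survive."""
    return str(stored)[1:]

def encode_plain(ticket):
    """Trick 2: parse explicitly as base 10, never as octal."""
    return int(ticket, 10)

def decode_plain(value):
    """Re-pad with zeros when the value is printed."""
    return "%0*d" % (WIDTH, value)

print(encode_offset("01058"), decode_offset(encode_offset("01058")))
print(encode_plain("01058"), decode_plain(encode_plain("01058")))
```

The first trick keeps the zeros inside the stored value; the second 
stores the bare number and relies on a known field width when 
formatting.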
-jcw

Re: [Metakit] c4_Bytes.Modify() mangling data

2004-09-23 Thread Jean-Claude Wippler

Brian Kelley wrote:
I was inserting strings of length 2 which was why it worked for me.
Yours were larger.  It turns out that you can't cross the end boundary
when using modify.  So if you are inserting a string of length 10, you
can't insert it into a string of length 9.  Also, if you are inserting
it into a string of length 12, you can only insert at 0 or 1!

Whoa.  Good catch and analysis!

The attached change ought to fix this issue.  I'll verify this later, 
but it'll let you proceed for now (or you can use the Python 
workaround, of course).

-jcw
Index: viewx.cpp
===================================================================
RCS file: /home/cvs/metakit/src/viewx.cpp,v
retrieving revision 1.11
diff -u -p -u -r1.11 viewx.cpp
--- viewx.cpp   23 Nov 2003 01:42:51 -0000  1.11
+++ viewx.cpp   23 Sep 2004 17:49:15 -0000
@@ -581,21 +581,15 @@ bool c4_BytesRef::Modify(const c4_Bytes&
     c4_Handler& h = _cursor._seq->NthHandler(colNum);
     const int n = buf_.Size();
     const t4_i32 limit = off_ + n; // past changed bytes
-    const t4_i32 overshoot = limit - h.ItemSize(_cursor._index);
-
-    if (diff_ < overshoot)
-      diff_ = overshoot;
+      // get rid of an optimization, it was wrong  (2004-09-23)
 
     c4_Column* col = h.GetNthMemoCol(_cursor._index, true);
     if (col != 0)
     {
       if (diff_ < 0)
         col->Shrink(limit, - diff_);
       else if (diff_ > 0)
-          // insert bytes in the highest possible spot
-          // if a gap is created, it will contain garbage
-        col->Grow(overshoot > 0 ? col->ColSize() :
-                   diff_ > n ? off_ : limit - diff_, diff_);
+        col->Grow(off_, diff_);
 
       col->StoreBytes(off_, buf_);
     }

Re: [Metakit] How to backup a metakit database?

2004-10-13 Thread Jean-Claude Wippler

Allan Wind wrote:
How do you backup a metakit database?

The cold case is obvious as usual, ensure that no one else has the
database file open prior to making a copy of the data with file level
tools (cp, tar etc).

What are the options for hot (i.e. open with an active writer) backups?
I noticed the information in the python api reference for doing this
from the writer thread/process, but are there any options for doing it
externally?  If the file is open commit-extend, can you use the same
trick if you open the database read-only in a 2nd process?  If using
commit-aside, is it then safe to just low-level copy the main
database?

The race is during a commit (from when it starts to until it 
completes), because that is when MK writes to file.  You will need to 
stay out of that time span if you wish to have a solid backup.  It 
seems to me that it could be done on a not-too-active DB simply as 
follows:
	- determine clock time T
	- wait until at least one sec has passed since T
	  (actually: the time resolution of the underlying filesystem)
	- copy entire datafile
	- check mod date of orig, must still be <= T
	- rinse and repeat if this test failed
On Windows, I am not sure this will work: if the O/S does not update 
modtimes right away then the above will not be reliable.
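
The steps above can be sketched in Python roughly as follows.  This is a 
minimal sketch under stated assumptions, not a tested backup tool: the 
function name and its parameters are invented, and it assumes the 
filesystem updates modification times promptly (see the Windows caveat 
above).

```python
import os
import shutil
import time

def copy_when_quiet(src, dst, resolution=1.0, max_tries=5):
    """Copy src to dst; accept the copy only if src stayed unmodified."""
    for _ in range(max_tries):
        t = time.time()                 # determine clock time T
        time.sleep(resolution)          # wait out the timestamp resolution
        shutil.copyfile(src, dst)       # copy the entire datafile
        if os.path.getmtime(src) <= t:  # orig not modified since T?
            return True
        # a commit may have overlapped the copy: rinse and repeat
    return False
```

A caller would treat a False result as "database too busy, try again 
later" and discard the partial copy at dst.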

The other way to do it is with support from the committing app so that 
independent readers have a way of telling whether there was a commit, 
say by incrementing a revision number in a separate info file.
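
One possible shape for such a revision file is sketched below in Python.  
Everything here is hypothetical: the file layout, the function names, 
and in particular the convention that the writer bumps the number both 
before and after each commit (so an odd value means "commit in 
progress") are assumptions added for this sketch, not something Metakit 
provides.

```python
import shutil

def read_revision(info_path):
    """Return the revision stored in the info file (0 if it is missing)."""
    try:
        with open(info_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def bump_revision(info_path):
    """Writer side: call right before and right after each commit."""
    rev = read_revision(info_path) + 1
    with open(info_path, "w") as f:
        f.write(str(rev))
    return rev

def backup_if_unchanged(db_path, info_path, dst):
    """Reader side: keep the copy only if no commit touched the file."""
    before = read_revision(info_path)
    if before % 2:  # odd: the writer is in the middle of a commit
        return False
    shutil.copyfile(db_path, dst)
    return read_revision(info_path) == before
```

Bumping twice per commit lets a reader detect a commit that started 
before the copy and had not yet finished, which a single increment 
after the commit would miss.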

Is journaling planned?  I.e. point in time recoverability between
backups.

There is a first cut at this via the commit-aside mode.  There have 
also been simple-but-working tricks in the Tcl wrapper to intercept all 
calls (to do remoting, as well as creating transcripts of all requests 
for debugging/replay).

With custom viewers, one could write a view layer which intercepts all 
changes at the C++ level, but that requires more work and discipline 
during use.  I've been hesitant to implement such a thing only because I 
am trying to improve the raw core of MK before building more on top.  
They are definitely good ideas and *very* worthwhile, IMO.

-jcw

Re: [Metakit] Re: c4_Bytes destructor causes memory deallocation failure

2004-10-13 Thread Jean-Claude Wippler

Arto Stimms wrote:
It seems that the destructor releases the wrong
memory.
In the debug build it gives an assertion at the
deallocation, but in a release build it gives no
error.
This just makes it worse though, since it may later
try to use the released memory, causing a crash.
Check this example program which on my machine fails
after the fourth iteration:

#include "mk4.h"
#include <string>
#include <iostream>
using namespace std;

int main() {
    c4_Storage storage("datafile.kit", true);
    c4_View v = storage.GetAs("v[b:B]");
    v.Add(c4_Row());
    c4_BytesProp pBytes("b");
    string teststring("Hello, this is a test!"); // len=22
    c4_Bytes textbytes(teststring.data(), teststring.length());
    for (int i = 0; i < 100; ++i) {
        cout << i << endl;
        c4_Bytes newbytes = pBytes(v[0]).Access(0, 17);
        pBytes(v[0]).Modify(textbytes, 0, textbytes.Size());
    }
    return 0;
}

I am using metakit 2.4.9.3 with the modify patch on
windows.

I am not seeing this with the CVS build on Linux or Mac OS X.  Don't 
have a Windows compile setup ready this minute, could you check with 
latest CVS as well?

(FWIW, I had to add an AutoCommit() call to make anything end up on 
disk)

-jcw

Re: [Metakit] In search of sparc solaris metakit building tips

2004-10-25 Thread Jean-Claude Wippler

Larry,
metakit 2.4.9.3 and Tcl/Tk 8.4.7
I'm having a bit of a problem:
$ configure --prefix=/usr/tcl84 --enable-shared --enable-symbols 
--with-tcl
{lots of stuff output - can email if desirable}
$ make all
{a lot more output}

CC -c -g -I../unix/../include -I/usr/tcl84/include/generic 
-I/usr/tcl84/include ../unix/../tcl/mk4tcl.cpp  -KPIC -DPIC

{a lot of warnings}
"../unix/../tcl/mk4tcl.cpp", line 415: Error: Cannot cast from 
c4_LongRef to long long.
"../unix/../tcl/mk4tcl.cpp", line 490: Error: Cannot assign long long 
to c4_LongRef without "c4_LongRef::operator=(const c4_LongRef&);".

If these are the only fatal errors, I suggest you try the following:
Change line 415 to:
  Tcl_SetWideIntObj(obj_, (t4_i64) (((c4_LongProp) prop_) (row_)));
And line 490 to:
  ((c4_LongProp) prop_) (row_) = (t4_i64) value;
If that doesn't work, then there may be some weirdness w.r.t. 
64-bitness and int/long casts which I cannot diagnose further without 
access to a setup like yours (it would be great if someone else can).  
The option you have left in that case, is to fully disable Tcl's wide 
(8-byte) ints in the Mk4tcl interface, by replacing
	#ifdef TCL_WIDE_INT_TYPE
with
	#if 0
on source lines 413 and 484.

-jcw

Re: [Metakit] corrupt database?

2004-12-04 Thread Jean-Claude Wippler

Roy Sigurd Karlsbakk wrote:
I'm an OS X user and my addressbook just fscked up. This is for what 
I've been told, based on metakit.

Are there any tools around that I can use to try to rebuild it? I can 
find the data scattered all over the database file, but I can't 
assemble it...

Chances are very slim.  Metakit datafiles have very little redundancy.  
Data is stored column-wise, which means that adjacent items on file are 
not part of the same row but values from different entries.  Finding 
out which item goes with which is next to impossible if the structural 
information in the datafile is damaged.
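
A toy Python illustration of why column-wise storage makes manual 
recovery hard is shown below.  The sample rows are made up, and the flat 
lists only mimic the ordering of items; they are not the actual Metakit 
file layout.

```python
# Three made-up address book rows: (name, phone, city)
rows = [
    ("Alice", "555-0101", "Utrecht"),
    ("Bob",   "555-0102", "Houten"),
    ("Carol", "555-0103", "Zeist"),
]

# Row-wise: all fields of one entry sit next to each other
row_wise = [item for row in rows for item in row]

# Column-wise: all names first, then all phones, then all cities
column_wise = [item for column in zip(*rows) for item in column]

print(row_wise)
print(column_wise)
```

In the column-wise list, the only thing tying "Bob" to "555-0102" is 
their matching position within each column, so once the structural 
information (where each column starts and how long it is) is lost, the 
association is lost with it.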

Having said that, this is the very first report ever of a corrupted 
address book, as far as I'm aware.  I cannot quite rule out a hardware 
glitch at this stage.  Note that the datafile is in your Library -> 
Application Support -> AddressBook folder and is called 
"AddressBook.data".  There is also an "AddressBook.data.previous", 
which might contain a backup if all else fails.

One way to determine whether your data is salvageable is perhaps the 
following:

1) download these two files to your Desktop folder:
	http://www.equi4.com/pub/tk/8.4.8/tclkit-darwin-ppc.gz
	http://mini.net/sdarchive/mk2tcl.kit
2) launch the Terminal application, it's in your Utilities folder
3) in the new window, enter these lines *exactly* as follows:
	cd Desktop
	gzip -d tclkit-darwin-ppc.gz
	chmod +x tclkit-darwin-ppc
	./tclkit-darwin-ppc mk2tcl.kit >saved.txt \
	'../Library/Application Support/AddressBook/AddressBook.data'
4) open the newly created "saved.txt" file, i.e. double-click it
   (make the TextEdit window as wide as you can, preferably)

With a bit (a lot!) of luck, you may be able to see entries from your 
address book.  If not, then I don't see an easy way to recover things - 
it may not be possible at all in fact.  If you do see entries, then my 
suggestion would be to contact Apple since in that case the datafile 
itself is still readable (I have no knowledge or involvement in the 
AddressBook itself).

-jcw

Re: [Metakit] corrupt database?

2004-12-07 Thread Jean-Claude Wippler

Roy Sigurd Karlsbakk wrote:
4) open the newly created saved.txt file, i.e. double-click it
   (make the TextEdit window as wide as you can, preferably)

there's no such editor like vi
TextEdit sucks :)

With a bit (a lot!) of luck, you may be able to see entries from your 
address book.  If not, then I don't see an easy way to recover things 
- it may not be possible at all in fact.  If you do see entries, then 
my suggestion would be to contact Apple since in that case the 
datafile itself is still readable (I have no knowledge or involvement 
in the AddressBook itself).

grr.
I got more info about it from 'strings AddressBook.data.previous'
but then, the tables are stored (name table)garbage(number 
table)garbage etc

so I can prolly find the format somehow, some day...
thanks anyway

Have you tried what I suggested?

-jcw

Re: [Metakit] make install error

2004-12-16 Thread Jean-Claude Wippler

Kenny Chamber wrote:
I've been trying to get metakit (both cvs and latest tarball) to 
compile
with no success.  Actually it compiles but won't install.
The following is the output of the make install command:

make install
mkdir -p /usr/include /usr/lib
/bin/sh ./libtool --mode=install /bin/install -c -m 644
../unix/../include/mk4.h \
   ../unix/../include/mk4.inl \
   ../unix/../include/mk4str.h \
   ../unix/../include/mk4str.inl /usr/include
/bin/install -c -m 644 ../unix/../include/mk4.h /usr/include/mk4.h
/bin/install -c -m 644 ../unix/../include/mk4.inl /usr/include/mk4.inl
/bin/install -c -m 644 ../unix/../include/mk4str.h 
/usr/include/mk4str.h
/bin/install -c -m 644 ../unix/../include/mk4str.inl
/usr/include/mk4str.inl
/bin/sh ./libtool --mode=install /bin/install -c libmk4.la /usr/lib
/bin/install -c .libs/libmk4.lai /usr/lib/libmk4.la
/bin/install: cannot stat `.libs/libmk4.lai': No such file or directory
make: *** [install-mk] Error 1

As far as I can tell the file libmk4.lai doesn't exist anywhere.  Any
suggestions would be appreciated.

You're running a mix of 2.4.9.3 (libtool-based) and cvs HEAD (libtool 
gone).
You need to do a "make distclean" and then re-run configure, make, etc.

-jcw

[Metakit] Re: Appending rows, effectiveness, documentation

2005-01-06 Thread Jean-Claude Wippler

Wolfgang Lipp wrote:
imho, it would be a good idea to have a command similar to 
metakit.wrap() to add large number of data items to an existing view; 
that would solve most problems. or is there some efficient way to get 
the data from one (in-memory) view to another (on-disk) view?

What language?  In C++ you can insert one view into another.  That and 
using blocked views should go a long way (don't pre-allocate in blocked 
views, it probably won't help much).

Ah, wait, your metakit.wrap() comment indicates you're using the 
Python binding.  Hmmm, looks like we forgot to add a wrapper for C++'s 
view.InsertAt(pos,view).

-jcw

Re: Re[6]: [Metakit] newbie question - writing derived view back to db

2005-01-19 Thread Jean-Claude Wippler

Marcin Krol wrote:
Geez, Brian, you're a wizard!

I agree 100%.

After syncing: 23.08
[...]
Thanks for the help, Brian, now I have to go away to munch on
all that.

Now that you have these results: what file sizes do you see across the 
different DB's?

(It might also be interesting to compare int-field performances & 
sizes, BTW)

-jcw

Re: Re[2]: [Metakit] newbie question - writing derived view back to db

2005-01-19 Thread Jean-Claude Wippler

Marcin Krol wrote:
BK> vw2 = st.getas("test_save[a:i,b:s]")
[...]
However, there's another silly problem here remaining: how to delete
the old view 'test' from the db and rename 'test_save' to 'test'?

Try:

	st.getas("test_save")

(note the absence of brackets and fields)

As for renaming, you'll have to copy things over, I'm afraid.

-jcw

Re: [Metakit] Mailing lists out of order

2005-01-29 Thread Jean-Claude Wippler

Well, the mailing list web interface is working again, yippie!
	http://www.mail-archive.com/mailman-users@python.org/msg29743.html
[...]
I'll just assume it'll be addressed over the coming days and come down 
as an update.
The exact explanation.  With a 10 sec fix:
http://www.mail-archive.com/mailman-users@python.org/msg29850.html
Yegadda love the net!
-jcw

Re: [Metakit] Python Patch for inserting a view into a view

2005-01-29 Thread Jean-Claude Wippler

Brian Kelley wrote:
At long last, attached is the diff and the new PyView.cpp file that
allows the python interface to insert a view into another view.
usage:

	view.insert(index, view2)

is now supported.  Properties that don't exist in view but exist in
view2 will be added to view.

example:

import metakit
st = metakit.storage()
v = st.getas("test[a:i,b:S,c:S]")
v2 = st.getas("test2[d:i]")
for i in range(100):
    v.append((i, str(i), str(i)))
v2.append((1,))
v2.insert(0, v)
metakit.dump(v2)
del st

Thank you, I've applied the patch.  It's in CVS for now.  I'm 
considering wrapping up a new minor release & distribution again, to 
wrap up the last few tweaks.

-jcw

[Metakit] Regression in MK C++, Mk4py, and Mk4tcl

2005-02-18 Thread Jean-Claude Wippler

FYI, the following change to MK appears to be faulty:

	2004-09-23  Fix c4_BytesRef::Modify bytes insertion

It shows up in MK's regression test b26, which fails.  There is an 
explanation for why this hasn't been caught before, which I won't go 
into.  It's most unfortunate.

Thanks to Pat Thoyts for reporting the details of this.

If you rely on insertion/mods/deletes of partial data in fields, you 
may want to revert to an earlier CVS checkout, i.e. cvs ... -D 
"2004-09-21"

Several Tclkit builds are affected, probably 8.4.[7-9] and 8.5a2 - 
these have all been built after that time, and may or may not have used 
cvs HEAD then.  If you are worried about potential datafile corruption 
from this, revert to Tclkit 8.4.6 to avoid it.

I've not yet researched what exactly happens, but wanted to get this 
notice out asap.

-jcw

[Metakit] Re: [Starkit] Regression in MK C++, Mk4py, and Mk4tcl

2005-02-18 Thread Jean-Claude Wippler

FYI, the following change to MK appears to be faulty:

	2004-09-23  Fix c4_BytesRef::Modify bytes insertion

The change has been undone in CVS now, so latest CVS should be ok again.

-jcw

[Metakit] RSS feed

2005-02-18 Thread Jean-Claude Wippler

I've just found out that Gmane, the mailing-list-to-news gateway 
service, now also has a blog gateway service.

So now you have three different ways to track postings to this mailing 
list:

 - Mailman: http://www.equi4.com/mailman/listinfo/metakit
 - News: http://news.gmane.org/gmane.comp.db.metakit/
 - Weblog: http://blog.gmane.org/gmane.comp.db.metakit?set_skin=zawodny
   (RSS feed at http://rss.gmane.org/gmane.comp.db.metakit !)

Gmane does a number of clever things, such as removing the mailman info 
blurb at the bottom of each posting.  It also looks like it supports 
posting, though I'm not sure those get through the filters.  You can 
turn on the "no-mail" option in Mailman if you prefer to use one of the 
other mechanisms to track news yet want to be able to post yourself.

Thank you Gmane, for a wonderful free service.
-jcw

Re: [Metakit] Maximum practical size of Metakit databse?

2005-05-05 Thread Jean-Claude Wippler

Davis Adrian wrote:
What factors govern the maximum practical size for a Metakit database?

(This email was pending in a queue I rarely check nowadays due to the  
levels of spam flooding it, please consider subscribing to the  
mailing list to avoid getting in there)

There's a hard limit at 2 Gb due to the way signed 32-bit ints are  
used in MK and due to the limitations of a 32-bit address space.
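
To see where the 2 Gb figure comes from, here is a small Python sketch 
of what happens when a file position is squeezed into a signed 32-bit 
integer.  The helper name is invented, and the wrap-around shown is what 
a typical two's-complement machine does; it is an illustration of the 
limit, not a trace of Metakit's code.

```python
import ctypes

LIMIT = 2**31 - 1  # largest value a signed 32-bit int can hold

def as_int32(position):
    """Truncate a file position to a signed 32-bit integer."""
    return ctypes.c_int32(position).value

print(as_int32(LIMIT))      # last valid position, just under 2 Gb
print(as_int32(LIMIT + 1))  # one byte further wraps to a negative value
```

Any offset at or beyond 2**31 turns negative, which is why the limit 
sits at 2 Gb rather than 4 Gb.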

You'll be able to get close to that if there are not too many  
subviews and you don't modify large amounts of data (modifications use  
4 Kb memory buffer chunks).  Reading is usually ok, it's usually the  
creation side that causes trouble first.  I'd expect a blocked view  
with only numerical data to get furthest, all the way up to that 2 Gb  
barrier in fact.

So with MK 2.4.9.3, I'd say that generally speaking 1 Gb is roughly  
the end of it.  If you have large amounts of data being small ints,  
these sizes cannot easily be compared with other database solutions,  
due to MK's use of adaptive int vectors which can be substantially  
more compact.

Btw, on a 64-bit architecture you can actually sneak around these  
limits by using multiple datafiles.  Note that all view operators can  
be used across datafiles.

The current codebase won't go beyond these limits even on a 64-bit  
architecture.  In the lab I've been growing a new strain of MK  
which overcomes this, as a recent test with an 8 Gb MK datafile  
proved (the next-generation size limit will end up in the Tb range).   
IOW, the file format can handle larger datasets - it's just the code  
which runs out of steam.

-jcw

[Metakit] Performance comparison Q

2005-05-09 Thread Jean-Claude Wippler

To all language specialists: I'm looking for a way to establish some  
basic performance figures, to compare and evaluate a number of  
approaches I'm exploring in the Vlerq project.

As a very first datapoint, it would be nice to find out how one  
writes decent loops for a very simple task: sum the items of a list  
of 50,000 integers, running from 0 to 49,999.  This is quite an  
important operation in MK, where cumulative offsets must often be  
calculated - it also gives an indication how efficient integer lists/ 
vectors are.

The C code is pretty obvious:

	int i, sum = 0; for (i = 0; i < 50000; ++i) sum += data[i];

This one in tcl 8.4.6 runs at quite a bit under 1% of that speed:

	set sum 0; foreach x $data { incr sum $x }

My question is: how would you write the above in <insert your  
language of choice here>?

This is not flame bait.  I'm not trying to prove X is better than Y,  
I'm trying to find out what range of performances one sees these  
days, and how much I can get away with for now by *not* optimizing my  
new code to the limit (it also affects some major decisions on what  
internal data structures I should use at this stage).

I'm aware of the various language shootout websites, the risks of  
benchmarking, and cache effects.  Still, self-contained examples of  
this logic would help me avoid seriously flawed timings in other  
languages when applied to tasks which are relevant to Metakit.

I'll summarize results.
-jcw
PS.  All timing comparisons are being done using a PIII/650 on  
Linux.  I've got the following installed so far if you're interested:  
python 2.3.4, perl 5.8.5, ruby 1.8.2, php 4.3.10, java 1.4.2, icon  
9.40, gforth 0.6.2, lua 5.0.2 - can add more as needed.

Re: [Metakit] Performance comparison Q

2005-05-09 Thread Jean-Claude Wippler

Brian Kelley wrote:

python:
==
import operator
data = range(50000) # test data
result = sum(data)

Nice, of course.

What about arbitrary operators, not just summation?  I'm trying to  
stress generic looping, as well as see how well lists, ints, and  
addition work together.  Sorry for the confusion.

-jcw

Re: [Metakit] Performance comparison Q

2005-05-09 Thread Jean-Claude Wippler

Bruce A.Johnson wrote:
Are you putting the Tcl test line inside a proc?
Yes, thx.
-jcw

Re: [Metakit] Performance comparison Q

2005-05-09 Thread Jean-Claude Wippler

Magnus Lie Hetland wrote:
result = sum(xrange(50000))

I just did a simple experiment (using the timeit module) comparing
the performance of sum(range(50000)) and sum(xrange(50000)), and the
latter gave a speedup factor of about 2.2 on my computer... Also,
allocating a list of size 50000 seems a bit wasteful just for
computing this sum :)

The point is not the result (25000*49999 will get there a lot faster).
I'm trying to see how a list of values, iteration over it, and a  
simple integer operator work together in each particular language.

Don't quite see the same speedup for xrange, but as I said it is not  
the issue here for me.

-jcw

Re: [Metakit] Performance comparison Q - results

2005-05-09 Thread Jean-Claude Wippler

Here are some performance figures, as promised.  All timings were  
done on a PIII/650 laptop with Gentoo Linux 2.6.10, gcc 3.3.5 on May  
9, 2005.

The task: calculate the sum of a list containing the numbers 0..49999.

	C               0.6 mSec   array of ints
	Python loop    72   mSec   for & s += x (or s += data[i])
	Python reduce  36   mSec   reduce(operator.add,data)
	Python sum     18   mSec   built-in sum()
	Tcl foreach    37   mSec   foreach & incr
	Tcl for        44   mSec   for & incr & lindex
	Thrill vec     24   mSec   0 swap { @ + } rep*
	Thrill ints     5   mSec   convert to int vec & use C primitive

Please take these figures with a grain of salt.  I've not  
investigated memory use.
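
For anyone wanting to repeat the three Python variants, a minimal sketch 
follows.  It is written for Python 3 (the figures above were taken with 
Python 2.x, where reduce was still a builtin), the function names are 
invented, and absolute timings on current hardware will of course be far 
lower than those listed.

```python
import operator
import timeit
from functools import reduce

data = list(range(50000))  # the numbers 0 through 49,999

def sum_loop(values):
    """Explicit loop with s += x."""
    s = 0
    for x in values:
        s += x
    return s

def sum_reduce(values):
    """reduce(operator.add, data)"""
    return reduce(operator.add, values)

def sum_builtin(values):
    """Built-in sum()."""
    return sum(values)

if __name__ == "__main__":
    for fn in (sum_loop, sum_reduce, sum_builtin):
        # best of 3 runs of 10 calls each, reported per call
        secs = min(timeit.repeat(lambda: fn(data), number=10, repeat=3)) / 10
        print("%-12s %8.2f mSec" % (fn.__name__, secs * 1000))
```

Taking the minimum over several repeats is a common way to reduce noise 
from other processes; it does not address cache or memory effects.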

That's Python 2.3.4 and Tcl 8.4.6, BTW.  Let me add that my first  
naive timings for Tcl were 8x slower - which shows how easy it is to  
go wrong in performance measurements.

The last two entries use a Forth'ish language I've been using in the  
Vlerq research project.

These results look promising because it seems to indicate that I  
could adopt Thrill's generic interpreted code for now, without  
performance problems.  The C code is obviously in a different league,  
but this particular operation is included as primitive in Thrill, so  
special-casing is right around the corner.

Thanks Brian, Jeff, Gary, Magnus, Bruce, Jacob, this was most  
enlightening.

-jcw

Re: [Metakit] Starting work on a Java version of Metakit

2005-05-12 Thread Jean-Claude Wippler

[EMAIL PROTECTED] wrote:
[... to java or not to java ...]

I respect your concerns.

I can well imagine that relatively direct Java access to Metakit  
databases would be welcomed by a significant number of Java  
developers.  I encourage this effort.

Me too.  And if there is someone out there who wants to create a  
binding for Ruby, R, PHP, Perl, Lua, C#, or any other language: I'll  
bend over backwards to help you succeed.  There are some recent  
developments which might substantially simplify that effort, so  
please contact me if you're interested.

Metakit has always been about *not* tying data formats to a language  
(as most serialized formats do), and not to a limited time-frame  
(i.e. maintaining compatibility and readability for the very long  
term).  Metakit's file format is the way it is for very strong  
technical reasons, but I have some self-contained pure-Tcl and  
pure-Python readers lying around if people cannot use the C++ bindings  
for some reason, so no-one can accuse me of pursuing a lock-in strategy.

Making MK data usable from many more languages is a long term goal.   
As I said, I welcome everyone who wants to help make that happen.   
Feel free to pass this invitation on.

On the topic of speed: I'm working on creating a more highly  
vectorized design for Metakit.  So far, this has not only  
demonstrated (in the lab) potential for more performance, it also  
makes the choice of host language less of an issue.  The trend is  
towards making the real crunching happen in a smaller part of the  
code - which can be tweaked and tuned to no end, whether in C,  
machine code, vector hardware, or even some existing  
high-performance library to hook into.  It's a bit like GUIs: every  
app today benefits from major advances made in the OS, the video  
driver, the video hardware, and all sorts of GPUs.
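
To make the "crunching in a smaller part of the code" point concrete, here is a minimal Python sketch. It is not Metakit or Thrill code: the function names and the vector size are made up for illustration. It sums the same integer vector twice, once item by item in interpreted code and once by handing the whole vector to a single built-in that runs in C.

```python
import timeit
from array import array

# Illustrative data only: a packed vector of 100,000 machine integers.
values = array("l", range(100_000))

def sum_interpreted(vec):
    """Add the items one by one in interpreted Python code."""
    total = 0
    for v in vec:
        total += v
    return total

def sum_primitive(vec):
    """Hand the whole vector to one built-in, which loops in C."""
    return sum(vec)

if __name__ == "__main__":
    # Timings vary by machine and interpreter; only the ratio is of interest.
    for fn in (sum_interpreted, sum_primitive):
        secs = timeit.timeit(lambda: fn(values), number=20)
        print(f"{fn.__name__}: {secs / 20 * 1000:.2f} ms per call")
```

Both functions return the same result; the host-language loop simply disappears from the hot path in the second one, which is the effect described above.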

-jcw

Re: [Metakit] Starting work on a Java version of Metakit

2005-05-12 Thread Jean-Claude Wippler

Brian Kelley wrote:
And jcw, could I see the python only reader please, please :)
Yeah, I was afraid you'd ask.  Took me ages to find it on an old CD  
backup, even though I'm pretty well organized w.r.t. my backups these  
days (it's hard to find things by location when you don't know  
*where* they are and it's hard to find things by name when you don't  
remember *what* you called it!).

Attached, vintage 1999 code.  It may no longer work due to MK 1.9 ->  
2.x file format tweaks.  Just for completeness, a Tcl version is at:
http://www.equi4.com/pub/sk/readkit.tcl

-jcw



unmk.py
Description: application/applefile
# Decoding a MetaKit datafile in Python
#
# JCW/1999-11-13/2000-04-22/

import os, struct, shlex, StringIO, string, array

reader = None
freespace = None

def HexDump(s):
	""" a rudimentary hex data dump """
	v = []
	for c in s: v.append("%02X" % ord(c))
	return string.join(v)

def DeduceWidth(numrows, size):
	""" calculate bits per int, given row count and column size """
	w = 0
	if numrows > 0:
		w = (size << 3) / numrows
		if numrows <= 7 and 0 < size <= 6:
			widthtab = [
				( 8, 16, 1, 32, 2, 4 ),	#  n = 1
				( 4, 8, 1, 16, 2, 0 ),	#  n = 2
				( 2, 4, 8, 1, 0, 16 ),	#  n = 3
				( 2, 4, 0, 8, 1, 0 ),		#  n = 4
				( 1, 2, 4, 0, 8, 0 ),		#  n = 5
				( 1, 2, 4, 0, 0, 8 ),		#  n = 6
				( 1, 2, 0, 4, 0, 0 ) ]	#  n = 7
			w = widthtab[numrows-1][size-1]
			assert w > 0
		assert (w & (w-1)) == 0
	return w

def CheckFreeSpace():
	freespace.sort()
	curr = 0
	gaps = 0
	bytes = 0
	print 'Free space summary:'
	for (pos, len) in freespace:
		if pos < curr:
			print "### Free space is corrupt: (%d,%d) overlaps %d" % \
				(pos, len, curr)
		if pos > curr:
			print "  Free: %6d..%-6d (%db)" % (curr, pos-1,pos-curr)
			gaps = gaps+1
			bytes = bytes + (pos - curr)
		curr = pos + len
	print "%d bytes free in %d gaps, %db used, last used is %d" % \
		(bytes, gaps, curr-bytes, curr)
		
class IntVector:
	
	""" An array which accesses ints of 0..32 bits """
	
	def _get_0b(self,index):
		return 0
	def _get_1b(self,index):
		return (self.vector[index>>3] >> (index&7)) & 1
	def _get_2b(self,index):
		return (self.vector[index>>2] >> ((index&3) * 2)) & 3
	def _get_4b(self,index):
		return (self.vector[index>>1] >> ((index&1) * 4)) & 15
		
	def __init__(self,width,data):
		type = 'b'
		if width == 0:
			self.__getitem__ = self._get_0b
		elif width == 1:
			self.__getitem__ = self._get_1b
		elif width == 2:
			self.__getitem__ = self._get_2b
		elif width == 4:
			self.__getitem__ = self._get_4b
		elif width == 8:
			type = 'b'
		elif width == 16:
			type = 'h'
		elif width == 32:
			type = 'l'
		else:
			assert None
		self.vector = array.array(type, data)
		
	def __getitem__(self,index):
		return self.vector[index]
			
class Column:
	
	""" A range of bytes on disk """
	
	def __init__(self):
		self.size = reader.pull()
		self.pos = 0
		if self.size:
			self.pos = reader.pull()
			freespace.append((self.pos, self.size))
			
	def __repr__(self):
		return 'Column: @%d [%db]' % (self.pos, self.size)
		
	def __len__(self):
		return self.size
		
class ColOfInts (Column):
	
	""" A column interpreted as vector of integers """
	
	def __init__(self, numrows):
		Column.__init__(self)
		self.numrows = numrows
		self.width = DeduceWidth(numrows, self.size)
		data = reader.fetch(self.pos, self.size)
		self.getter = IntVector(self.width, data)
		
	def __repr__(self):
		return 'ColOfInts: #%d/%d, @%d [%db]' % \
			(self.numrows, self.width, self.pos, self.size)
			
	def __len__(self):
		return self.numrows
		
	def __getitem__(self,index):
		return self.getter[index]
	
class BytesCol:
	
	""" A data + size column pair """
	
	def __init__(self, numrows):
		self.data = Column()
		self.size = None
		self.pos = None
		if self.data.size:
			self.sizes = ColOfInts(numrows)
			self.offsets = [self.data.pos]
			for s in self.sizes:
				self.offsets.append(self.offsets[-1] + s)
		self.memos = Column()
			
	def __repr__(self):
		return 'BytesCol < %s, %s >' % (self.data, self.sizes)
			
	def __len__(self):
		return self.sizes.numrows
		
	def __getitem__(self,index):
		i1 = self.offsets[index]
		i2 = self.offsets[index+1]
		return "%10d-%-4d = %s" % (i1,i2,`reader.data[i1:i2]`)
	
class View:
	
	""" A view is a columnar version of a table """
	
	def __init__(self, parent=None, fields=None):
		self.parent = parent
		self.columns = []
		self.sias = reader.pull()
		assert self.sias == 0 # not yet
		if fields is None:
			reader.descriptor = reader.read(reader.pull())
			fields = reader.parseDesc()
		self.fields = fields
		self.numrows = reader.pull()
		for (name,code) in fields:
			if type(code) == type([]):
				col = Column()
				assert type(code) == type([])
				savepos = reader.pos
				reader.pos = col.pos
				col = []
				for r in xrange(self.numrows):
					v = View(self, code)
					col.append(v)
				reader.pos = savepos
			elif code in "IFD":
				col = ColOfInts(self.numrows)
			elif code in "BS":
				col =
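
The listing is cut off at this point in the archive. As a rough Python 3 illustration of the bit-width logic used by DeduceWidth and IntVector in the listing, here is a small self-contained sketch. The names `WIDTH_TABLE`, `deduce_width` and `get_packed` are illustrative and do not appear in unmk.py; the sketch assumes sub-byte values are packed low-order bits first, as the `_get_1b`/`_get_2b`/`_get_4b` methods suggest.

```python
# Python 3 sketch; names are illustrative, not part of unmk.py or Metakit.

# Small-vector table copied from the listing: row = number of rows (1..7),
# column = column size in bytes (1..6), value = bits per int (0 = not used).
WIDTH_TABLE = [
    (8, 16, 1, 32, 2, 4),   # n = 1
    (4, 8, 1, 16, 2, 0),    # n = 2
    (2, 4, 8, 1, 0, 16),    # n = 3
    (2, 4, 0, 8, 1, 0),     # n = 4
    (1, 2, 4, 0, 8, 0),     # n = 5
    (1, 2, 4, 0, 0, 8),     # n = 6
    (1, 2, 0, 4, 0, 0),     # n = 7
]

def deduce_width(numrows, size):
    """Return bits per int for a column of `size` bytes holding `numrows` ints."""
    if numrows <= 0:
        return 0
    width = (size * 8) // numrows
    if numrows <= 7 and 0 < size <= 6:
        # Tiny columns use the lookup table instead of plain division.
        width = WIDTH_TABLE[numrows - 1][size - 1]
        assert width > 0, "impossible row count / size combination"
    assert width & (width - 1) == 0, "width must be 0 or a power of two"
    return width

def get_packed(data, width, index):
    """Read the index-th value of `width` bits (1, 2 or 4) from a bytes object."""
    per_byte = 8 // width                     # values stored in each byte
    shift = (index % per_byte) * width        # low-order bits come first
    return (data[index // per_byte] >> shift) & ((1 << width) - 1)
```

For example, `deduce_width(100, 50)` gives 4 bits per int, and `get_packed(bytes([0xE4]), 2, i)` for `i` in 0..3 walks through the four 2-bit values stored in that one byte.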

Re: [Metakit] Investigating a corrupt metakit file.

2005-06-09 Thread Jean-Claude Wippler


Pat,


The problem is that out of 56 _B subviews all but one are as
expected. However, one block is damaged. The values for the date and
size columns have got swapped about.


If the data file is wrong, but readable, then my first hunch would be  
a bug.  Setting the wrong column could point to a property cache bug,  
either in the core or in the Tcl binding (the latter is more likely,  
IMO).


I have had one report in the past (at least a year ago) of a mixup,  
also from Tcl.  I was not able to reproduce it, and it seemed to be  
related to byte-code compilation, i.e. whether the Tcl script was  
inside a proc or not.


The problem went away (well, that's what I like to think) by using  
distinct property names.  I think it was related to having two props  
with the same name but a different type (:S vs :I or some such).



  dirs[name:S,parent:I,ctime:I,atime:I,mtime:I,clsid:S,state:I,
 files[
_B[name:S,size:I,date:I,state:I,contents:B] ] ]


In your case, I see only "state" in two different views and in two  
different column positions, and you're not listing problems with that  
one, so probably this whole hunch is irrelevant.


Did you restructure the view at any point in time?  I.e. did the  
layout change once you started adding the first data?


If you can create a test set which fails (big if, I know), then I  
could investigate or write a Python test to see whether this is Tcl- 
specific.  You can probably leave out the contents to create a much  
smaller test set.


I can't rule out anything at this stage, but I would not expect an SMB  
mount to cause problems which only alter the column choice of data  
items.  I'd expect it to create an unreadable datafile by messing up  
things at a much lower level: one or more disk blocks in the file.


Much larger datasets than yours, and with lots of blocked views, have  
been in use for some time.
So my first suspicion goes to the Tcl wrapper (blocked views from Tcl  
have not been used much).


Oh wait - *are* you using Tcl?  I'm jumping to conclusions a bit too  
quickly...


-jcw


Re: [Metakit] Mk4py

2005-08-15 Thread Jean-Claude Wippler


On Aug 4, 2005, at 18:37, Brian Kelley wrote:


Yeah the spaces kill me as well sometimes, and then I think that the
spaces are okay sometimes.

The real issue is that a metakit column name can include any printable
character except a comma ",".


Nor "[", "]", ":", and a few more such as parentheses and braces which I'd  
like to reserve for new uses.  Best to stick with alphanumerics only,  
even though MK does not enforce it.  Best also to be consistent in  
the use of upper/lower case.
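
Since MK does not enforce this, a script can check names itself before defining a view. The sketch below is a hypothetical helper, not part of Metakit or Mk4py. It takes the advice above literally (ASCII letters and digits only, so even an underscore is rejected; loosen the pattern if your views need it) and also flags names that differ only in upper/lower case.

```python
import re

# Hypothetical helpers, not part of Metakit or Mk4py.
# Strict reading of "alphanumerics only": ASCII letters and digits.
_SAFE_NAME = re.compile(r"[A-Za-z0-9]+")

def is_safe_column_name(name):
    """True if the name avoids every reserved or risky character."""
    return _SAFE_NAME.fullmatch(name) is not None

def find_case_clashes(names):
    """Return groups of names that differ only in upper/lower case."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), set()).add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

A name such as `ctime` passes, while `first name`, `a,b` or `size:I` are rejected; `find_case_clashes(["name", "Name", "size"])` reports the `Name`/`name` pair.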



So, now you know :)

Here is another gotcha for you.  Never, ever delete a column and then
add a column with the same name and a different type.  This will drive
you bananas, I guarantee.

To safely do this, delete the column, write out the db to a new file,
delete the database, reopen it and then add the new column.


The key is the commit - that is the moment when a deleted column  
really goes away.  The new-file/delete/reopen approach is fine too,  
but not strictly necessary.


-jcw


Re: [Metakit] trouble installing 2.4.9.4

2005-09-26 Thread Jean-Claude Wippler


Jack Diederich wrote:


I was upgrading from 2.4.9.3 to 2.4.9.4 and I get this error when
I try to load it.

sprat:~/src/metakit-2.4.9.4/builds# python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import metakit
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/metakit.py", line 22, in ?
    from Mk4py import *
ImportError: ./Mk4py.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE







Some C++ compiler name munging.  I've been away from C++ so long I
don't know how to track this down.


It looks like the .so file links to C++ runtime routines which  
haven't been loaded, presumably because neither Python nor the .so  
was linked with -lstdc++.


Could it be that the last link step of the .so uses gcc instead of g++?

-jcw


Re: [Metakit] trouble installing 2.4.9.4

2005-09-26 Thread Jean-Claude Wippler


Jack Diederich wrote:


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/metakit.py", line 22, in ?
    from Mk4py import *
ImportError: ./Mk4py.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE

[...]

I changed the Makefile from
SHLIB_LD = gcc -shared
to
SHLIB_LD = g++ -shared

and now it works fine, thanks.  It used g++ for all the other
compiling steps but not the final linking.


Ah, that explains it.  I've changed configure.in and configure as well:

Index: unix/configure
===
RCS file: /home/cvs/metakit/unix/configure,v
retrieving revision 1.45
diff -u -r1.45 configure
--- unix/configure  10 Jun 2005 16:02:22 -  1.45
+++ unix/configure  26 Sep 2005 21:45:49 -
@@ -1482,7 +1482,7 @@
 if test $SHARED_BUILD = 1; then
   SHLIB_FLAGS=-shared
   SHLIB_CFLAGS=-fPIC
-  SHLIB_LD="gcc -shared"
+  SHLIB_LD="g++ -shared"
 else
   SHLIB_FLAGS=
   SHLIB_CFLAGS=
Index: unix/configure.in
===
RCS file: /home/cvs/metakit/unix/configure.in,v
retrieving revision 1.36
diff -u -r1.36 configure.in
--- unix/configure.in   10 Jun 2005 16:02:22 -  1.36
+++ unix/configure.in   26 Sep 2005 21:45:49 -
@@ -117,7 +117,7 @@
 if test $SHARED_BUILD = 1; then
   SHLIB_FLAGS=-shared
   SHLIB_CFLAGS=-fPIC
-  SHLIB_LD="gcc -shared"
+  SHLIB_LD="g++ -shared"
 else
   SHLIB_FLAGS=
   SHLIB_CFLAGS=

This has been checked into CVS and should solve it for good now.

-jcw

