Re: [Metakit] c4_Property shouldn't require a virtual destructor?
Vittorio Digilio wrote:

> Unfortunately it lacks (almost completely :-) ) a full C++ documentation (anybody, as a long-term user wrote down something and is willing to share, thanks :-) ), so I started experimenting and inspecting the C++ src code.

Does http://www.equi4.com/metakit/api/hierarchy.html help?

> I noticed that c4_Property, though being the base class for the other properties, provides a non-virtual destructor. [...] In this scenario deleting the heap-allocated derived class shouldn't call the base class c4_Property::~c4_Property() destructor and the reference wouldn't be released. I mean:
>
>     c4_Property *pMyInt = new c4_IntProp("age");
>     //
>     //
>     delete pMyInt; // c4_Property::~c4_Property() shouldn't be called
>                    // and Refs(-1) isn't called either
>
> Perhaps I'm missing here something really big and the destructor should be non-virtual ?!

You mean should be virtual? I don't know, but properties are not intended for the heap. Why not simply:

    c4_IntProp pMyInt ("age");

-jcw
___
metakit mailing list - [EMAIL PROTECTED]
http://www.equi4.com/mailman/listinfo/metakit
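A minimal sketch of the destructor question in this thread, using hypothetical stand-in classes (`RefProp`, `IntProp`) rather than Metakit's real `c4_Property` and `c4_IntProp`. It assumes a base class that releases a reference in its destructor. The stand-in declares that destructor virtual, so deleting through a base pointer is well defined, and it also shows the stack-allocated style suggested in the reply.

```cpp
#include <memory>

// Hypothetical stand-ins for illustration only, not Metakit classes.
struct RefProp {
    static int liveRefs;                // number of outstanding references
    RefProp() { ++liveRefs; }
    virtual ~RefProp() { --liveRefs; }  // virtual: safe to delete via base pointer
};
int RefProp::liveRefs = 0;

struct IntProp : RefProp {
    static int derivedDtorCalls;
    ~IntProp() override { ++derivedDtorCalls; }
};
int IntProp::derivedDtorCalls = 0;

// Heap use through a base pointer: derived and base destructors both run.
int heapUse() {
    std::unique_ptr<RefProp> p(new IntProp);
    p.reset();
    return RefProp::liveRefs;
}

// Stack use, as suggested in the reply: no pointer, no delete.
int stackUse() {
    { IntProp age; }
    return RefProp::liveRefs;
}
```

With stack allocation the static type is always the real type, so whether the base destructor is virtual never comes into play.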
Re: [Metakit] mmap
Bruno Blondeau wrote:

> Could someone tell me how to force changes to the disk when mmap is being used by a Metakit database?

MK uses mmap in read-only mode. Changes made during a commit are written directly to the underlying file, not through the mapping. The implementation of all I/O is concentrated in the c4_Strategy class, with c4_FileStrategy as its standard implementation.

-jcw
Re: [Metakit] small borland change for metakit-2.4.8
Simon Cusack wrote:

> The new 2.4.8 is great, I had to make a small change for borland to compile it. In src\univ.cpp I had to change line 22 from:
>
>     #if !q4_MSVC && !q4_WATC && !(q4_MWCW && defined(_WIN32))
>
> to:
>
>     #if !q4_BORC && !q4_MSVC && !q4_WATC && !(q4_MWCW && defined(_WIN32))
>
> to build for borland builder 5 and 6.

Ok, thx - added to CVS.

-jcw
Re: [Metakit] Case Sensitive Find/Select/Locate
Jeffrey Kay wrote:

> How hard would it be to add another field type, say 's' (lowercase s) for supporting case sensitive strings? I've sort of run into a brick wall with this -- I have some tables that the case-insensitive search is fine, but others where I really need the case sensitivity.

If it were trivial, it would have been in there... But you're right, this needs to be addressed.

There may be ways to get us there without globals (I agree, app-wide modality would be a bad choice).

One idea is to direct comparisons through the c4_Strategy class. This is per-storage (but one could play tricks, and make the comparison code do different things only for some specified properties). This should be doable with little risk and impact on the rest of MK, and it need not cost us in performance IMO.

Another option would be to add a comparison member (or function pointer) to each property object (c4_Property, and all its derived classes). Again, no performance cost of substance IMO, but I'm not sure how far the effects of this would reach.

Encoding sort choices in type case (s vs S) would not be my favorite, because representation and use are really two different issues. Even using special property names (name_nc:S vs name:S) would be preferable from my perspective, because it keeps this aspect out of the MK core. If comparisons are moved to the strategy class, then one could implement this on top of MK - it may well become a default, but at least it would become overridable (for those who need to maintain 100% compatibility).

There is indeed no way to go from a view to its parent. This is unfortunate, but impossible to alter in the current design (unattached subviews can be referenced from multiple items).

> I wouldn't mind contributing the code if you can point me in the right direction or give me a couple of hints about how you think this can be accomplished.

Let me think about this.

The other reason to push forward on some sort of custom sorting is that we really need to get Unicode-aware sorting worked out, which can be considered another custom sort order (probably the default one, one day).

-jcw
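To make the "comparison member (or function pointer) on each property" idea above concrete, here is a rough sketch. Everything in it is hypothetical: `CompareFn`, `StringPropSketch`, and the two comparison functions are illustration names, not part of Metakit, and nothing here reflects how MK actually compares strings internally.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical comparison signature: negative, zero, or positive, like strcmp.
using CompareFn = int (*)(const std::string&, const std::string&);

// Case-sensitive comparison.
int compareExact(const std::string& a, const std::string& b) {
    return a.compare(b);
}

// Case-insensitive comparison (ASCII only, via tolower).
int compareNoCase(const std::string& a, const std::string& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Hypothetical property descriptor that carries its own comparison.
struct StringPropSketch {
    std::string name;
    CompareFn compare;
};

// A find/select would consult the property's comparison instead of a global rule.
bool sameKey(const StringPropSketch& prop,
             const std::string& a, const std::string& b) {
    return prop.compare(a, b) == 0;
}
```

A Unicode-aware ordering would then be one more function with the same signature, which is why custom sorting and Unicode sorting are treated as the same problem in the message above.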
[Metakit] libtool
I have a question...

For some reason I do not quite understand, MK builds shared libs with ld. This completes and works as expected with C++ programs, but it causes runtime errors when loaded from a C main (for example Mk4tcl.so loaded from tclsh). It may also break down even with C++ on Unix systems which do not support shared library back-linking.

The libtool 1.4.1 info docs sound ominous:

    Writing libraries for C++
    =========================

    Creating libraries of C++ code should be a fairly straightforward
    process, because its object files differ from C ones in only three ways:

    1. Because of name mangling, C++ libraries are only usable by the C++
       compiler that created them. This decision was made by the designers
       of C++ in order to protect users from conflicting implementations of
       features such as constructors, exception handling, and RTTI.

    2. On some systems, the C++ compiler must take special actions for the
       dynamic linker to run dynamic (i.e., run-time) initializers. This
       means that we should not call `ld' directly to link such libraries,
       and we should use the C++ compiler instead.

    3. C++ compilers will link some Standard C++ library in by default, but
       libtool does not know which are these libraries, so it cannot even
       run the inter-library dependence analyzer to check how to link it in.
       Therefore, running `ld' to link a C++ program or library is deemed to
       fail. However, running the C++ compiler directly may lead to problems
       related with inter-library dependencies.

    The conclusion is that libtool is not ready for general use for C++
    libraries. You should avoid any global or static variable
    initializations that would cause an "initializer element is not
    constant" error if you compiled them with a standard C compiler.

    There are other ways of working around this problem, but they are
    beyond the scope of this manual.

    Furthermore, you'd better find out, at configure time, what are the C++
    Standard libraries that the C++ compiler will link in by default, and
    explicitly list them in the link command line. Hopefully, in the future,
    libtool will be able to do this job by itself.

My question is: would anyone have a suggestion how to deal with this in the most portable manner? I tend to use MK mostly in static-linked form, but evidently it would be nice to make this work in the most general way possible.

The current CVS sources have a -lstdc++ added to LDFLAGS, which solves it for Linux, but generates the following output on MacOS X (both are gcc 3.1/3.2):

    *** Warning: This library needs some functionality provided by -lstdc++.
    *** I have the capability to make that library automatically link in when
    *** you link to this library. But I can only do this if you have a
    *** shared version of the library, which you do not appear to have.
    *** The inter-library dependencies that have been dropped here will be
    *** automatically added whenever a program is linked with this library
    *** or is declared to -dlopen it.
    g++ -dynamiclib -flat_namespace -undefined suppress -o .libs/libmk4tcl.dylib mk4tcl.lo mk4too.lo column.lo custom.lo derived.lo fileio.lo field.lo format.lo handler.lo persist.lo remap.lo std.lo store.lo string.lo table.lo univ.lo view.lo viewx.lo -lc -install_name /usr/local/lib/libmk4tcl.dylib

The resulting library does load in tclsh.

Should the conclusion be to throw out libtool altogether? Frankly, I wouldn't mind one bit...

-jcw
Re: [Metakit] Question on Views
Barbara Menzel wrote:

> We are using MetaKit with Visual C++. Often, we find there is a need to initialize or perform some initial action on a view within a new class. I've tried passing the view into the object via the constructor and get several compile errors, primarily, c4_view is not a recognized type.

That's a typo, I assume: c4_View, not c4_view, right?

> However, passing a view as a parameter in a member function, there are no errors and everything works fine. The view can even be updated within the member function and returned with the updates included. Has anyone tried this or something similar with any success?

Could you post a brief extract of the code you would like to get working? I'm having trouble understanding exactly what part is not doing what you expect.

-jcw
Re: [Metakit] Q: rowids?
Gordon McMillan wrote:

> Why sort it? Scan on open for the maxid, and maintain that in memory. It's lower overhead, and doesn't interfere with whatever ordering the app might want to see or maintain.

Worth repeating, because it highlights a fundamental aspect of MK's column-wise data storage model. You can open a datafile of any size, point at a view with 10s of thousands of rows of any complexity, and still do the above scan-on-open with no other overhead than one read of a few Kb off the disk. This is so efficient that any other approach is a waste of effort in most cases.

Column-wise data storage means a scan over one property is f a s t. It would be even faster if MK had C-coded loop aggregate functions such as max, but hey - there need to be some goodies saved up for later :)

One last point on this. This sort of tight max-scan loop takes maximum advantage of CPU caches (even more so if coded in C). On modern CPUs, that equates to warp drive.

-jcw
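The scan-on-open approach described above can be sketched roughly as follows. This is not Metakit code: a plain `std::vector<int>` stands in for one column of a column-wise store, and `scanMaxId` and `RowIdAllocator` are hypothetical names. It also assumes that ids are positive, so 0 can mean "no rows yet".

```cpp
#include <algorithm>
#include <vector>

// Stand-in for a single column: all ids sit contiguously in memory,
// which is what makes a one-property scan cache-friendly.
int scanMaxId(const std::vector<int>& idColumn) {
    if (idColumn.empty()) return 0;  // assumption: real ids start at 1
    return *std::max_element(idColumn.begin(), idColumn.end());
}

// Scan once on open, then keep the maximum in memory.
struct RowIdAllocator {
    int maxId;
    explicit RowIdAllocator(const std::vector<int>& idColumn)
        : maxId(scanMaxId(idColumn)) {}
    int next() { return ++maxId; }   // no sorting, no extra index
};
```

The rows themselves are never reordered, so the application stays free to keep whatever ordering it wants.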
Re: [Metakit] Tk with tclkit problem?
Lok Yek Soon wrote:

> I encounter the following problem when testing Tk with tclkit under Linux (eg. ./tclkit hello.tcl). Error message as follows:
>
>     invalid command name wm
>         while executing
>     wm title . Hello
>         (file hello.tcl line 2)

Add the following line before calling wm:

    package require Tk

Please post tclkit-related Q's to the starkit mailing list instead: http://www.equi4.com/mailman/listinfo/starkit

-jcw
[Metakit] Metakit 2.4.9 released
There is a new release of Metakit. It's mostly a bug fix release, plus some smaller changes to extend the Tcl binding a bit more on the OO side.

Extract from changelog at http://www.equi4.com/metakit/CHANGES:

    2003-02-19  ### MK 2.4.9
    2003-02-18  Fix bug in blocked view delete and hash byteorder
    2003-02-17  Configure tweaks for hpux/ia64
    2003-02-14  Bug found in blocked viewer modification
    2003-02-14  Some changes to OO interface in Tcl
    2003-02-14  Enable stdio buffering
    2003-02-07  Tweaks to restore broken MK ports
    2003-02-07  Changed code to avoid compiler warning
    2003-02-02  Work around optimizer bug in gcc 3.2.1
    2003-01-24  Fixed cleanup order bug in Mk4tcl
    2003-01-22  Add missing -lstdc++
    2003-01-19  Tweak to temp object use
    2003-01-17  Add synonym for mk4tcl info command
    2003-01-16  Allow access to root view in Mk4tcl
    2003-01-15  Use strdup
    2003-01-10  Build improvements, Mk4py long and Mac improvements
    2003-01-09  String compare tweak, Mac Carbon runtime mmap code
    2002-12-23  Tweak for Borland builder 5 & 6
    2002-12-09  Fixed bug in selection view change propagation
    2002-12-02  Fixed bug in MK old-file format conversion
    2002-11-24  Fixed Mk4tcl threaded build
    2002-11-22  Configure tweak for HPUX/Itanium
    2002-11-16  Tweaks to compile on Mac
    2002-11-04  Fixed typo in Makefile
    2002-11-03  ### MK 2.4.8

The homepage is, as before, http://www.equi4.com/metakit

FYI, Andreas Kupries has documented the 2.3/2.4 file format, see doc section.

FYI also, there is now a bug tracking system at http://www.equi4.com/bugs

The 2.4.9 release passes all its 140+ tests on Windows, Linux, and MacOS X - but there probably remain some portability issues in the makefiles and headers.

Metakit is Open Source Software, and will always remain so. If you would like to support this, please share bug details, tricks/insights, and porting tips - it's a very effective way to help take it yet further.

Happy programming :)

-jcw
Re: [Metakit] MetaKit java bindings?
Michael Scharf wrote:

> I switched my project from delphi/python to java. And I still believe that MetaKit is the best database for my application. Now I wonder if anybody has written some JNI code to access MetaKit from java.

Whee... a trans-lingual MK aficionado! :)

It depends on what you want to do. An experiment which was done about a year ago (by Christian Tismer) was to generate a *fixed* binding, given an existing database structure. Changes to the schema mean you have to regenerate and recompile the wrapper.

The reason to mention this is that the binding is C, not C++ (though inside, there is of course still a C++ core for now). That means SWIG should have no trouble at all wrapping it to various languages, including Java. This was in fact one reason for doing it. As far as I can remember it was definitely functional code, though not exposing most of the MK view operators, just basic access/modify functionality for views and rows.

The project was shelved, to await better focus and actual need. In case you're interested - it's all available in a CVS project on equi4.com (follow same checkout instructions as metakit, but use metable as module name).

It would be fantastic if you can make such a binding work for Java, in some form or other (metable is just food for thought, there are of course many more ways to go about this). It also matches my conviction that data storage has longer lifetimes than languages, i.e. over time I expect an increasing interest to bind to a second language (as teams and projects evolve).

It's been some time since I mentioned it, so let me re-iterate that I continue to be interested in making more language bindings happen, also to Perl, Ruby, etc. My main hurdle is not knowing enough about each of the respective languages to be able to do things in a natural way myself. But I'll definitely do my best to help where I can.

-jcw
Re: [Metakit] Redundant data
Angus Lord wrote:

> I'm using metakit partly as structured storage for my app. Setting up the database is fine, however if I modify one of the entries, then I get a copy of the data stored in the file (only once, I can modify it many times). This is fine for small bits of data, but if I am storing and modifying files (a few kb) then I am potentially wasting a lot of space. Is there any way to turn this feature off?

You're seeing the consequences of stable storage - the mechanism which ensures commit/rollback robustness. It comes with Metakit, which is a database manager that will continue to function with a consistent dataset regardless of aborts and pulling plugs at the most awkward times.

You can compress, by saving to a new file and switching over to it (see SaveTo).

The space is not wasted or lost. It gets re-used later.

-jcw
Re: [Metakit] Installing Mk4tcl on Red Hat Linux 8.0
John Fletcher wrote:

> > Set $auto_path accordingly.
>
> What does this do?

Standard Tcl - please google for it or look at man pages.

> Is there any way in which the installation can do the building of the package index file? If not, then the demos and tests, which use package require will all fail, unless instructions for how to get round this are given as well.

Yes, the pkgIndex.tcl file gets created by make install, see the makefile for how/where it does that. The trouble right now is that something is broken in the 2.4.9.2 autoconf/libtool config (help!). I probably should not have followed someone's recent advice (in private email) to update to a newer autoconf/libtool combo :(

> Alternatively, there could be some tests which use load instead.

    $ cd ../tcl/test
    $ tclsh all.tcl
    Processing 9 scripts...
    mk1basic mk2chan mk3struct mk4commit mk5object mk6fixed mk7limit mk8fail mk9crash
    Passed 33 tests
    $

> Incidentally, I did a build of 2.4.9.2 and I noticed that the libmk4tcl.so and Mk4tcl.so files are not the same size, so it is not just a case of renaming, as was said in the readme for 2.4.7. I eventually found the built libraries in the hidden folder builds/.lib

    $ ls -l .libs/libmk4tcl.so
    -rwxr-xr-x 1 jcw users 1189052 May 29 16:09 .libs/libmk4tcl.so
    $ ls -l Mk4tcl.so
    -rwxr-xr-x 1 jcw users 340468 May 29 16:09 Mk4tcl.so
    $ cp -a .libs/libmk4tcl.so blah
    $ strip blah
    $ ls -l blah
    -rwxr-xr-x 1 jcw users 340468 May 29 16:25 blah
    $

-jcw
Re: [Metakit] corrupt database
Guenther Fischer wrote:

> On Tue, 24 Jun 2003, Jacob Levy wrote:
>
> > You (and we) need a little more information: [...]
>
> It is a starpack with the latest version of tclkit - the windows version is built on linux. tclkit is the build from equi4.com.

For issues regarding starpacks and tclkit, it is probably more effective to post to the starkit mailing list, see: http://www.equi4.com/mailman/listinfo/starkit

> The error comes only with this one DB created with my application (a wine database program). There are many other users - I have never seen it before. I have used tclkit/metakit for some years for this project (free software). I think there is some bad data in the database (disk error or whatever) and this data is needed for indexing or so.

The one unexplained problem on Windows, and it might even be a regression from previous releases, is a reported corruption when the datafile is on a file server. So this should definitely be something to find out. Every other case I know of was caused by opening more than once.

The bad news: datafile corruption in MK tends to do real bad damage. It usually does not damage records but entire *columns*. So when trying to extract data, your best bet is to try to not extract all properties.

Quick things to try:

- does sdx mkinfo datafile give meaningful details?
- also see the mk2tcl starkit on http://mini.net/sdarchive/
- sometimes readkit.tcl can read what MK itself cannot

-jcw
Re: [Metakit] Using head of file
David Van Maren wrote:

> I noticed that constructing a (modifiable) storage from an existing non-metakit file succeeds, and allows metakit information to be written to it with no errors. Examining the file afterwards, I've seen that the metakit information is simply appended to the original file data, leaving it intact. I looked at some of the format documentation, and it indicated that there is both a header and a footer used by metakit. I'm guessing that the above behavior was by design, in order to allow users to use the head of the file for non-metakit information (such as their own magic number or similar information). But that's just a guess, so I've got a few questions:
>
> 1. Is this behavior intentional?

Yes. It's used in the Tcl scripting language to implement Starkits and Starpacks: scripts and executables which can be launched and contain a MK datafile, piggy-back style. In the case of Starkits, the header is a regular Tcl script (Tcl stops reading at a CTRL/Z in the file).

> 2. If not, shouldn't metakit fail construction of a Storage from an existing non-metakit file?

I agree that this append behavior can confuse things. One way to check is to open read-only, and check the description string of the storage contents. It should come out as empty.

> 3a. If it is intentional, does metakit guarantee that it will leave the head of the file unchanged, even through a commit() which changes the metakit contents?

Yes.

> 3b. Does metakit care if I subsequently modify the head of the file (after closing the Storage associated with it)?

No. No need to close - MK ignores header data.

> 3c. Does metakit care if the head of the file is grown or shrunk so long as the Storage is closed?

No. The tail markers use relative sizes - here, closed is essential.

> We're wanting to mark each storage file with our own special magic number, and this looks like a very easy way to do it, if metakit supports it.

Yep.

-jcw
Re: [Metakit] How long to wait before commiting?
Brian Kelley wrote:

> I am loading a database with a bunch of new data that the user is allowed to validate. I have been using commit() and rollback() for these operations because it's easy :) The question I have is, what are the ramifications of loading a lot of data without committing? Memory? Speed? Inquiring minds want to know.

Memory usage grows until the commit, as more and more view changes are buffered.

For classical data entry, i.e. typing, I would assume that speed is never an issue, nor are the amounts of data, i.e. memory usage...

-jcw
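As a rough illustration of why memory grows until the commit, here is a toy model of buffered changes. It is only a sketch under stated assumptions: `BufferedStore` is a hypothetical class, it keeps string key/value pairs instead of views and rows, and it says nothing about how Metakit really buffers or writes its data.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory model, not Metakit's implementation.
class BufferedStore {
public:
    // Every uncommitted change is held in memory.
    void set(const std::string& key, const std::string& value) {
        pending_[key] = value;
    }
    // Reads see pending changes first, then the committed state.
    std::string get(const std::string& key) const {
        auto p = pending_.find(key);
        if (p != pending_.end()) return p->second;
        auto c = committed_.find(key);
        return c != committed_.end() ? c->second : std::string();
    }
    // Commit folds the buffer into the committed state and frees it.
    void commit() {
        for (const auto& kv : pending_) committed_[kv.first] = kv.second;
        pending_.clear();
    }
    // Rollback simply drops the buffer; committed state is untouched.
    void rollback() { pending_.clear(); }
    // Number of buffered changes, i.e. what grows between commits.
    std::size_t pendingCount() const { return pending_.size(); }
private:
    std::map<std::string, std::string> committed_;
    std::map<std::string, std::string> pending_;
};
```

In this model, committing in batches during a bulk load bounds `pendingCount()`, at the price of giving up a single all-or-nothing rollback for the whole load.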
Re: [Metakit] Do loaded rows hang around in memory?
Erik Hermansen wrote:

[trouble]

> The only pattern I can see is that the corruption usually occurs when the user exits the application.

If you have AutoCommit() enabled, then that may be related - it'll commit before exit. Can you copy away the file

> The commit buffers are also memory-mapped files? In which case they couldn't be corrupted by stray writes from my application code, right?

No - commit buffers are allocated. They, and administrative info, can still be corrupted. To rule this out requires running in a separate address space, i.e. process.

> Is there the possibility of exiting before writes performed in a commit are finished? The question sounds dumb to me, but I am grasping at straws because I've already tried so many things.

If writes do not finish, the last step of commit is not done, i.e. the file will not be adjusted to use the new state. Any premature exit/end/crash leaves original state intact. I'm saying this under the assumption that there is no bug in MK. If there is, I hope we can find it and resolve it ASAP!

> The only other hint I have is that the bug was never reported until I split my database into three separate storage files.

Three different files, c4_Strategy objects, and c4_Storage objects? Should be no problem. There is no longer 100% consistency between the three, i.e. you may see one commit succeed and another fail (e.g. disk full). But none of this can damage datafiles IMO.

> Another programmer is playing around with the order we perform commits and delete storage objects during the application exit, but each tweak takes about a week to confirm whether it did anything or not. This is the final release-stopping bug after a year of development.

Is it an idea to record & play back the changes to force the problem to the surface? A drastic method would be to instrument all modifying calls to write out a set of instructions, perhaps as a Python or Tcl script.

Are you using threads?

Are you 100% confident of the stability of the compiler and runtime library?

I'm old enough to know not to point fingers at anyone but myself, but it doesn't hurt to rule out the obvious one more time...

-jcw
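The record & play back idea could look roughly like the sketch below. It is an assumption-laden illustration: `RecordingRows` and `replay` are hypothetical names, the "view" is just a vector of strings, and the log is a plain line-based text format rather than the Python or Tcl script suggested above. Values are assumed to contain no whitespace.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the data being modified (hypothetical, not a c4_View).
using Rows = std::vector<std::string>;

// Wraps every modifying call so that it also appends one log line.
class RecordingRows {
public:
    void add(const std::string& value) {       // value: no whitespace
        log_ << "add " << value << '\n';
        rows_.push_back(value);
    }
    void removeAt(std::size_t index) {
        log_ << "remove " << index << '\n';
        rows_.erase(rows_.begin() + static_cast<std::ptrdiff_t>(index));
    }
    const Rows& rows() const { return rows_; }
    std::string script() const { return log_.str(); }
private:
    Rows rows_;
    std::ostringstream log_;
};

// Replays a recorded log against a fresh, empty row set.
Rows replay(const std::string& script) {
    Rows rows;
    std::istringstream in(script);
    std::string op;
    while (in >> op) {
        if (op == "add") {
            std::string value;
            in >> value;
            rows.push_back(value);
        } else if (op == "remove") {
            std::size_t index = 0;
            in >> index;
            rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(index));
        }
    }
    return rows;
}
```

Replaying a captured log in a loop, on a fresh file each time, is one way to turn a once-a-week failure into something that can be reproduced on demand.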
Re: [Metakit] hash table size in database
Kristian G. Kvilekval wrote:

> I am using a hash view on a table of about 4K elements. Today I examined the database with the dump utility and noticed that the hash view sometimes has 4K elements and other times it has 8K.. What determines the size of the hash table?

It's a power of two, and it's always larger than the number of data rows.

I've written a bit more about hashed and blocked views on this new page: http://www.equi4.com/mkmapping.html

-jcw
Re: [Metakit] hash table size in database
Kristian G. Kvilekval wrote:

> Hmm... that's exactly what prompted the question. I have a database with 4030 entries, but on one machine it generates a hash with 4096 and on the other with 8192.. Is it checking whether the database fits in memory? Just infinitely curious:
>
> Machine 1: 512KB ram
> Database Sz: 673725
>
>     mk4dump ~/.zinf/db/metadb | fgrep VIEW
>     VIEW 1 rows = dbview:V dbview_H1:V
>     VIEW 4030 rows = url:S type:S title:S artist:S album:S genre:S comment:S track:S year:S
>     VIEW 4097 rows = _H:I _R:I
>
> --
> Machine 2: 2GB ram
> Database sz : 689683
>
>     VIEW 1 rows = dbview:V dbview_H1:V
>     VIEW 4030 rows = url:S type:S title:S artist:S album:S genre:S comment:S track:S year:S
>     VIEW 8193 rows = _H:I _R:I

No, fits in memory is not considered. What does matter is the order of adds/deletes. Space is reclaimed and re-used when fill drops too low - there is hysteresis, i.e. the same number of rows can have a different hash table size depending on how entries were added and deleted.

Same build order, different platform? (if so, there could be a bug)

-jcw
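The hysteresis effect can be shown with a small sizing model. This is a sketch with made-up thresholds, not Metakit's actual policy: `HashSizerSketch` grows by doubling so that the slot count is always a power of two larger than the row count, and it shrinks only when fill drops below one quarter. Both rules are assumptions chosen to make the point.

```cpp
#include <cstddef>

// Hypothetical sizing policy; the thresholds are assumptions.
struct HashSizerSketch {
    std::size_t rows = 0;
    std::size_t slots = 1;  // power of two, kept larger than rows

    void add() {
        ++rows;
        while (slots <= rows) slots *= 2;  // grow by doubling
    }
    void remove() {
        if (rows == 0) return;
        --rows;
        // Shrink only when fill is below 1/4, so growth is not undone at once.
        while (slots > 1 && rows * 4 < slots) slots /= 2;
    }
};

// Slot count after a given number of adds followed by removes.
std::size_t slotsAfter(std::size_t adds, std::size_t removes) {
    HashSizerSketch h;
    for (std::size_t i = 0; i < adds; ++i) h.add();
    for (std::size_t i = 0; i < removes; ++i) h.remove();
    return h.slots;
}
```

Under this model, `slotsAfter(4030, 0)` and `slotsAfter(5000, 970)` both end with 4030 rows, but the first leaves 4096 slots and the second 8192, which is the kind of history dependence described in the reply.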
Re: [Metakit] Mk4py behavioral questions [was Re: Mk4Py bug?]
Gordon McMillan wrote:

> On 17 Sep 2003 at 1:25, Nicholas Riley wrote:
>
> [...]
>
> > I'm working under the assumption that, given a database, you'd prefer it to generate an error rather than discard data.
>
> Not at all. When working with GUI forms, I often use a form dict which may well have extra state I don't want persisted.

Temporary properties, is what I usually call 'em...

Yes, they are in fact extremely useful - you can have a view with props a, b, c - then open it with getas a,b,d, then have c still linger around, copy c's to d's, say to convert, then commit. The result is a view with a,b,d.

Properties which are restructured away like this, and properties not in the getas are temp props - they disappear on commit (and rollback). I think you can even add a prop, and do a getas after-the-fact to make it persist.

Also, properties which do not persist offer a way to cache additional info for each row.

-jcw
Re: [Metakit] beginners hash view
Riccardo Cohen wrote:

> I built a view of 50k records that I need to access. With normal view it is too slow (180 ms), so I try with hash view that I never used, and the result is even slower !(250 ms) ! (I've just read the new page http://www.equi4.com/mkmapping.html) Here is what I've done :
>
>     c4_View view=db.GetAs("table[key:S,val:S]"),selection;
>     c4_View viewsec=db.GetAs("sec[_H:I,_R:I]");
>     c4_View viewhash=view.Hash(viewsec,1);
>     c4_Row row,searchrow;
>     c4_StringProp val("val");
>     c4_StringProp key("key");
>     for (idx=0;idx<TOTAL;idx++) {
>       sprintf(st1,"%d%d%d%d",idx,idx,idx,idx);
>       sprintf(st2,"%d%d%d%d",idx,idx,idx,idx);
>       key(row)=st1;
>       val(row)=st2;
>       view.Add(row);

No! Do not touch view when there's a hash mapping on top. Use viewhash:

    viewhash.Add(row)

>     }
>     db.Commit();
>     key(searchrow)=[...];
>     selection=viewhash.Select(searchrow);
>
> what's wrong ?? is there any sample code ?

What are you measuring - total time of the above code? Yes, that will be slower, it is now also setting up hashes along the way. But the select should be instant.

One way to avoid the above confusion is to rewrite your code a bit more and use:

    view=view.Hash(viewsec,1);

at the top, replacing the viewhash declaration. In other words, hide the original view once you've set up a hash on it.

-jcw
Re: [Metakit] beginners hash view
Riccardo Cohen wrote:

> Thanks for your quick answer. I did try to use only viewhash, for adding. But it did not change.

Weird... try using Find() instead of Select()?

-jcw
[Metakit] Trends (was: Re: beginners hash view)
Riccardo Cohen wrote:

>     c4_View view=db.GetAs("table[key:S,val:S]"),selection;
>     c4_View viewsec=db.GetAs("sec[_H:I,_R:I]");
>     c4_View viewhash=view.Hash(viewsec,1);
>     [...]
>
> what's wrong ?? is there any sample code ?

To follow up on this - the demo/ and examples/ subdirs in the MK source distribution have some sample code, for C++, Python, and Tcl.

For C++, there are 140+ little self-contained tests in the tests/ regression test suite. They may not be perfect examples, but they are very small and self-contained, and definitely a good spot to look for uses of all of the different view operators.

I don't want to discourage people. On the contrary. But I'm juggling time between a number of activities. I've been working on some Really Exciting Technology To Take Metakit Way Further (TM) for some time now. So my efforts to help and support and improve docs are going to be limited, while maintaining my long-term commitment to help resolve and fix bugs.

As you may have seen, the www.equi4.com website has recently gotten an overhaul, in an attempt to make things easier to find. I've started writing up some more pages in response to questions on this list. And I've just finished a basic utility to display low-level stats and verify free-space integrity of MK datafiles, see http://www.equi4.com/mkstats.html

A number of people have sent a donation lately (thank you!), and Apple Computer has recently rewarded the fact that MK is doing well for them in every release of MacOSX all the way to Panther by donating a 17" Powerbook (whee!), so I can't even start to tell you how motivated I am to take the revolution of Metakit further.

I'm saying this to let you know that although a mailing list like this is usually about Q's and problems, there really are many things going pretty well these days. The one constraint seems to be my time (and a more fanatic focus).

If you want to help, consider writing a small piece about some aspect of Metakit - as a good deed on some rainy day, perhaps. You can either do it all yourself, put it on your website and announce it so I can point to it, or enter it as a page in the MK wiki at http://www.equi4.com/metakit/wiki.cgi - or you can email me and I'll go out of my way to set up a new page and integrate it with what's on the website already (with full credits and acknowledgement).

The other way to help, and it's really encouraging to see it happen more and more, is to participate and help out with questions on this mailing list.

Happy coding, may Metakit serve everyone really well :)

-jcw
Re: [Metakit] beginners hash view
Riccardo Cohen wrote:

> It worked fine with Find() ;) That raises two questions:
>
> 1) Is it normal that select does not use hash ?
> 2) If I do a SelectRange(), will it use hash ?

It seems you've just answered your own Q's.

Think about it: hashing is by value, not sorted. So select, which uses selectrange to optimize, will not be able to use it - I haven't checked the source, but when you do I'm pretty sure that's what you'll find out.

-jcw
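The point that hashing is by value, not sorted, holds for hash indexes in general, and a sketch with standard containers shows it. This is not Metakit code: `std::unordered_map` stands in for a hash index and `std::map` for a sorted one, and both helper functions are hypothetical names mapping keys to row numbers.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Exact-match lookup: a hash index answers this directly.
int findExact(const std::unordered_map<std::string, int>& hashIndex,
              const std::string& key) {
    auto it = hashIndex.find(key);
    return it == hashIndex.end() ? -1 : it->second;
}

// Range lookup: needs keys in sorted order, which a hash does not provide.
std::vector<int> findRange(const std::map<std::string, int>& ordered,
                           const std::string& lo, const std::string& hi) {
    std::vector<int> rows;
    for (auto it = ordered.lower_bound(lo);
         it != ordered.end() && it->first <= hi; ++it) {
        rows.push_back(it->second);
    }
    return rows;
}
```

With only the hash index available, the range query would have to visit every entry, which matches the slow Select() timings reported earlier in the thread.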
Re: [Metakit] Ordered and Hash Views together!
Brian Kelley wrote:

> I have a database table table[id:I,name:S] that I would like to find quickly using either id or name. Is it possible to have two hash views simultaneously on this table?

This Q was bound to come up one day ;)

>     vw = st.getas("table[id:I,name:S]")
>     dvw = vw.project(vw.name, vw.id)
>     _hashview = st.getas("table_hash[_H:I,_R:I]")
>     _hashview2 = st.getas("table2_hash[_H:I,_R:I]")
>     vw2 = vw.hash(_hashview, 1)
>     dvw2 = dvw.hash(_hashview2, 1)
>
> I use vw2 to quickly find the id properties and dvw2 to quickly find the name property. It appears to work (the full test is below) which is pretty amazing. I can add data to vw2 and dvw2 is automagically updated. Is this safe and proper?

I am not sure. I suspect dvw2 being updated is not quite right - try changing an existing item in vw2 so its key stays the same, but the name is altered... (my hunch is that dvw2 triggers a full rehash upon seeing a size change in its underlying data view)

The trick with hash views is that they must see all changes, so they can update the secondary info, while essentially passing through the request to the underlying data view. You might have to change the above so one hash is built on top of the other:

    vw = st.getas("table[id:I,name:S]")
    vw2 = vw.hash(_hashview, 1)
    dvw = vw2.project(vw.name, vw.id)
    dvw2 = dvw.hash(_hashview2, 1)

And then always make changes *only* through dvw2, which will cascade changes through vw2.

Do tread with caution - these are cases where I have not done any testing at all... there is definitely considerable room for optimization (internally) in this sort of stacked use which is *not* being done at all in the current implementation.

-jcw
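The rule that an index "must see all changes" can be illustrated without Metakit. In the sketch below, `TwoKeyTable` is a hypothetical class that keeps two hash indexes over the same rows and forces every modification through one entry point, so that a rename with an unchanged id still updates the name index. It assumes ids and names are unique, and it is not a model of how stacked hash views work internally.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical table with two lookup keys; assumes unique ids and names.
class TwoKeyTable {
public:
    // All changes go through this class, so both indexes see them.
    void add(int id, const std::string& name) {
        byId_[id] = rows_.size();
        byName_[name] = rows_.size();
        rows_.push_back(Row{id, name});
    }
    // The id key stays the same, but the name index must still be updated.
    bool rename(int id, const std::string& newName) {
        auto it = byId_.find(id);
        if (it == byId_.end()) return false;
        Row& row = rows_[it->second];
        byName_.erase(row.name);
        row.name = newName;
        byName_[newName] = it->second;
        return true;
    }
    int idOf(const std::string& name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? -1 : rows_[it->second].id;
    }
    std::string nameOf(int id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? std::string() : rows_[it->second].name;
    }
private:
    struct Row { int id; std::string name; };
    std::vector<Row> rows_;
    std::unordered_map<int, std::size_t> byId_;
    std::unordered_map<std::string, std::size_t> byName_;
};
```

If `rename` wrote to `rows_` directly and skipped `byName_`, lookups by the old name would keep succeeding, which is the kind of stale state the "change a name but keep the key" test above is meant to expose.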
Re: [Metakit] indexed view
Riccardo Cohen wrote: While looking at indexed view in view.cpp, I read the header of hash view, and noticed a small mistake : [...] * c4_View datah = storage.GetAs(people_H1[name:S,age:I]); [...] while the text above speaks about the secondary view [_H:I,_R:I] (Must be an old version) Good catch, will be fixed in next checkin. Thx. About indexed view, it is written : * This view is modifiable. Careful: when a row is changed in such a way * that its key is the same as in another row, that other row will be * deleted from the view. So this supposes that multiple key is supported for adding, but not for updating. Is that true ? The catch is that modifying a key property is not unlike deleting a row and adding another one. That leads to some hairy details which I have not even thought through well enough... By the way, there is no C++ example of indexed view. I can't find what to put in arguments const c4_View map_ and const c4_View props_ I'm not sure. Indexed views are experimental at best, right now. I'm not too pleased with what's there, and would suggest staying away from it - it's not ready for real use IMO. Even ordered views have some blind spots - it's all due to an unfinished design of how implicit key ordering and explicit row# indexing should work together. And there are things like allowing duplicate keys or not, and most important of all: all ordering is going to be somewhat limited until MK supports custom comparisons (for UTF-8, case-sensitivity on/off, reverse ordering, etc). You're reaching limits of the current implementation. Some of this is simply unfinished, but some of it hinges on deeper issues which are taking a lot more time to understand and resolve than I originally assumed. The view model is very generic, the inherent ordering side of things needs more details about views to be managed before all operators can be done properly. Note that you can always take over, and derive a new custom viewer class for a certain purpose. 
What a custom viewer lets you do is intercept all access and changes, and manage extra details in secondary view, etc. I should probably document more of that generic (and *very* powerful) mechanism, rather than try to list all rough spots in indexed viewers and such. Custom viewers are the foundation for all the newer view operators in MK, including hashes, blocking, joins, groupby, remap. -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] Bugs, gaps, and suggestions
There's a new page on the website which I'm going to use to collect issues which do not fit the bug tracking system well enough, as well as more open-ended ideas and suggestions for extending and improving Metakit. The new feature/to-do page is at http://www.equi4.com/mktodo.html The bug tracking system is as before at http://www.equi4.com/bugs The list is fresh and totally incomplete. Please send suggestions and reminders. I'd like to prevent good ones from falling through the cracks, and also maintain a good overview so important things can be done first. -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] metakit for python 2.3 on freebsd
PieterB wrote: I'm trying to install metakit with python 2.3 on freebsd 5.1. When I run make METAKIT_WITH_PYTHON=yes from /usr/ports/databases/metakit (and changing python2.2 to python2.3 in the Makefile). Not sure how the FreeBSD ports system is setup with MK, so I can't comment on it. In general, make's for the python side of MK could use some tweaks, judging from a couple of recent posts. Here's perhaps an option for you: if you checkout from CVS, you'll find a new distutils solution added by Gordon McMillan: cd python python setup.py build (or install) Now that distutils is supported, being the Python way nowadays, should we perhaps move away from makes and all the autoconf/libtool complexities? When in Rome, etc... -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] gcc 3.3 on mac
FYI, the MK build problems with gcc 3.3 on OS X are resolved by getting the latest gcc update from Apple (Aug 2003): This one is bonkers: $ gcc --version gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1435) Copyright (C) 2002 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This one is sane: $ gcc --version gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1493) Copyright (C) 2002 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. No changes to MK. Builds without warnings. Test suite passes cleanly. -jcw PS. Unrelated, but FYI: I've regenerated autoconf 2.57 / libtool 1.4.3, see CVS. ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] test - please ignore
(I'm fiddling with Mailman mailing list settings) -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] S vs B datatypes in Python
[EMAIL PROTECTED] wrote: Is there in fact, for Python, a difference between using S and using B? There are some semantic differences between C/C++, Python, and Tcl - so there are always going to be some slight impedance mismatches between them for Metakit. In C, S-properties are zero-terminated strings, while B's are (sized) byte buffers. In Python and Tcl, the distinction is far smaller. The sort order of S's is done with stricmp (case insensitive textual comparison), while it is memcmp for B's. If your data may contain null bytes, you must use B's. If your data is text, use S's to get a decent sort order (Unicode and UTF-8 issues will also play a role for S's one day). -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
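A rough illustration of the two sort orders and the null-byte issue described above, in plain Python rather than through Metakit. It assumes case-insensitive comparison can be approximated with str.lower as the sort key and memcmp with Python's ordering of bytes objects; the sample words are made up.

```python
# Plain-Python illustration only; this does not call Metakit.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# S-like ordering: case-insensitive textual comparison (approximated with str.lower).
s_order = sorted(words, key=str.lower)

# B-like ordering: raw byte comparison, which puts uppercase ASCII before lowercase.
b_order = [b.decode("ascii") for b in sorted(w.encode("ascii") for w in words)]

# A zero-terminated string ends at the first null byte, so data with
# embedded nulls only survives intact as a sized byte buffer.
raw = b"abc\x00def"
as_c_string = raw.split(b"\x00", 1)[0]

print(s_order)
print(b_order)
print(as_c_string, len(raw))
```

With text data the first ordering is usually what a reader expects; the second is only "right" in the sense of being an exact byte comparison.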
Re: [Metakit] Mk4py storage lifetime
Nicholas Riley wrote: I'm almost finished with my Mk4py work, finally. Down from about 20 items on my to-do list to two, at least. :-) Wow, great! Anyway, one more question. Should this work? metakit.storage().getas('blah[x:S,y:I]').structure() [] Because this does: s = metakit.storage() s.getas('blah[x:S,y:I]').structure() [Property('S', 'x'), Property('I', 'y')] This is correct behavior in the current design. Storages are not kept open by views. Their cleanup causes all views associated with them to become empty. Should access to orphaned views throw an exception? That would indeed be another way to treat this (and probably useful to help detect incorrect use), but it would require some redesign of the C++ core. -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Mk4py changes
Nicholas, Thanks for this great contribution. I'll go through this to understand all the changes. In the meantime, I'm attaching a diff with the latest code in CVS. It wasn't so hard after all, and after editing out the differences reported for auto-generated stuff such as configure, it becomes a lot easier to read through all your changes. [...] That's it. I hope these changes will make it easier for people new to Mk4py to get up and running, instead of being mired in compilation and usage problems. Thanks again - I *very* much share your concerns and appreciate your efforts to improve that side of things. Deployment hassles are the worst time-wasters ever IMO, it's worth spending all our time on (but only a few people, that is), to get better out-of-the-box solutions! -jcw njrdiffs.out.gz Description: GNU Zip compressed data
Re: [Metakit] find select and search, help once again please
Riccardo Cohen wrote: Search() = many errors like : error search for key '836 my key 836' found idx 15000 this key is in the table, I see it with kitviewer, 3 records have this key. = when found, the value is sometimes not the good one : idx=498, key='195 my key 195', foundidx=5850, val='value for key 1950 [dum=5850]', avg=0.188377 it should be the value for key 195, not for key 1950 ! = If I increase the TOTAL to 1 instead of 5000, then every search is found with no error ! = the search is very quick, but does not provide the result ! what does it do exactly ??? Binary search. It can only work if the view is sorted on the key, and the key is the first property. Find() == result ok, but quite slow. If I hash my table, the result is very quick, but I cant have multiple key ! (which is a problem for me) You could consider grouping first on the (non-unique) key, and then hashing the resulting view/subview structure? I dont need the performance of a Cray II running an Oracle Server, but 62ms per selection is too much for me (it takes 6 seconds for 100 searches !). Could anybody help me please ? Have you tried a plain brute force loop? If it still isn't near the performance you'd expect, please post a (short) code example. There are a few things to avoid unnecessary copying. I don't think 62ms to go through 10k rows or so is accurate, there must be something else going on... -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
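Why a binary search misbehaves on unsorted keys can be shown with the standard bisect module. This is a plain-Python sketch, not Metakit's Search(); the keys are made up in the style of the ones quoted above, and binary_search is a hypothetical helper.

```python
import bisect


def binary_search(keys, wanted):
    """Return the index of wanted in keys, or -1. Only valid if keys is sorted."""
    i = bisect.bisect_left(keys, wanted)
    return i if i < len(keys) and keys[i] == wanted else -1


unsorted_keys = ["836 my key 836", "195 my key 195",
                 "1950 my key 1950", "42 my key 42"]
sorted_keys = sorted(unsorted_keys)

# The key is present in both lists, but the search only works on the sorted one.
missed = binary_search(unsorted_keys, "195 my key 195")
found = binary_search(sorted_keys, "195 my key 195")

print(missed, found, sorted_keys)
```

Note that the keys are strings, so "sorted" means textual order: "1950 my key 1950" comes before "42 my key 42".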
Re: [Metakit] Mysteriously growing Metakit db
[EMAIL PROTECTED] wrote: [...] I am now fairly confident that my database will not grow without bound, consuming life as we know it on the east coast. Relieved :) But what is commit-extend mode anyway? I've dug up information about this in the Metakit wiki and used it as basis for a new page on the website, see the last item of http://www.equi4.com/mkdocs.html -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Mk4Py for python2.3 on Windows
Thorsten Henninger wrote: I am trying to compile the Mk4py.dll (python bindings) for Python2.3 on Windows, but I did not succeed. I almost got it with the mingw cross compiler on windows, but this one does not work as well! There was an issue with 64 bit ints (PWONumber.h and PyRowRef.cpp, addressed by njr's patch), but after that change - and the inevitable adjustment of python22-python23 in the MSVC6 build project - it builds ok with MSVC6. Mk4py.dll /pub/mk/mk-2.4.9-windows/Mk4py.dll I've renamed the previous build to Mk4py22.dll, and have uploaded a new Mk4py.dll - enjoy :) -jcw
Re: [Metakit] Mk4py changes
Mikhail, did you change any files outside of mk4py distribution? FYI, see http://trixie.triqs.com/pipermail/metakit/2003-September/001409.html - I posted a patch, i.e. all the differences in one file. You can see exactly what Nicholas did. -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] BSDDB vs. Metakit performance?
Brian Kelley wrote: Let's do the test: 2.9143433 seconds to iterate bsddb3 1.8621608 seconds to iterate metakit So metakit is approximately 30% faster for linear access. Both are pretty good though. As you know, statistics can be made to come out any way you like, i.e. your above figures could also be summarized as: bsddb3 needs 56% more time than MK. [jyl] You didn't say which OS you used. Do you happen to know if Metakit uses memory mapped files on your OS? If it does, that's why loading is slower -- Metakit has to obtain committed address space pages from the OS to map all that data into the process's address space. Keep in mind that this is a one-time mmap(). Pointer access and page faults do the rest. OS'es are pretty good at that, their entire code-loading and I/O designs are based on it, usually. Modern OS'es detect sequential accesses even and start pre-fetching. I just know from experience that bsddb scales up to gigabyte files and metakit claims to have good performance to the several hundred megabyte region. I haven't found much of a problem in practice, some of my metakit files are 600MB+ The hard limit is memory mapped address space, i.e. well under 2 Gb on 32-bit machines, in practice. I'd like to point out that storing blobs in MK is actually very efficient. Surprising as it may sound, above a certain size and number of rows, storing N items in a view will do just about the same as BDB does. Keep in mind that MK is columnar - the presence of a column, no matter how big or complex, does not affect traversal of the others. If it's all opaque binary data, and a substantial percentage is empty, then using a separate view should work out better. 
Personally, I think that unless file size is too constraining, you should just add the extra property to your view and let MK's adaptiveness figure out how to store things for that particular column (what MK does is switch to non-columnar storage if #bytes or #rows grows too far - there's a heuristic involved to find a decent trade-off). If you really want to get to the bottom of this, you should compare the mix of having data in MK for traversal, and either items in BDB for large storage or adding a single view with one S or B property, and storing items in that view. The logic of this should be similar. I suspect that MK will come out at least as fast (you're retrieving the N'th item from a view, vs. BDB doing an extra - albeit simple - hash lookup). You'd gain single-file convenience, and fewer installation dependencies. But you'll need to stay under say 1 to 1.5 Gb. Note that there is a downside in the current implementation: MK determines free space from a full traversal, so having millions of pieces in the file will slow it down as it starts preparing for a commit. The file format has some unused features to greatly avoid such traversals, but MK 2.4.9.2 does not yet take advantage of that. It will definitely have to, before we can grow it to the terabyte range in 64-bit architectures. -jcw
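The remark at the start of this message, that the same two timings can be summarized in more than one way, comes down to which figure is used as the baseline. A small sketch using only the two numbers quoted above (the variable names are mine):

```python
bsddb_seconds = 2.9143433
metakit_seconds = 1.8621608
difference = bsddb_seconds - metakit_seconds

# Extra time bsddb3 needs, measured against Metakit's time.
bsddb_extra = difference / metakit_seconds

# Time Metakit saves, measured against bsddb3's time.
metakit_saving = difference / bsddb_seconds

print(f"bsddb3 needs {bsddb_extra:.1%} more time than Metakit")
print(f"Metakit needs {metakit_saving:.1%} less time than bsddb3")
```

Both statements describe the same measurement: roughly 56% more time one way round, roughly 36% less the other.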
[Metakit] compression and encryption
Two options not implemented in MK 2.4.9.2 are compression and encryption. Due to the column-wise design of MK, this may actually have substantial consequences. The idea is that in a datafile with say layout names[first:S,last:S,phones[type:S,number:S]] it would be possible to designate some properties as being compressed, others as encrypted, and yet others as both. The compression would take place in a column-wise manner, i.e. all values of the designated property in all rows would be compressed. If there is major redundancy/repetition, then the storage size would be greatly reduced. On first access, such columns would be uncompressed (taking up some memory), and on commit, the data would be saved compressed again. For compression, it would seem that zlib is the de-facto standard to use. For encryption, a similar effect would be seen: on file, the entire column becomes encrypted, again for a specific property of all rows in that view. A complicating factor would be that encryption needs to be customizable, so in this case a callback through the c4_Strategy class seems the right way to do it. Perhaps some basic encryption such as David Wheeler's TEA could be included as default. When combining compression and encryption, compression would have to be done first, to have any effect. The encrypted result of that would be stored on file. On reading, the data must first be decrypted, then decompressed. This also requires a change to somehow specify the details in the description string, or in some other way. This will require more thought. I'm bringing this up because I am regularly compressing data before storing it (and Starkits in Tcl do it all the time), and because I think there is value in getting the encryption covered, especially since it could be done on a per-property basis. Encryption could be useful to lock up applications deployed as starkits/starpacks. 
The file format has hooks to allow this sort of thing, although such files will not be readable by current MK releases (they would not know how to skip over the extra admin info). Apart from yes please, or no thanks, do you consider this a valuable option? Would you need it and use it right away? Desperate enough to fund it? (I had to ask...) Any ideas about how to encrypt? Or maybe only do compress? Are there any implications / trade-offs I'm forgetting about? -jcw, with a small marketing hat on... ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
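The ordering rule in this proposal (compress first, then encrypt; on reading, decrypt first, then decompress) can be checked with a small standard-library sketch. The toy_cipher below is a stand-in I made up for illustration: it XORs the data with a SHA-256 counter keystream, is NOT secure, and is neither TEA nor anything Metakit provides. The column contents and key are invented as well.

```python
import hashlib
import zlib


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


column = b"Amsterdam\n" * 2000   # a highly repetitive, made-up column
key = b"example key"

compress_then_encrypt = toy_cipher(zlib.compress(column), key)
encrypt_then_compress = zlib.compress(toy_cipher(column, key))

# Reading reverses the write order: decrypt first, then decompress.
restored = zlib.decompress(toy_cipher(compress_then_encrypt, key))

print(len(column), len(compress_then_encrypt), len(encrypt_then_compress))
```

Encrypted output looks random, so compressing it afterwards gains nothing; that is the reason for the ordering.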
Re: [Metakit] compression and encryption
Andreas Muegge wrote: the implementation of encryption is something I would like to see Ok, thanks for letting me know (surprising: just two responses so far). I am not sure about compression. If we talk about normal strings I guess you must try it out. Decompression is usually rather fast and waiting 2 seconds at program start shouldn't bother the user. Would it still be possible to run Metakit on a readonly medium? Yes. Decompression would be an in-memory thing. For big data (1k and more) I have serious doubts. You would have to decompress several MBytes before the first access is allowed [...] No, the decompression would happen per column, on first access, and you can pick per column which one is stored compressed and which one isn't. For large strings, compression would not be per column even, but per item. The switchover point is hard to define, MK uses an adaptive heuristic to choose between ways to store strings. Of course I can only compress each record and not the whole column. Yes, per-item compression (BTW, it's not record but item, i.e. property) is always possible of course, at the MK caller level. The encryption/compression I'm talking about would be column-wise, i.e. very effective with views in which a property has the same value across many rows. For an impression of the effectiveness on your data, create a datafile from scratch so it has no free space, and gzip it - the results could be normal (i.e. 10..30% reduction) or dramatic (i.e. 90% reduction), depending on the nature of your data. I once converted a 120 Mb database stored in another format, and ended up with 10 Mb in MK, which then went to 900 Kb when gzipped... -jcw
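The difference between per-item and column-wise compression for short, repetitive values is easy to measure with zlib. This is a plain-Python sketch with invented data; joining the values with a newline is just my way of forming one block and is not how Metakit lays out a column.

```python
import zlib

# A made-up column in which the same few values repeat across many rows.
values = [b"Amsterdam"] * 900 + [b"Utrecht"] * 100

raw_size = sum(len(v) for v in values)

# Per-item: every value is compressed on its own, at the caller level.
per_item_size = sum(len(zlib.compress(v)) for v in values)

# Column-wise: all values of the property are compressed as one block.
column_size = len(zlib.compress(b"\n".join(values)))

print(raw_size, per_item_size, column_size)
```

For values this short, per-item compression actually makes things bigger because each item pays zlib's fixed overhead, while the single block shrinks to a tiny fraction of the raw size.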
Re: [Metakit] KitBinder
Pascal Baspeyras wrote: I'm interested in KitBinder features, but I can't find more than this page: http://www.equi4.com/metakit/api-old/doc_kbind.html I easily embed a metakit file into my app's resources, but I fail to open it from there (Visual C++). Whoa ... that's 5 year old technology, it's truly ancient! ;) With today's MK, all you need is to append the MK datafile to the end of your executable. Then open the executable (read-only) and you'll have a datafile... this is cross-platform. The trick is to find out the path of the file to open. In Win32, use GetModuleFileName. -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Viewing a MK database
[EMAIL PROTECTED] wrote: Let's say I have a large Mk database that I want to display in a grid-like format (wxGrid, for example). What is the best way to approach this so that the display is very fast? [...] But these *represent* other things and what I want to display is the string representations corresponding to (at least some of) the ints. This requires accessing other databases to retrieve or compute the appropriate string representation. It is then this resulting string-ized row that needs to be displayed in the grid. Quick (not well thought-through) response: This is where mk.wrap() can probably help. Define a view which wraps the MK view, and produces values in the way you need them. Then, access to items will go through your Python wrapper *on-demand*. The whole trick of KitView.exe (and I presume Brian's KitViewer) is delayed rendering. Scrolling across an infinite number of rows can be instant. The issue is not doing more work than needed - and if you think about it, KitView does exactly the same as what you're after: take data out of MK, and present it in transformed way on the screen (namely visually). Back in the times when KitView.exe was built, I had only Borland C++ Builder's datagrid which was up to this task. Many simpler GUI approaches ask you to fill a matrix or listbox before using them. Nowadays, there are several widgets in Tk (TkTable as well as pure tcl ones such as Hugelist) which can play this on-demand delayed-rendering trick. I don't know wxPython (or wxWindows) unfortunately - but it can no doubt also do this. With Mk4py's wrap() you can create MK views which are virtual *and* which go through arbitrary Python code on each item access request. It's sort of the equivalent of MK's c4_CustomViewer class in C++, which is extremely powerful (many view operators are built on it). -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
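The delayed-rendering idea can be sketched without Metakit or any GUI toolkit: a sequence-like wrapper that converts a row to display strings only when that row is asked for. LazyRows, the labels table and the row data are all hypothetical; this is not the mk.wrap() API, only the pattern it enables.

```python
class LazyRows:
    """Sequence-like wrapper that formats a row only when it is requested."""

    def __init__(self, rows, labels):
        self._rows = rows        # underlying rows of ints (made-up data)
        self._labels = labels    # hypothetical lookup: status code -> display string
        self.rendered = 0        # how many rows were actually formatted

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        self.rendered += 1
        status, count = self._rows[index]
        return (self._labels.get(status, "?"), str(count))


labels = {1: "open", 2: "closed", 3: "pending"}
rows = [(i % 3 + 1, i) for i in range(100_000)]
grid = LazyRows(rows, labels)

# A grid widget would ask only for the rows currently on screen.
visible = [grid[i] for i in range(500, 520)]

print(len(grid), grid.rendered, visible[0])
```

Out of 100,000 rows only the 20 visible ones are ever converted, which is why scrolling can stay fast regardless of the total row count.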
Re: [Metakit] Question on how to remove top level views
Berk, Murat wrote: We used to store a view name in one view (instead of marking cell as subview) so that we can do getAs(name) and use it. I do not want to change a lot of things, but when we try to delete the rows in the first view, the only thing we can do is to remove all elements of the second view but I cannot really delete it. view1 name field1 field2 field3 name1 data... name2 data... __name1__ prop1 prop2 prop3 __name2__ prop1 prop2 prop3 When we delete name1 from the first view, I want to remove the whole view called __name1__. How can I do this.. storage.GetAs(__name1__) IOW, omit the usual [...] part. Am I missing something? Since everything is a view including database itself, in theory this should be possible. Yes. The one confusing part could be that the view won't go away until committed. All properties, including subviews, including therefore also top-level views, stay around after a restructure, and only truly vanish on commit (that makes it possible to re-structure, copy over, then commit). -jcw
Re: [Metakit] unique() with count?
[EMAIL PROTECTED] wrote: It turns out that the unique() view operation is pretty useful to me. However, what would be even more useful is to have a count of each record which indicates the total number of records which were in its identity class. apply(view.counts, view.structure() + ['frequency']) ? -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
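For readers unsure what result is being asked for, here is the "unique with a count" idea in plain Python using collections.Counter. It does not use Metakit's view operators, the rows are made up, and "frequency" here is simply the last element of each output tuple.

```python
from collections import Counter

# Made-up rows of (name, age); some rows are exact duplicates.
rows = [("smith", 30), ("jones", 25), ("smith", 30), ("smith", 41), ("jones", 25)]

# Count how many rows fall into each class of identical rows.
frequency = Counter(rows)

# One output row per distinct input row, with the count appended as an extra column.
unique_with_count = [row + (count,) for row, count in sorted(frequency.items())]

print(unique_with_count)
```

The output has one row per distinct input row, as unique() would give, plus how often that row occurred.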
Re: [Metakit] Passing views in Python
Brian Kelley wrote: My guess is the storage is going out of scope and being garbage collected and thereby closing your views without you knowing about it. You should keep the storage object in scope for your entire application run. Here is some proof of this:
    >>> storage = metakit.storage()
    >>> vw = storage.getas("test[I:I]")
    >>> vw.append(1)
    0
    >>> del storage
    >>> len(vw)
    1
    >>> vw[0].I
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: I
I'm of two minds whether I think this is a bug or not. I think it might be onerous for the python wrapper to keep track of all the views/subviews and like being created and used. Spot on, I think. When a storage goes out of scope, all data becomes unavailable. Keep in mind that MK uses memory-mapped files usually, so access to data actually goes *straight* to disk in most cases. Closing the file blocks off that access. But MK does not control where view objects are used, it merely tracks reference counts (in the same way as Python does for its own PyObject's). The C++ view objects themselves are not tracked or reachable from the storage object. So at some point, the only way out I could think of is to make views act as being empty. All rows continue to exist, they just don't have any properties anymore. Which does indeed make them pretty useless, but at least it leads to well-defined semantics. Views which do not come from a storage are unattached views, these are therefore autonomous. That's what you get when you use copies. The flip side is that they eat up memory, and have to be allocated and copied in full. The way to avoid problems in Python, is to store the storage object in a variable which is guaranteed to stay around as long as its data needs to be accessed. -jcw
Re: [Metakit] Re-assigning a c4_View - a quickie
Ian Fairclough wrote: Just a quick question, if you have the following code : c4_View viewB = viewA.Duplicate(); and then you want to re-assign viewB i.e. viewB = viewC.Duplicate(); What should you call prior to re-assigning viewB to ensure that the first viewB is properly destroyed. For example, would the following do it : viewB = viewA.Duplicate(); viewB.RemoveAll (); viewB = viewC.Duplicate(); There are two types of destruction. If the view is attached to a storage, the RemoveAll() will make sure all its rows are deleted (on file too, after commit). If all you care about is memory use and object clean-up, then you don't have to do anything. MK uses a technique called smart pointers in C++, which automatically manages all reference counts. The line viewB = viewC.Duplicate(); does a number of things: - it creates a new view with copies of what is in viewC - it increments the reference count of that new view - it decrements the refcount of whatever was in viewB - it makes viewB refer to the newly created copy If viewB previously referred to a copy of viewA, and if no other view object refers to it, then the decrement will cause that copy to be cleaned up. This is fully automatic in C++, as long as you stay away from pointers to c4_Views. There is no need whatsoever for these (same for c4_Storage, btw), not even performance-wise because c4_View objects are very very lightweight objects. Enjoy the magic of smart pointers! -jcw ___ metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] 'Blocked' views
Jacob Levy wrote: Thanks for the example -- I'm sure I can construct the equivalent C++ code, and if not I'll look through the tests that I'm sure contain some examples. Look in examples/. It's all there, C++ and Tcl. You mentioned that blocked views are advantageous for when you have lots of small strings. The advantages are better reuse of space and more compact storage? What other circumstances would benefit from using blocked views? Is there a (significant) performance penalty using blocked views? I'm not going to go into this - your best bet is to measure each case yourself. Look in examples - it has sample code, timing tests, scalability tests, etc. Look also at the link at the bottom of the MK docs page: http://www.equi4.com/mkdocs.html I forgot to mention that before. While MK may not have stellar documentation, we should at least try to make good use of what there is... right? Oh, and look in examples/ in the MK source distribution :) -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] License
Pat Knight wrote: The Metakit license says I have to include the copyright notice and license text if my product contains substantial parts of the software. However, the precompiled DLLs for Windows don't contain the required text. Am I allowed to redistribute them, or do I have to build my own versions incorporating the text? Yes - feel free to redistribute them. To comply with the copyright/license, you can include the standard blurb in accompanying documentation. Perhaps also include a link to the MK homepage or license page. That way the origin of the software is clear - which is what the MIT license is all about (well, for me anyway). -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] MKStats
Jeffrey Kay wrote: Is the source code for the mku portion of the mkstats program available? I thought that having the ability to compute the percentage of empty space in a db would be a helpful function to have in my code, specifically so I could decide when to compact the data. It appears that the c4_Strategy class has a FileSize() function, but that doesn't return the amount of bytes actually used in a file. How would I compute that value? Sorry, I'm afraid not. Mkstats uses some new code which is part of a larger project. The mku utility is not based on the Metakit C++ core, and does a complete traversal on its own of the data on disk to generate the usage info. You can call it as an executable, and parse the output, as mkstats does. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] Metakit list at gmane.org
If you prefer to read this mailing list over the web, then you may want to check out the new archive at http://news.gmane.org/gmane.comp.db.metakit/ - it's quite sophisticated in its support of keyboard navigation (for javascript-capable browsers). Click on the question mark in the top right corner for details. There's also an NNTP interface. Thanks to Lars Magne Ingebrigtsen - Mr. Gmane, for making such a wonderful resource available, and for importing the entire Metakit mailing list archive. -jcw PS. FYI, Starkit list has also been on Gmane for some time now: http://news.gmane.org/gmane.comp.lang.tcl.starkit/ _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] metakit and UTF-8
andrian wrote: I created a metakit database using Mk4tcl 2.4.9.2. I have saved the script file with the data to populate the db in UTF-8. However, the stored data appear to be corrupted. I understand that according to Metakit's specification UTF-8 is supported. Furthermore, I use Wikit, where I have, successfully, stored UTF-8 data. What can be the cause of this problem? UTF-8 can definitely be stored in MK (sorting is another matter). Can you create a test script which shows the problem? Without it, I have no way of helping or even reproducing the problem, I'm afraid. FWIW, I have not heard of a problem with UTF-8 before. If you use Mk4tcl, then you may want to consider posting to the Starkit mailing list, which has many more Tcl subscribers than this list: http://www.equi4.com/mailman/listinfo/starkit Though I read and respond to both, of course :) -jcw
Re: [Metakit] Closing storages (again)
[EMAIL PROTECTED] wrote: So now my question is this: Are weak references to PyStorage objects unsupported simply because the necessary stuff to support them was just never added? (It doesn't *look* as though it would take much to support this.) Or is there some more fundamental reason weak references to PyStorage objects can't be supported? Mk4py can still be compiled with Python 1.52. This has proved valuable to a number of people (myself included) whose hosting providers subscribe to the if it aint broke don't fix it philosophy. I don't know much about Python's weak references (simply because they were introduced after I was involved with Mk4py). I'd be willing to maintain a dual code-base, provided all differences are dealt with (one, I hope) #define's. So on my end, it's more a lack-of-time-not-high-enough-priority kind of issue than anything else. On a different, but related, note: I've been making good progress on integrating Nicholas Riley's changes to Mk4py. It now seems to be ok, other than that setup.py appears to be hitting distutils buglets with Python 2.2.3 (current default on my Gentoo Linux setup). For that combination, the answer will have to be: use make. Anyone using Mk4py: if you could download and verify the latest sources from CVS, that would be a big help - it's been holding up a new update of MK way too long already... -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Question
chris mollis wrote: I have a question about the best way to validate information on reads/writes to the db. For example, I'd like to make sure that data that is written out during a particular commit (by calculating a hash of data written, perhaps) can be verified again when the database is re-opened at a later date (possibly calculating the hash again and then checking this against the previous hash). What do you recommend to be the best way to do something like this? Should I override DataWrite/DataRead methods of c4_FileStrategy to calculate hashes on read and write operations? Good questions. There are several aspects to consider. The first one is really what sort of validation you are after: if you need to verify storage in general, then one could argue that there really is no other option than full file checksums, and even then it'll depend on the sort of validation as to when and how often you need to do it. Such checksums could be done outside MK, i.e. after commits and before opens. Another point to be made is that MK is a database: it does not read or write all data each time the datafile is used. By validating writes on commit, you'll be checking only what it changed, not the entire datafile. Due to the way data is stored, the data written can be all over the datafile, it's not necessarily contiguous (though individual columns are). It gets worse: MK usually loads data by mapping a file into memory. That means no read system calls take place at all in most cases: the data is mapped to a range of addresses and paged in via O/S page faults when accessed, which is a matter of following pointers. If you really insist on doing this in some sort of fine-grained manner, my suggestion would be to use a custom c4_Strategy class as you mention yourself, in combination with a *second* MK datafile. The invariant is that MK always writes entire columns - I suspect that it is possible to detect the column boundaries written by intercepting DataWrite(). 
The main call comes from column.cpp line 1532. Or it may be necessary to introduce two extra strategy members which get called once in each call to c4_Column::SaveNow():

- strategy_.DataInit(pos_)
- unmodified while (iter...) loop
- strategy_.DataDone(_size)

The DataInit would reset a checksum field in the strategy object (and remember pos_), the DataWrite calls would incrementally update the checksum, and the DataDone call would save a (pos, size, check) triple in the second MK datafile. It'll take some extra logic to make this work across multiple commits, i.e. when space gets re-used, but that ought to be doable. You may want to use hashed views for the secondary MK file, to make it snappy.

The most important problem to deal with is *when* to verify such saved checksums. If it has to be done during access, then I can't think of any other way than to disable memory mapping (by overriding c4_Strategy::ResetFileMapping with a dummy which does nothing). That makes MK slower and makes it use considerably more temp memory, however - so you'll have to think hard whether that is really what you want. If you just want to checksum occasionally, then you could iterate through all triples in the secondary MK file and verify each of the ranges.

Another idea would be to save checksums per fixed-size block, say 4 Kb. That means DataWrite would track checksums, but it may need to read some data off the disk to deal with writes which are not exactly on block boundaries. This needs some thought to optimize, since most DataWrite calls will not be aligned nicely. Then again, DataWrite does get called in mostly sequential order, since it writes entire columns most of the time.

-jcw
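A minimal sketch of the DataInit / DataWrite / DataDone bookkeeping described above, written in Python so it can be run on its own. Everything here is an assumption for illustration: the class and function names (ChecksumRecorder, data_init, data_write, data_done, verify) are hypothetical and not part of Metakit, CRC-32 is just one possible checksum, and the triples are kept in a plain list rather than in a second MK datafile.

```python
import zlib


class ChecksumRecorder:
    """Hypothetical stand-in for the extra strategy hooks (not Metakit API)."""

    def __init__(self):
        self.triples = []   # one (pos, size, check) entry per saved column
        self._pos = None
        self._size = 0
        self._crc = 0

    def data_init(self, pos):
        # Start of one column save: remember the offset, reset the checksum.
        self._pos, self._size, self._crc = pos, 0, 0

    def data_write(self, data):
        # Called once per chunk; CRC-32 can be updated incrementally.
        self._crc = zlib.crc32(data, self._crc)
        self._size += len(data)

    def data_done(self):
        # End of the column save: keep the triple for later verification.
        self.triples.append((self._pos, self._size, self._crc))


def verify(image, triples):
    """Return the triples whose byte range no longer matches its checksum."""
    return [(pos, size, crc) for pos, size, crc in triples
            if zlib.crc32(image[pos:pos + size]) != crc]


# Simulate one column written in two chunks at offset 4 of a tiny "datafile".
recorder = ChecksumRecorder()
recorder.data_init(4)
recorder.data_write(b"abc")
recorder.data_write(b"def")
recorder.data_done()

good_image = b"0123abcdefXYZ"
bad_image = b"0123abcXefXYZ"    # one byte inside the column was damaged
print(verify(good_image, recorder.triples))
print(verify(bad_image, recorder.triples))
```

The occasional full check mentioned above then amounts to running verify() over the whole file image with all stored triples; handling space that is re-used across commits would need extra logic that this sketch leaves out.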
Re: [Metakit] Potentially Stupid Question
Brian Kelley wrote: Based on some previous posts to the metakit news group, I have learned that a metakit storage can be about 1.5 Gigabytes in storage before performance starts to decline. I.e. memory mapped access is no longer viable. Good to know. What happens if you have two storages open? Can each be 1.5 Gigabytes or does memory mapping not really scale this way. Nope - address space is a per-process limitation. The real way out is 64-bit address space machines. You may be able to squeeze some slack with redundancy reduction, compression, etc - but it'll probably be hard and may not even offer much payback. If you have big data items, you can put them on file, seek/read as needed, and manage space in MK - but that too will take some work. -jcw PS. I disagree with the title! _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] Metakit 2.4.9.3
This is to announce a new release of the Metakit embedded high-performance database library for C++, Python, and Tcl. This version consolidates bug fixes over the past 9 months since 2.4.9.2 came out. There should be no source code or binary incompatibilities; upgrading is recommended (but not urgent).

An extract of the change log is appended, full details are available at: http://www.equi4.com/pub/mk/metakit-2.4.9.3.kit/CHANGES

For details see the Metakit homepage at: http://www.equi4.com/metakit.html

Sources and C++/Python/Tcl binaries for Windows, Mac OS X, and Linux are here: http://www.equi4.com/pub/mk/

Enjoy, Jean-Claude

2004-01-26  MK 2.4.9.3
2004-01-22  Fixed refcount problem with temp rows in Mk4tcl
2004-01-21  Documentation updates
2004-01-20  Don't trip over duplicate property names
2004-01-18  Fixed rare but very serious subview resizing bug
2004-01-16  Gracefully deal with bad property type specifiers
2004-01-03  Fixed typo in PyView.cpp
2003-12-21  Fixed Mk4too sorting on subview of length 1
2003-12-13  Tweak to avoid two unsigned/signed compiler warnings
2003-12-11  Checked in numerous changes to Mk4py by Nicholas Riley
2003-11-23  Bumped to Python 2.3, doc tweaks, lots of name fixes
2003-10-28  Get rid of --enable-python, check in c22.txt
2003-10-16  Added note to Tcl docs
2003-10-10  Added c22 test
2003-10-01  Fixed bugs in Tcl test suite
2003-09-30  Python 2.3.1 cleanup
2003-09-20  Autoconf and libtool rebuilds
2003-08-26  Documentation fix
2003-07-17  Fixes to Mk4py (Gordon)
2003-07-11  Fix for Linux not finding .lai file
2003-07-01  Fixed Metakit (preferred) vs Metakit (obsolete)
2003-06-06  Fix to Mk4py for case (in)sensitivity.
2003-05-15  Add distutils setup.py script (Gordon).
2003-05-08  Fixed array bound bug when not using mmap-ed files
2003-04-28  Sourceforge
2003-04-25  Autoconf/libtool update
2003-04-22  Fixes to Mk4py (Gordon).
2003-03-16  MK 2.4.9.2
[Metakit] multi-column sorting
Does anyone know how to do a multi-column sort, using column-wise permutations? I'm looking at ways to optimize sorting, based on the fact that MK has a column-wise data organization. The current sort does row-wise comparisons. Here's what I'm after:

* take the first column, sort it, and produce a permutation vector for it
* take the second column, sort it, and ...
* etc...

So you end up with N permutation vectors. Each sorts only on the specified column. Assume that the sorts are stable, i.e. identical entries are kept in input order. I'm looking for a way to combine these permutations so that the result is a permutation which represents the full sort order of the entire view.

Tried some ideas, but none of them seem to be right. I've googled on the web, but can't find much relevant info (or don't grasp the theoretical foundations enough to spot the essential tricks). It would seem related to radix sorting. D'you know what's involved or have tips on what terms to look for?

-jcw
Re: [Metakit] multi-column sorting
Brian Kelley wrote: Jean-Claude Wippler wrote: Does anyone know how to do a multi-column sort, using column-wise permutations?

Is this the right approach? Thinking out loud here. A multi-column sort is really a precedence sort. You only need to sort on a secondary or tertiary key if the primary key has equivalent values.

1) sort on next property
2) if any more properties groupby property else goto 4
3) foreach groupby subview go to 1
4) reassemble final indices

Result - stable sort, I think :)

Yes, this is clear - and very much related, but not quite what I mean. I'm looking for ways to use more efficient algorithms underneath MK, i.e. as a basis to do the above. I'm also looking for ways to do things lazily - i.e. defer some of the computations. This could have considerable implications when you sort a view and then ask for a slice of it, i.e. only display a small section of it.

I have a half-baked python implementation that requires an index column (mainly because the groupby method doesn't keep track of the row index)

The way this can be done is to add an extra column with row indices (sort of like APL's iota) using the pair() operation, and then group. That way the result will carry original row indices with it. There are more such tricks waiting to be found and exposed. I'm currently trying to better understand what sort of core functions are needed to build the rest with. Hence the Q about per-column sorting and trying to find a way to combine permutations.

To give an example - to sort on col 2, 4 reverse, and then 3 could be done using something like this:

    m2 = sortmap(col[2])
    m3 = sortmap(col[3])
    m4 = sortmap(col[4])
    result = view.remap(m3.remap(reverse(m4)).remap(m2))

(with partial use, i.e. when fetching only a slice of the result, all sorts of neat tricks can be added, leading to behavior which I think resembles what you are describing above)

Except that the above permutation stacking is not exactly right... alas.
-jcw
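One possible way to make the permutation stacking come out right, sketched in plain Python under stated assumptions. Independent per-column sort maps all refer to the original row order, so they cannot simply be chained; turning each sort map into a per-row rank vector first avoids that, because a rank stays attached to its row no matter how the rows are later reordered. The ranks are then applied as stable passes from the least significant key to the most significant one, which is the radix-sort connection. The names sortmap, reverse and combine echo the pseudocode above, ranks is an added helper, and none of them are Metakit functions.

```python
def sortmap(col):
    """Stable permutation p such that col[p[0]] <= col[p[1]] <= ..."""
    return sorted(range(len(col)), key=col.__getitem__)


def ranks(col):
    """Per-row dense rank, read off the column's own sort map."""
    perm = sortmap(col)
    rank = [0] * len(col)
    current = 0
    for k in range(len(perm)):
        if k > 0 and col[perm[k]] != col[perm[k - 1]]:
            current += 1          # a new distinct value starts here
        rank[perm[k]] = current
    return rank


def reverse(rank):
    """Flip a rank vector so that the column sorts in descending order."""
    return [-r for r in rank]


def combine(rank_vectors):
    """Merge per-column ranks (most significant first) into one permutation."""
    if not rank_vectors:
        return []
    order = list(range(len(rank_vectors[0])))
    # Least significant key first; list.sort is stable, so earlier passes
    # survive as tie-breakers for the later, more significant ones.
    for rank in reversed(rank_vectors):
        order.sort(key=rank.__getitem__)
    return order


# Sort on col2, then col4 in reverse, then col3 (made-up sample data).
col2 = [2, 1, 2, 1, 2]
col3 = [5, 6, 4, 7, 8]
col4 = ["x", "y", "x", "z", "w"]
order = combine([ranks(col2), reverse(ranks(col4)), ranks(col3)])
print(order)
```

Since ranks are small integers, each stable pass could be a counting sort instead of a comparison sort. The sketch does nothing about the lazy, slice-only evaluation mentioned above.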
Re: [Metakit] Metakit wiki
John Fletcher wrote: I cannot find on the metakit home page at http://www.equi4.com/metakit.html any link to the metakit wiki at http://www.equi4.com/metakit/wiki.cgi/0 It's still there, at http://www.equi4.com/mkmailing.html I wondered if it was still there, and it is. For some purposes the Email list is better, but for things which develop over a period of time the wiki can be a useful reference. I've been switching to ProjectForum as wiki for some other projects, such as http://www.equi4.com/forum/rawiki/Home. It offers more protection/authentication options, RSS feeds, CSS themes, file attachments, and a lot more (and yes, it's all powered by Metakit). Courtesy of Mark Roseman. Have been pondering for quite a while whether it would be feasible to migrate the MK wiki pages to PF, and revitalize things a bit. I agree that there is value in having information categorized, not just stored in a timeline, as it is now: http://news.gmane.org/gmane.comp.db.metakit/ http://www.equi4.com/pipermail/metakit/ If there is sufficient interest to help fill a new area, I'd be happy to set it up and participate in getting (some) old info transferred and setting up a good structure, perhaps with areas per language. -jcw
Re: [Metakit] E4graph link
John Fletcher wrote: The e4graph link on page http://www.equi4.com/mklinks.html should be changed to http://e4graph.sourceforge.net/ Done, thx. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] data structure question
Jerry wrote: I have a data structure (that works well so far) with three similar sub-views that are accessed, set, and summarized at different points. Now I have a requirement to output a summary of all the detail with a label that identifies which of the three sets the data came from. The solution I came up with doesn't work, so I am thinking out loud to see if someone has an idea that involves the least amount of re-coding. The sub-views are accessed quite a bit in normal processing, and the summary only needs to be created once or twice a month.

Simplified data format:

    vw = db.getas('main[id:I,fname:S,lname:S,new[key:S,val:I],old[key:S,val:I],adj[key:S,val:I]]')
    jdb.dump(vw)

    id  fname   lname  new     old     adjs
    --  ------  -----  ------  ------  ------
    1   first1  last1  0 rows  0 rows  0 rows
    2   bob     last2  2 rows  0 rows  0 rows
    4   first4  last4  2 rows  0 rows  4 rows

Using a combination of flatten, union, and project I can get REAL CLOSE to what I want:

    jdb.dump(vw31)

    id  key    val
    --  -----  ---
    2   val10  10
    2   val20  20
    4   val10  10
    4   val10  10
    4   val20  20
    4   val20  20
    4   val30  30
    4   val40  40

What I need is to know WHICH type of value each row is:

    id  key    val  type
    --  -----  ---  ----
    2   val10  10   new
    2   val20  20   new
    4   val10  10   new
    4   val10  10   adj
    4   val20  20   new
    4   val20  20   adj
    4   val30  30   adj
    4   val40  40   adj

The main problem being that one cannot successfully add a property to a PyROViewer object, which is the result of the union and flatten methods. It seems I either have to:

1) create a separate temporary view for each type and manually copy the flattened view into it, creating and setting 'type' appropriately. Then union off of these PyView objects.
2) or modify my system to always write the subview type into the subview. This means extra programming and run-time overhead, and extra strings in tens of thousands of records.
3) or some other creative idea you suggest.

This is exactly the sort of manipulation I hope to improve further, btw. It's becoming more and more clear that MK needs to offer full relational algebra + set operators.
First of all, note that you can add properties on the fly to views, they will stay around until commit/rollback/close (but won't be saved if they are not part of a getas). So you could do:

- for each row in the new subview: set the type property to "new"
- etc for other subviews

Then you'd be able to see ... ah, wait - I see your point now: Mk4py tracks R/O view status and forbids this (even though the C++ core would allow it).

Ok, another idea: create a view with N copies of the string "new" (can be done virtually with wrap() in Mk4py). Use pair() to add that view next to the subview, i.e. horizontal concatenation. Does that take you closer to a solution?

-jcw
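To make the intended result of the pair() idea concrete, here is a rough sketch in plain Python. It does not use Mk4py at all: the rows are ordinary dicts and lists filled with the sample data from the tables above, the helper name label_rows is made up, and zip() only stands in for the horizontal concatenation that pair() would perform on real views.

```python
def label_rows(main_rows, subview_names):
    """Flatten the subviews, tagging every row with its subview's name."""
    labelled = []
    for row in main_rows:
        for name in subview_names:
            sub = row[name]
            types = [name] * len(sub)              # N copies of the label
            for item, kind in zip(sub, types):     # columns side by side
                labelled.append((row["id"], item["key"], item["val"], kind))
    # Stable sort on (id, key): "new" stays ahead of "adj" for equal keys.
    labelled.sort(key=lambda r: (r[0], r[1]))
    return labelled


def kv(key, val):
    return {"key": key, "val": val}


main = [
    {"id": 1, "new": [], "old": [], "adj": []},
    {"id": 2, "new": [kv("val10", 10), kv("val20", 20)], "old": [], "adj": []},
    {"id": 4, "new": [kv("val10", 10), kv("val20", 20)], "old": [],
     "adj": [kv("val10", 10), kv("val20", 20), kv("val30", 30), kv("val40", 40)]},
]

result = label_rows(main, ["new", "old", "adj"])
for r in result:
    print("%-3d %-6s %-4d %s" % r)
```

With real views, the label column would be a virtual view of constant strings rather than a Python list, so nothing needs to be written into the stored subviews.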
Re: [Metakit] view() vs. getas() in Python
[EMAIL PROTECTED] wrote: Supposedly view() is The normal way to retrieve an existing view. But apparently even if the view doesn't exist, something gets retrieved -- though it isn't of much use. However, it does appear that you can append to this non-existent view since v = db.view('NonExistentView') v.append(foo=1) *appears* to work. But then, of course any attempt at referencing v[0].foo fails (though referencing v[0] does not). Shouldn't appending to a non-existent view raise an exception? If you look at db.properties() there again *appears* to be a view called 'NonExistentView' there. How can I tell that I've retrieved a non-existent view as opposed to, say, a merely empty one? If I try to use description() on a non-existent view I get a really ugly Python internal error. Given this, what is the point of view()? (No pun intended.) It's historical: getas used to be very expensive. And there used to be storeas. Nowadays, like you I tend to use getas all the time. The app essentially says: get me a view of such-and-such shape. Just do it, make it that shape if need be. Extremely handy for adding properties over time. The future of this is going to be different still, btw. The plan is to treat view structure as a meta view itself. So you'll have a view, where each row describes a column. I started on that in the current MK design, but it really goes much deeper and benefits from a fundamental switch to this approach in the core. That will go as far as making a row add in the meta view be equivalent to defining a new column, and so on for renames and deletes. But why not raise an exception if a bogus view name is given to view()? Good point. I think it would indeed help avoid time-wasting surprises. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] libtool or not?
If you are building Metakit on anything but the usual quad or so of most common platforms, any of C++, Python, or Tcl - could you please help decide what to do? YES OR NO: get rid of libtool in the MK build process? WHY: less fighting, drop dependency on libtool, which has changed over the years. WHY NOT: may require some work to build for special platforms (AIX? HPUX?) HOW: switch to gcc -shared, with a few refinements to make it work on Mac OS X and such. These refinements can be added to the unix/configure.in logic, autoconf has sufficient capability to cover most cases, I think. WINDOWS: no change, when built with MSVC 6 (I just checked in a MSVC 7.0 version, btw). No change with mingw either, since -shared does the right thing nowadays. TCL: probably not affected, it has its own configure logic. PYTHON: probably not affected, it is moving to distutils. Your votes and opinions please... -jcw PS. In fact, I'd love to throw out all of make and autoconf, if I knew how to create an effective distro without them (Python is furthest along in that area, clearly, with its distutils). Make is a brilliant concept, but even that makes little sense when it's about deploying and compiling a tested distribution - once. IMO the only strong case for autoconf + make nowadays, is that everyone in OSS-land is used to the configure; make; make install salute. _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Metakit auto** problem
David McNab wrote: Nicholas Riley wrote: Try using the distutils build method instead of ./configure --enable-python - it'll work back to 1.5.2 if you use the latest distutils (which is also guaranteed to work back to 1.5.2). Tried that. With metakit's setup.py, distutils doesn't work for pythons earlier than 2.3. Is there something simple we can do to fix it? -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] libtool has been removed
I've removed libtool from the Metakit build setup and checked in changes to CVS. The changes are very preliminary - this build is likely to work on fewer platforms than before. Use the 2.4.9.3 distribution if you are not prepared to deal with this. I'll be adjusting this further in the coming weeks based on feedback and through tests on the platforms I use myself, and am soliciting patches and suggestions on how to further improve things. Simplifications would be even better, especially drastic ones! -jcw
Re: [Metakit] problems with hash view in 2.4.9.3
Brian Kelley wrote: I don't know if this helps, but the error seems to be dependent on the column name 'email_sender'. This is pretty weird...

    s = metakit.storage()
    fails = "t[url_hash:I,email_sender:S]"
    works = "t[url_hash:I,pizza_sender:S]"
    if 1:
        struc = fails
        field = 'email_sender'
    else:
        struc = works
        field = 'pizza_sender'
    v = s.getas(struc)
    hv = s.getas('hv[_H:I,_R:I]')
    v = v.hash(hv, 1)
    new_vals = {'url_hash': 1, 'email_sender': 'A', field: 'A', 't': 'A',}
    print v.append(new_vals)
    print v.append(new_vals)
    print v.append(new_vals)
    print v.append(new_vals)
    print len(v)
    metakit.dump(v)

Uh, oh. Dict is used as sequence. Key order changes. To see it, add:

    for i in new_vals:
        print i

-jcw
[Metakit] new Tcl build structure
While repairing the damage caused by removing libtool from the MK build process, I came up with what I think is a better way to deal with all the language bindings of Metakit. Have started implementing it for Tcl / Mk4tcl.

The basic idea is to first build the core library in the builds/ directory, as before, possibly followed by running the regression test suite. Once that build is done, *leave* the object files there, and go to the respective language area to finish the job by building the extension there. This will re-use the object code generated from the initial core build, and link it all into the extension.

Reasons for this unusual approach:

- the core gets built first, and can be independently verified
- extensions can adopt whatever the norm is for that language
- no need to bring all the C++ config.h logic into extension builds
- the result does not need a MK shared lib, since it includes it

I've just checked new files into the tcl/ area of MK's CVS. It uses Tcl's standard TEA and is derived from Tcl's sample extension. The benefit so far is that the extension config logic is truly simple, all it does is link in a bunch of extra .o files from ${srcdir}/../builds/. Had to put CC=g++ into the environment to make TEA work with C++. Also had to force using autoconf >= 2.5 on Gentoo (with WANT_AUTOCONF_2_5=1, yuck). The result is a shared lib called libMk4tcl2.4.9.3.so, and conveniences such as installing in the right place and with a suitably constructed pkgIndex.tcl file.

The basic build logic should be:

    cd builds
    ../unix/configure
    make
    mkdir tcl
    cd tcl
    ../../tcl/configure --with-tcl=...
    make

I hope to help do the same for Python / Mk4py and distutils. The one issue this approach introduces is that the core library must be built first - with the same settings as the extension (shared vs. static, debug vs. non-debug, etc). It'll take a while to get these combinations right, and to document the new approach.
The base configure scripts have not changed yet, but I think the libtool removal broke all scripting language bindings anyway. If you can't be bothered with any of this, use the 2.4.9.3 source distribution for now. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] Fwd: [Starkit] Mk4tcl - SegFault when using cursors :-(
Begin forwarded message: From: Jean-Claude Wippler [EMAIL PROTECTED] Date: March 18, 2004 23:54:31 CET To: starkit list server [EMAIL PROTECTED] Subject: Re: [Starkit] Mk4tcl - SegFault when using cursors :-( Christoph Drube wrote: I have massive problems using MetaKit with Tcl (ActiveTcl 8.4.5). Sorry to hear that. # Searching property by property # Q: Is there a better way to search ? set hits {} foreach i $proplist \ { set hits [concat $hits [mk::select $v -first 2 -glob $i $s]] } mk::select $v -first 2 -glob $proplist $s Well, when swapping the rows, this script crashes with seg fault. I had a look at the row contents and the search results - all is fine, but after the second or third iteration over nr it always crashes :-/ What I'm doing wrong? Have I misunderstood the mk::cursor command or their use? Make sure you use Mk4tcl 2.4.9.3 - from the change log: 2004-01-22Fixed refcount problem with temp rows in Mk4tcl This was a long-standing bug: mk::row create did not work right because the tracking of temporary rows was completely messed up. Added test case for Tcl (mk6.8), fixes FB14, BTS#78, and BTS#29. It really drives me to despair cause it's not the first seg fault with mk4tcl - isn't it possible to use property names with blanks? I'm not sure. I always avoid blanks in property names. Tcl has no restrictions, but identifiers in C++ and Python are limited in the same way. Christoph (frustrated) :-( Ouch. -jcw _ Starkit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/starkit _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Re: Maybe problems with Metakit 2.4.9.3
Yasushi Iwata wrote: I found another problem. Following code does not work as expected. [...] But if you remove ordered(2) from getas(), it works as expected. I also removed ordered(2) from example code that I posted yesterday, it worked fine. There must be something wrong with ordered(). Thanks for diagnosing this. Yes, I suspect ordered() has troubles - perhaps it's with more than 1 key field. There are some complex interactions between the view model of indexed access, i.e. control over where things go, and ordered - which tries to decide on its own where to put things (and hash() has no such issues, since it maintains order in MK). -jcw
[Metakit] Mk4py build test Q
I have a question for Python experts, w.r.t. distutils: I'd like to try and get setup.py working on its own. Here's what I get right now (cvs HEAD, build dir wiped): $ python setup.py build running build running build_py creating ../builds/lib.linux-i686-2.3 copying metakit.py - ../builds/lib.linux-i686-2.3 running build_ext running config gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c removing: _configtest.c _configtest.i building 'Mk4py' extension creating ../builds/temp.linux-i686-2.3 creating ../builds/temp.linux-i686-2.3/scxx g++ -fno-strict-aliasing -DNDEBUG -fPIC -DHAVE_UNICODEOBJECT_H=1 -Iscxx -I../include -I/usr/include/python2.3 -c PyView.cpp -o ../builds/temp.linux-i686-2.3/PyView.o [...] g++ -pthread -shared ../builds/temp.linux-i686-2.3/PyProperty.o ../builds/temp.linux-i686-2.3/PyRowRef.o ../builds/temp.linux-i686-2.3/PyStorage.o ../builds/temp.linux-i686-2.3/PyView.o ../builds/temp.linux-i686-2.3/scxx/PWOImp.o ../builds/column.o ../builds/custom.o ../builds/derived.o ../builds/fileio.o ../builds/field.o ../builds/format.o ../builds/handler.o ../builds/persist.o ../builds/remap.o ../builds/std.o ../builds/store.o ../builds/string.o ../builds/table.o ../builds/univ.o ../builds/view.o ../builds/viewx.o -lstdc++ -o ../builds/lib.linux-i686-2.3/Mk4py.so g++: ../builds/column.o: No such file or directory [...] g++: ../builds/viewx.o: No such file or directory error: command 'g++' failed with exit status 1 $ Is there a simple way to resolve this? The workaround is to first do: cd ../builds; ../unix/configure; make The other issue I ran into is testing: $ python setup.py test running test running build running build_py running build_ext running config gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c removing: _configtest.c _configtest.i Traceback (most recent call last): File setup.py, line 184, in ? 
extra_objects=mkobjs, File /usr/lib/python2.3/distutils/core.py, line 149, in setup dist.run_commands() File /usr/lib/python2.3/distutils/dist.py, line 907, in run_commands self.run_command(cmd) File /usr/lib/python2.3/distutils/dist.py, line 927, in run_command cmd_obj.run() File setup.py, line 133, in run import test.regrtest ImportError: No module named regrtest $ (Am using 2.3.3 on Linux, btw) It went away when I disable the line in setup.py: #sys.path.insert(0, self.test_dir) But then it seems to get lost in finding other stuff: $ python setup.py test running test running build running build_py running build_ext running config gcc -E -I/usr/include/python2.3 -o _configtest.i _configtest.c removing: _configtest.c _configtest.i test_inttypes test_inttypes skipped -- No module named test_inttypes 1 test skipped: test_inttypes 1 skip unexpected on linux2: test_inttypes $ -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] First metakit failure, database grew to 2+ gigabyte s
Brian Kelley wrote: Berk, Murat wrote: We use 'spans' and remove them in one operation and also do not commmit anything until we finish a pass over all rows. But main trick is blocked views, which uses smaller footprint on commits. Murat Yeah, I am using blocked views as well, but after checking the code, I was commiting after every delete! Ouch! I'm switching over to deleting spans so it should work a lot better. The memory usage of individual deletes, especially across blocked views, is most probably due to MK allocating 4 Kb buffer chunks in every column a change is made (and sometimes much more to hold modified copies of ranges of data). With blocked views, I suspect that memory usage could indeed rise to a multiple of the dataset. A blocked view with say 5 columns and 4 rows, could have 5 x 40 = 200 blocks, i.e. 800 Kb of sparsely filled buffers pending until flushed by a commit or rollback. The fix for this would be to track the total set of buffers, and start coalescing some in-memory data buffers to free some of that up (and to do so well before actual commits). I'm surprised that memory usage stays high across commits though, and even more by what looks like a 32-bit sign overflow in file positions getting through undetected and messing up a datafile. The 2 Gb limit should lead to commit failures, not file damage! -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Mk4py build test Q
Hello Jack, (thanks for your help on test.py vs. mktest.py) If you rename test.py to mktest.py you should be able to use both of them. I saw the mktest.py rename in CVS, and it almost works for me. I get the 'freebsd4' suite of tests (on debian linux) which tries to include a stdlib test module that only applies to freebsd. I haven't looked at it any closer, but I would guess something in CVS has a hard definition for freebsd. I've just checked in some more changes and a few files I missed for Mk4py testing. The tests now seem to work on Linux. I've not found anything specific for FreeBSD so far. It may be caused by something which is Mac OS X specific, which is also *BSD-ish. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
[Metakit] blocked views
There is a faster implementation of blocked views in CVS now. It evolved from a change submitted by M. Berk (thank you!) and appears to have a considerable effect on performance. The trick is to cache the last used subview. If you use blocked views and check out the latest code from CVS, you will see. If you don't, let me just say that blocked views are now a good option for very large views. Performance benefits are particularly good for views with many properties, and when traversing them sequentially.

To switch to using blocked views, change code which looks like:

    vw = store.getas("vw[...]")

to

    vw = store.getas("vw[_B[...]]").blocked()

You'll also need to reload data, this change won't convert it for you. With a somewhat lower raw access performance, you'll get much more scalable views (millions of rows and more), faster commits, and smaller datafiles.

There's no need to switch over every view - it's still a trade-off. If your views are rarely modified, or contain no strings, or are always accessed in random order (hash maps), then flat is often still better. But it's there if you want it.

-jcw
Re: [Metakit] PyDS 0.7.2 database corruption
Nicholas Riley wrote: I am not sure whether this is PyDS threading issues or Metakit bugs. In any case, Metakit should not crash while attempting to read data from a database! Agree. But stray pointer writes can damage things. I'm not saying this is the case here, just pointing out that software bugs in the same address space can damage a MK datafile despite its failsafe logic. If anyone (Jean-Claude?) wants to see one/more of the databases, I can send them. Please do. -jcw _ Metakit mailing list - [EMAIL PROTECTED] http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Metakit and Tcl (and maybe others)
Bob X wrote: I am using 2.4.9.3 on Windows XP with ActiveTcl. I am creating a simple ticket tracker and I defined my view:

    set view [mk::view layout db.tracker "username:S ticket:S recieved:I closed:I problem:S notes:S status:I"]

I then append into the view:

    mk::row append $view username "Jeff Walsh" ticket 01081 recieved 20040419 closed 20040419 problem "Password locked" notes "Reset password to Ellipse" status 0

I then get errors:

    expected integer but got "01058" (looks like invalid octal number)
        while executing
    "mk::row append $view username "Don Lang" ticket 01058 recieved 20040409 closed 20040409 problem "Application is hanging" notes "Network prob..."
        (file "initial_loader.tcl" line 18)

Yes, leading zeros bite when treating a Tcl string as an integer. I'm assuming the ticket:S is actually a ticket:I in the example you gave - then it would fail. The leading zero defaulting to octal mode is a painful idiosyncrasy of Tcl, see http://mini.net/tcl/498 and http://www.tcl.tk/cgi-bin/tct/tip/114.html

I could change it to a String (works that way) but I would like to leave it an Integer. Are the leading zeros causing the problem? I have to have those as the program spitting the data out uses those.

You can't have your cake and eat it in cases like these :) - either you treat values as integers (which have no knowledge of representation, such as leading zeros) or you stick to a string, which is slower and takes up more space. I tend to use either of two tricks for this:

- add a power of ten one digit wider than the field (100000 for five-digit tickets) to the value and store that (then strip the 1st char again on extract to make sure the 0's stay)
- convert to int via ... ticket [scan 01058 %d] ... (and convert back as needed with: puts [format %05d $value])

-jcw
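The two tricks can be illustrated outside Tcl as well. The following Python sketch round-trips a ticket string through an integer both ways; the fixed width of five digits and all function names are assumptions made for the example, and Python's int() is used with an explicit base of 10 so that leading zeros are never read as octal.

```python
WIDTH = 5  # assumed fixed number of digits in a ticket


def pack_with_offset(ticket):
    """Trick 1: add 10**WIDTH so the leading zeros are kept by a leading 1."""
    return 10 ** WIDTH + int(ticket, 10)      # "01058" -> 101058


def unpack_with_offset(stored):
    """Strip the extra first character again."""
    return str(stored)[1:]


def pack_plain(ticket):
    """Trick 2: store the bare decimal value (like Tcl's scan with %d)."""
    return int(ticket, 10)                    # "01058" -> 1058


def unpack_plain(value):
    """Re-create the zero padding on output (like Tcl's format %05d)."""
    return "%0*d" % (WIDTH, value)


ticket = "01058"
print(pack_with_offset(ticket), unpack_with_offset(pack_with_offset(ticket)))
print(pack_plain(ticket), unpack_plain(pack_plain(ticket)))
```

The first trick keeps the width inside the stored number, so it survives even if the width is not known when reading; the second stores the natural value, which sorts and compares as expected, but needs the width to be known on output.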
Re: [Metakit] c4_Bytes.Modify() mangling data
Brian Kelley wrote:
> I was inserting strings of length 2 which was why it worked for me. Yours were larger. It turns out that you can't cross the end boundary when using modify. So if you are inserting a string of length 10, you can't insert it into a string of length 9. Also, if you are inserting it into a string of length 12, you can only insert at 0 or 1!

Whoa. Good catch and analysis!

The attached change ought to fix this issue. I'll verify this later, but it'll let you proceed for now (or you can use the Python workaround, of course).

-jcw

Index: viewx.cpp
===================================================================
RCS file: /home/cvs/metakit/src/viewx.cpp,v
retrieving revision 1.11
diff -u -p -u -r1.11 viewx.cpp
--- viewx.cpp 23 Nov 2003 01:42:51 -0000 1.11
+++ viewx.cpp 23 Sep 2004 17:49:15 -0000
@@ -581,21 +581,15 @@ bool c4_BytesRef::Modify(const c4_Bytes
     c4_Handler& h = _cursor._seq->NthHandler(colNum);
     const int n = buf_.Size();
     const t4_i32 limit = off_ + n; // past changed bytes
-    const t4_i32 overshoot = limit - h.ItemSize(_cursor._index);
-
-    if (diff_ < overshoot)
-      diff_ = overshoot;
+    // get rid of an optimization, it was wrong (2004-09-23)
 
     c4_Column* col = h.GetNthMemoCol(_cursor._index, true);
     if (col != 0)
     {
       if (diff_ < 0)
         col->Shrink(limit, - diff_);
       else if (diff_ > 0)
-        // insert bytes in the highest possible spot
-        // if a gap is created, it will contain garbage
-        col->Grow(overshoot > 0 ? col->ColSize() :
-          diff_ > n ? off_ : limit - diff_, diff_);
+        col->Grow(off_, diff_);
 
       col->StoreBytes(off_, buf_);
     }
Re: [Metakit] How to backup a metakit database?
Allan Wind wrote:
> How do you backup a metakit database? The cold case is obvious as usual, ensure that no one else has the database file open prior to making a copy of the data with a file level tools (cp, tar etc).
>
> What are the options for hot (i.e. open with an active writer) backups? I noticed the information in the python api reference for doing this from the writer thread/process, but are there any options for doing it externally? If the file is open commit-extend, can you use the same trick if you open the database read-only in a 2nd process? If using commit-aside, is it then safe to just low-level copy the main database?

The race is during a commit (from when it starts to until it completes), because that is when MK writes to file. You will need to stay out of that time span if you wish to have a solid backup.

It seems to me that it could be done on a not-too-active DB simply as follows:

- determine clock time T
- wait until at least one sec has passed since T (actually: the time resolution of the underlying filesystem)
- copy entire datafile
- check mod date of orig, must still be <= T
- rinse and repeat if this test failed

On Windows, I am not sure this will work: if the O/S does not update modtimes right away then the above will not be reliable.

The other way to do it is with support from the committing app so independent readers have a way of telling whether there was a commit, say by incrementing a revision number of a separate info file.

> Is journaling planned? I.e. point in time recoverability between backups.

There is a first cut at this via the commit-aside mode. There have also been simple-but-working tricks in the Tcl wrapper to intercept all calls (to do remoting, as well as creating transcripts of all requests for debugging/replay). With custom viewers, one could write a view layer which intercepts all changes at the C++ level, but that requires more work and discipline during use.

I've been hesitant to implement such things only because I am trying to improve the raw core of MK before building more on top. They are definitely good ideas and *very* worthwhile, IMO.

-jcw
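The copy-and-check loop described above could be sketched roughly as follows in Python. This is only an outline under stated assumptions: the function name, the temporary-file suffix and the default one-second resolution are made up for the example, and it inherits the caveat about filesystems that update modification times lazily.

```python
import os
import shutil
import time


def backup_quiet_datafile(src, dst, resolution=1.0, max_tries=5):
    """Copy src to dst, accepting the copy only if src was not modified
    after the clock time T taken before the copy started.

    Hypothetical helper: 'resolution' stands in for the timestamp
    granularity of the underlying filesystem.
    """
    tmp = dst + ".tmp"
    for _ in range(max_tries):
        t = time.time()              # determine clock time T
        time.sleep(resolution)       # let at least one timestamp tick pass
        shutil.copyfile(src, tmp)    # copy the entire datafile
        if os.stat(src).st_mtime <= t:
            os.replace(tmp, dst)     # no write seen since T: keep the copy
            return True
        os.remove(tmp)               # a commit interfered: rinse and repeat
    return False
```

A caller would treat a False result as "the datafile is too busy right now" and retry later or fall back to cooperation from the committing application.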
Re: [Metakit] Re: c4_Bytes destructor causes memory deallocation failure
Arto Stimms wrote:
> It seems that the destructor releases the wrong memory. In the debug build it gives an assertion at the deallocation, but in a release build it gives no error. This just makes it worse though, since it may later try to use the released memory, causing a crash.
>
> Check this example program which on my machine fails after the fourth iteration:
>
>     #include "mk4.h"
>     #include <string>
>     #include <iostream>
>     using namespace std;
>
>     void main()
>     {
>         c4_Storage storage("datafile.kit", true);
>         c4_View v = storage.GetAs("v[b:B]");
>         v.Add(c4_Row());
>         c4_BytesProp pBytes("b");
>         string teststring("Hello, this is a test!"); // len=22
>         c4_Bytes textbytes(teststring.data(), teststring.length());
>         for (int i = 0; i < 100; ++i) {
>             cout << i << endl;
>             c4_Bytes newbytes = pBytes(v[0]).Access(0, 17);
>             pBytes(v[0]).Modify(textbytes, 0, textbytes.Size());
>         }
>     }
>
> I am using metakit 2.4.9.3 with the modify patch on windows.

I am not seeing this with the CVS build on Linux or Mac OS X. Don't have a Windows compile setup ready this minute, could you check with latest CVS as well?

(FWIW, I had to add an AutoCommit() call to make anything end up on disk)

-jcw
Re: [Metakit] In search of sparc solaris metakit building tips
Larry,

> metakit 2.4.9.3 and Tcl/Tk 8.4.7
> I'm having a bit of a problem:
>
>     $ configure --prefix=/usr/tcl84 --enable-shared --enable-symbols --with-tcl
>     {lots of stuff output - can email if desirable}
>     $ make all
>     {a lot more output}
>     CC -c -g -I../unix/../include -I/usr/tcl84/include/generic -I/usr/tcl84/include ../unix/../tcl/mk4tcl.cpp -KPIC -DPIC
>     {a lot of warnings}
>     ../unix/../tcl/mk4tcl.cpp, line 415: Error: Cannot cast from c4_LongRef to long long.
>     ../unix/../tcl/mk4tcl.cpp, line 490: Error: Cannot assign long long to c4_LongRef without c4_LongRef::operator=(const c4_LongRef&);.

If these are the only fatal errors, I suggest you try the following:

Change line 415 to:

    Tcl_SetWideIntObj(obj_, (t4_i64) (((c4_LongProp&) prop_) (row_)));

And line 490 to:

    ((c4_LongProp&) prop_) (row_) = (t4_i64) value;

If that doesn't work, then there may be some weirdness w.r.t. 64-bitness and int/long casts which I cannot diagnose further without access to a setup like yours (it would be great if someone else can).

The option you have left in that case is to fully disable Tcl's wide (8-byte) ints in the Mk4tcl interface, by replacing #ifdef TCL_WIDE_INT_TYPE with #if 0 on source lines 413 and 484.

-jcw
Re: [Metakit] corrupt database?
Roy Sigurd Karlsbakk wrote:
> I'm an OS X user and my addressbook just fscked up. This is for what I've been told, based on metakit. Are there any tools around that I can use to try to rebuild it? I can find the data scattered all over the database file, but I can't assemble it...

Chances are very slim. Metakit datafiles have very little redundancy. Data is stored column-wise, which means that adjacent items on file are not part of the same row but values from different entries. Finding out which item goes with which is next to impossible if the structural information in the datafile is damaged.

Having said that, this is the very first report ever of a corrupted address book, as far as I'm aware. I cannot quite rule out a hardware glitch at this stage.

Note that the datafile is in your Library - Application Support - AddressBook folder and is called AddressBook.data. There is also an AddressBook.data.previous, which might contain a backup if all else fails.

One way to determine whether your data is salvageable is perhaps the following:

1) download these two files to your Desktop folder:
     http://www.equi4.com/pub/tk/8.4.8/tclkit-darwin-ppc.gz
     http://mini.net/sdarchive/mk2tcl.kit
2) launch the Terminal application, it's in your Utilities folder
3) in the new window, enter these lines *exactly* as follows:
     cd Desktop
     gzip -d tclkit-darwin-ppc.gz
     chmod +x tclkit-darwin-ppc
     ./tclkit-darwin-ppc mk2tcl.kit saved.txt \
       '../Library/Application Support/AddressBook/AddressBook.data'
4) open the newly created saved.txt file, i.e. double-click it (make the TextEdit window as wide as you can, preferably)

With a bit (a lot!) of luck, you may be able to see entries from your address book. If not, then I don't see an easy way to recover things - it may not be possible at all in fact.

If you do see entries, then my suggestion would be to contact Apple since in that case the datafile itself is still readable (I have no knowledge or involvement in the AddressBook itself).

-jcw
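To make the point about column-wise storage concrete, here is a toy Python illustration. It is not Metakit's actual file format, only a sketch with made-up sample data showing why neighbouring items in a column-oriented file belong to different entries.

```python
# Toy address book: three rows of (name, number). The data is invented.
rows = [("Ann", "555-0100"), ("Bob", "555-0101"), ("Cy", "555-0102")]


def row_wise(table):
    """Flatten the table row by row: each name sits next to its own number."""
    return [item for row in table for item in row]


def column_wise(table):
    """Flatten the table column by column: all names first, then all numbers."""
    width = len(table[0])
    return [row[i] for i in range(width) for row in table]
```

With the column-wise order, pairing a name with its number depends entirely on knowing where each column starts and how many rows it holds, which is exactly the structural information that is hard to reconstruct once it is damaged.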
Re: [Metakit] corrupt database?
Roy Sigurd Karlsbakk wrote:
>> 4) open the newly created saved.txt file, i.e. double-click it (make the TextEdit window as wide as you can, preferably)
>
> there's no such editor like vi
> TextEdit sucks :)
>
>> With a bit (a lot!) of luck, you may be able to see entries from your address book. If not, then I don't see an easy way to recover things - it may not be possible at all in fact.
>>
>> If you do see entries, then my suggestion would be to contact Apple since in that case the datafile itself is still readable (I have no knowledge or involvement in the AddressBook itself).
>
> grr. I got more info about it from 'strings AddressBook.data.previous' but then, the tables are stored (name table)garbage(number table)garbage etc so I can prolly find the format somehow, some day...
>
> thanks anyway

Have you tried what I suggested?

-jcw
Re: [Metakit] make install error
Kenny Chamber wrote:
> I've been trying to get metakit (both cvs and latest tarball) to compile with no success. Actually it compiles but won't install. The following is the output of the make install command:
>
>     make install
>     mkdir -p /usr/include /usr/lib
>     /bin/sh ./libtool --mode=install /bin/install -c -m 644 ../unix/../include/mk4.h \
>         ../unix/../include/mk4.inl \
>         ../unix/../include/mk4str.h \
>         ../unix/../include/mk4str.inl /usr/include
>     /bin/install -c -m 644 ../unix/../include/mk4.h /usr/include/mk4.h
>     /bin/install -c -m 644 ../unix/../include/mk4.inl /usr/include/mk4.inl
>     /bin/install -c -m 644 ../unix/../include/mk4str.h /usr/include/mk4str.h
>     /bin/install -c -m 644 ../unix/../include/mk4str.inl /usr/include/mk4str.inl
>     /bin/sh ./libtool --mode=install /bin/install -c libmk4.la /usr/lib
>     /bin/install -c .libs/libmk4.lai /usr/lib/libmk4.la
>     /bin/install: cannot stat `.libs/libmk4.lai': No such file or directory
>     make: *** [install-mk] Error 1
>
> As far as I can tell the file libmk4.lai doesn't exist anywhere. Any suggestions would be appreciated.

You're running a mix of 2.4.9.3 (libtool-based) and cvs HEAD (libtool gone). You need to do a make distclean and then re-run configure, make, etc.

-jcw
[Metakit] Re: Appending rows, effectiveness, documentation
Wolfgang Lipp wrote:
> imho, it would be a good idea to have a command similar to metakit.wrap() to add large number of data items to an existing view; that would solve most problems. or is there some efficient way to get the data from one (in-memory) view to another (on-disk) view?

What language? In C++ you can insert one view into another. That and using blocked views should go a long way (don't pre-allocate in blocked views, it probably won't help much).

Ah, wait, your metakit.wrap() comment indicates you're using the Python binding. Hmmm, looks like we forgot to add a wrapper for C++'s view.InsertAt(pos,view).

-jcw
Re: Re[6]: [Metakit] newbie question - writing derived view back to db
Marcin Krol wrote:
> Geez, Brian, you're a wizard!

I agree 100%.

> After syncing: 23.08
> [...]
> Thanks for the help, Brian, now I have to go away to munch on all that.

Now that you have these results: what file sizes do you see across the different DB's?

(It might also be interesting to compare int-field performances and sizes, BTW)

-jcw
Re: Re[2]: [Metakit] newbie question - writing derived view back to db
Marcin Krol wrote:
> BK> vw2 = st.getas("test_save[a:i,b:s]")
> [...]
> However, there's another silly problem here remaining: how to delete the old view 'test' from the db and rename 'test_save' to 'test'?

Try:

    st.getas("test_save")

(note the absence of brackets and fields)

As for renaming, you'll have to copy things over, I'm afraid.

-jcw
Re: [Metakit] Mailing lists out of order
Well, the mailing list web interface is working again, yippie! http://www.mail-archive.com/mailman-users@python.org/msg29743.html [...] I'll just assume it'll be addressed over the coming days and come down as an update. The exact explanation. With a 10 sec fix: http://www.mail-archive.com/mailman-users@python.org/msg29850.html Yegadda love the net! -jcw _ Metakit mailing list - Metakit@equi4.com http://www.equi4.com/mailman/listinfo/metakit
Re: [Metakit] Python Patch for inserting a view into a view
Brian Kelley wrote: At long last, attached is the diff and the new PyView.cpp file that allows the python interface to insert a view into another view. usage: view.insert(index, view2) is now supported. Properties that don't exist in view but exist in view2 will be added to view. example: import metakit st = metakit.storage() v = st.getas(test[a:i,b:S,c:S]) v2 = st.getas(test2[d:i]) for i in range(100): v.append((i, str(i), str(i))) v2.append((1,)) v2.insert(0, v) metakit.dump(v2) del st Thank you, I've applied the patch. It's in CVS for now. I'm considering wrapping up a new minor release distribution again, to wrap up the last few tweaks. -jcw _ Metakit mailing list - Metakit@equi4.com http://www.equi4.com/mailman/listinfo/metakit
[Metakit] Regression in MK C++, Mk4py, and Mk4tcl
FYI, the following change to MK appears to be faulty:

    2004-09-23  Fix c4_BytesRef::Modify bytes insertion

It shows up in MK's regression test b26, which fails. There is an explanation for why this hasn't been caught before, which I won't go into. It's most unfortunate.

Thanks to Pat Thoyts for reporting the details of this.

If you rely on insertion/mods/deletes of partial data in fields, you may want to revert to an earlier CVS checkout, i.e.

    cvs ... -D 2004-09-21

Several Tclkit builds are affected, probably 8.4.[7-9] and 8.5a2 - these have all been built after that time, and may or may not have used cvs HEAD then. If you are worried about potential datafile corruption from this, revert to Tclkit 8.4.6 to avoid it.

I've not yet researched what exactly happens, but wanted to get this notice out asap.

-jcw
[Metakit] Re: [Starkit] Regression in MK C++, Mk4py, and Mk4tcl
> FYI, the following change to MK appears to be faulty:
>
>     2004-09-23  Fix c4_BytesRef::Modify bytes insertion

The change has been undone in CVS now, so latest CVS should be ok again.

-jcw
[Metakit] RSS feed
I've just found out that Gmane, the mailing-list-to-news gateway service, now also has a blog gateway service. So now you have three different ways to track postings to this mailing list:

- Mailman: http://www.equi4.com/mailman/listinfo/metakit
- News: http://news.gmane.org/gmane.comp.db.metakit/
- Weblog: http://blog.gmane.org/gmane.comp.db.metakit?set_skin=zawodny
  (RSS feed at http://rss.gmane.org/gmane.comp.db.metakit !)

Gmane does a number of clever things, such as removing the mailman info blurb at the bottom of each posting. It also looks like it supports posting, though I'm not sure those get through the filters.

You can turn on the no-mail option in Mailman if you prefer to use one of the other mechanisms to track news yet want to be able to post yourself.

Thank you Gmane, for a wonderful free service.

-jcw
Re: [Metakit] Maximum practical size of Metakit databse?
Davis Adrian wrote:
> What factors govern the maximum practical size for a Metakit database?

(This email was pending in a queue I rarely check nowadays due to the levels of spam flooding it, please consider subscribing to the mailing list to avoid getting in there)

There's a hard limit at 2 Gb due to the way signed 32-bit ints are used in MK and due to the limitations of a 32-bit address space.

You'll be able to get close to that if there are not too many subviews and you don't modify large amounts of data (modifications use 4 Kb memory buffer chunks). Reading is usually ok, it's usually the creation side that causes trouble first. I'd expect a blocked view with only numerical data to get furthest, all the way up to that 2 Gb barrier in fact.

So with MK 2.4.9.3, I'd say that generally speaking 1 Gb is roughly the end of it. If you have large amounts of data being small ints, these sizes cannot easily be compared with other database solutions, due to MK's use of adaptive int vectors which can be substantially more compact.

Btw, on a 64-bit architecture you can actually sneak around these limits by using multiple datafiles. Note that all view operators can be used across datafiles.

The current codebase won't go beyond these limits even on a 64-bit architecture. In the lab I've been growing a new strain of MK which overcomes this, as a recent test with an 8 Gb MK datafile proved (the next-generation size limit will end up in the Tb range). IOW, the file format can handle larger datasets - it's just the code which runs out of steam.

-jcw
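The remark about adaptive int vectors can be illustrated with a small Python sketch. It only models the idea of choosing one bit width per column from the values it holds; the width set 0/1/2/4/8/16/32 is borrowed from the vintage reader code posted elsewhere in this archive, and the unsigned-only treatment is a simplification made for this example, not how Metakit itself decides.

```python
# Candidate bits-per-item widths, assumed for this sketch.
WIDTHS = (0, 1, 2, 4, 8, 16, 32)


def min_width(values):
    """Smallest width in WIDTHS that can hold every value.

    Simplification: values are treated as non-negative (unsigned).
    """
    if not values:
        return 0
    if min(values) < 0:
        raise ValueError("negative values are outside this sketch")
    top = max(values)
    for w in WIDTHS:
        if top < (1 << w):
            return w
    raise ValueError("value needs more than 32 bits")


def packed_bytes(values):
    """Bytes needed to store the column at its minimal width."""
    return (len(values) * min_width(values) + 7) // 8
```

Under these assumptions a column of 1000 values in the range 0..15 takes 500 bytes instead of the 4000 a fixed 32-bit layout would use, which is why raw size comparisons with other databases can mislead.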
[Metakit] Performance comparison Q
To all language specialists:

I'm looking for a way to establish some basic performance figures, to compare and evaluate a number of approaches I'm exploring in the Vlerq project.

As a very first datapoint, it would be nice to find out how one writes decent loops for a very simple task: sum the items of a list of 50,000 integers, running from 0 to 49,999. This is quite an important operation in MK, where cumulative offsets must often be calculated - it also gives an indication how efficient integer lists/vectors are.

The C code is pretty obvious:

    int sum = 0;
    for (i = 0; i < 50000; ++i)
        sum += data[i];

This one in tcl 8.4.6 runs at quite a bit under 1% of that speed:

    set sum 0; foreach x $data { incr sum $x }

My question is: how would you write the above in <insert your language of choice here>?

This is not flame bait. I'm not trying to prove X is better than Y, I'm trying to find out what range of performances one sees these days, and how much I can get away with for now by *not* optimizing my new code to the limit (it also affects some major decisions on what internal data structures I should use at this stage).

I'm aware of the various language shootout websites, the risks of benchmarking, and cache effects. Still, self-contained examples of this logic would help me avoid seriously flawed timings in other languages when applied to tasks which are relevant to Metakit.

I'll summarize results.

-jcw

PS. All timing comparisons are being done using a PIII/650 on Linux. I've got the following installed so far if you're interested: python 2.3.4, perl 5.8.5, ruby 1.8.2, php 4.3.10, java 1.4.2, icon 9.40, gforth 0.6.2, lua 5.0.2 - can add more as needed.
Re: [Metakit] Performance comparison Q
Brian Kelley wrote:
> python:
> =======
>     import operator
>     data = range(50000)  # test data
>     result = sum(data)

Nice, of course. What about arbitrary operators, not just summation? I'm trying to stress generic looping, as well as see how well lists, ints, and addition work together. Sorry for the confusion.

-jcw
Re: [Metakit] Performance comparison Q
Bruce A.Johnson wrote:
> Are you putting the Tcl test line inside a proc?

Yes, thx.

-jcw
Re: [Metakit] Performance comparison Q
Magnus Lie Hetland wrote:
>     result = sum(xrange(50000))
>
> I just did a simple experiment (using the timeit module) comparing the performance of sum(range(50000)) and sum(xrange(50000)), and the latter gave a speedup factor of about 2.2 on my computer... Also, allocating a list of size 50000 seems a bit wasteful just for computing this sum :)

The point is not the result (25000*49999 will get there a lot faster). I'm trying to see how a list of values, iteration over it, and a simple integer operator work together in each particular language.

Don't quite see the same speedup for xrange, but as I said it is not the issue here for me.

-jcw
Re: [Metakit] Performance comparison Q - results
Here are some performance figures, as promised. All timings were done on a PIII/650 laptop with Gentoo Linux 2.6.10, gcc 3.3.5 on May 9, 2005.

The task: calculate the sum of a list containing the numbers 0..49999.

    C               0.6 mSec   array of ints
    Python loop      72 mSec   for s += x (or s += data[i])
    Python reduce    36 mSec   reduce(operator.add,data)
    Python sum       18 mSec   built-in sum()
    Tcl foreach      37 mSec   foreach incr
    Tcl for          44 mSec   for incr lindex
    Thrill vec       24 mSec   0 swap { @ + } rep*
    Thrill ints       5 mSec   convert to int vec, use C primitive

Please take these figures with a grain of salt. I've not investigated memory use. That's Python 2.4.3 and Tcl 8.4.6, BTW.

Let me add that my first naive timings for Tcl were 8x slower - which shows how easy it is to go wrong in performance measurements.

The last two entries use a Forth'ish language I've been using in the Vlerq research project. These results look promising because it seems to indicate that I could adopt Thrill's generic interpreted code for now, without performance problems. The C code is obviously in a different league, but this particular operation is included as primitive in Thrill, so special-casing is right around the corner.

Thanks Brian, Jeff, Gary, Magnus, Bruce, Jacob, this was most enlightening.

-jcw
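For anyone wanting to repeat the three Python rows of the table, a sketch along these lines should do. It is written for a current Python 3 rather than the 2.x interpreters used in the thread, the helper names are invented for this example, and the figures it prints will of course differ from the 2005 numbers.

```python
import operator
import timeit
from functools import reduce

data = list(range(50000))  # the numbers 0..49999


def loop_sum(values):
    """Explicit loop, the 'Python loop' row."""
    s = 0
    for x in values:
        s += x
    return s


def reduce_sum(values):
    """reduce(operator.add, data), the 'Python reduce' row."""
    return reduce(operator.add, values, 0)


def builtin_sum(values):
    """Built-in sum(), the 'Python sum' row."""
    return sum(values)


def time_ms(fn, values, repeat=5):
    """Best-of-N wall time for one call, in milliseconds."""
    best = min(timeit.repeat(lambda: fn(values), number=1, repeat=repeat))
    return best * 1000.0


if __name__ == "__main__":
    for fn in (loop_sum, reduce_sum, builtin_sum):
        print("%-12s %8.3f mSec" % (fn.__name__, time_ms(fn, data)))
```

Taking the best of several repeats is one common way to reduce noise from other processes; it does not address memory use or cache effects, which the posting also flags as open.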
Re: [Metakit] Starting work on a Java version of Metakit
[EMAIL PROTECTED] wrote:
> [... to java or not to java ...]
> I respect your concerns. I can well imagine that relatively direct Java access to Metakit databases would be welcomed by a significant number of Java developers. I encourage this effort.

Me too. And if there is someone out there who wants to create a binding for Ruby, R, PHP, Perl, Lua, C#, or any other language: I'll bend over backwards to help you succeed. There are some recent developments which might substantially simplify that effort, so please contact me if you're interested.

Metakit has always been about *not* tying data formats to a language (as most serialized formats do), and not to a limited time-frame (i.e. maintaining compatibility and readability for the very long term).

Metakit's file format is the way it is for very strong technical reasons, but I have some self-contained pure-Tcl and pure-Python readers lying around if people cannot use the C++ bindings for some reason, so no-one can accuse me of pursuing a lock-in strategy.

Making MK data usable from many more languages is a long term goal. As I said, I welcome everyone who wants to help make that happen. Feel free to pass this invitation on.

On the topic of speed: I'm working on creating a more highly vectorized design for Metakit. So far, this has not only demonstrated (in the lab) potential for more performance, it also means that it will make it less of an issue as to which host language people decide to work with. The trend is towards making the real crunching happen in a smaller part of the code - which can be tweaked and tuned to no end, whether in C, machine code, vector-hardware, or even some existing high-performance library to hook into.

It's a bit like GUI's, every app today benefits from major advances made in the OS and video driver and video hardware and all sorts of GPU's.

-jcw
Re: [Metakit] Starting work on a Java version of Metakit
Brian Kelley wrote:
> And jcw, could I see the python only reader please, please :)

Yeah, I was afraid you'd ask. Took me ages to find it on an old CD backup, even though I'm pretty well organized w.r.t. my backups these days (it's hard to find things by location when you don't know *where* they are and it's hard to find things by name when you don't remember *what* you called it!).

Attached, vintage 1999 code. It may no longer work due to MK 1.9 - 2.x file format tweaks.

Just for completeness, a Tcl version is at:
http://www.equi4.com/pub/sk/readkit.tcl

-jcw

unmk.py
Description: application/applefile

    # Decoding a MetaKit datafile in Python
    #
    # JCW/1999-11-13/2000-04-22/

    import os, struct, shlex, StringIO, string, array

    reader = None
    freespace = None

    def HexDump(s):
        "a rudimentary hex data dump"
        v = []
        for c in s:
            v.append("%02X" % ord(c))
        return string.join(v)

    def DeduceWidth(numrows, size):
        "calculate bits per int, given row count and column size"
        w = 0
        if numrows > 0:
            w = (size << 3) / numrows
        if numrows <= 7 and 0 < size <= 6:
            widthtab = [
                ( 8, 16,  1, 32,  2,  4 ),  # n = 1
                ( 4,  8,  1, 16,  2,  0 ),  # n = 2
                ( 2,  4,  8,  1,  0, 16 ),  # n = 3
                ( 2,  4,  0,  8,  1,  0 ),  # n = 4
                ( 1,  2,  4,  0,  8,  0 ),  # n = 5
                ( 1,  2,  4,  0,  0,  8 ),  # n = 6
                ( 1,  2,  0,  4,  0,  0 ) ] # n = 7
            w = widthtab[numrows-1][size-1]
        assert w > 0
        assert (w & (w-1)) == 0
        return w

    def CheckFreeSpace():
        freespace.sort()
        curr = 0
        gaps = 0
        bytes = 0
        print 'Free space summary:'
        for (pos, len) in freespace:
            if pos < curr:
                print "### Free space is corrupt: (%d,%d) overlaps %d" % \
                    (pos, len, curr)
            if pos > curr:
                print "Free: %6d..%-6d (%db)" % (curr, pos-1,pos-curr)
                gaps = gaps+1
                bytes = bytes + (pos - curr)
            curr = pos + len
        print "%d bytes free in %d gaps, %db used, last used is %d" % \
            (bytes, gaps, curr-bytes, curr)

    class IntVector:
        "An array which accesses ints of 0..32 bits"
        def _get_0b(self,index):
            return 0
        def _get_1b(self,index):
            return (self.vector[index>>3] >> (index&7)) & 1
        def _get_2b(self,index):
            return (self.vector[index>>2] >> ((index&3) * 2)) & 3
        def _get_4b(self,index):
            return (self.vector[index>>1] >> ((index&1) * 4)) & 15
        def __init__(self,width,data):
            type = 'b'
            if width == 0: self.__getitem__ = self._get_0b
            elif width == 1: self.__getitem__ = self._get_1b
            elif width == 2: self.__getitem__ = self._get_2b
            elif width == 4: self.__getitem__ = self._get_4b
            elif width == 8: type = 'b'
            elif width == 16: type = 'h'
            elif width == 32: type = 'l'
            else: assert None
            self.vector = array.array(type, data)
        def __getitem__(self,index):
            return self.vector[index]

    class Column:
        "A range of bytes on disk"
        def __init__(self):
            self.size = reader.pull()
            self.pos = 0
            if self.size:
                self.pos = reader.pull()
                freespace.append((self.pos, self.size))
        def __repr__(self):
            return 'Column: @%d [%db]' % (self.pos, self.size)
        def __len__(self):
            return self.size

    class ColOfInts (Column):
        "A column interpreted as vector of integers"
        def __init__(self, numrows):
            Column.__init__(self)
            self.numrows = numrows
            self.width = DeduceWidth(numrows, self.size)
            data = reader.fetch(self.pos, self.size)
            self.getter = IntVector(self.width, data)
        def __repr__(self):
            return 'ColOfInts: #%d/%d, @%d [%db]' % \
                (self.numrows, self.width, self.pos, self.size)
        def __len__(self):
            return self.numrows
        def __getitem__(self,index):
            return self.getter[index]

    class BytesCol:
        "A data + size column pair"
        def __init__(self, numrows):
            self.data = Column()
            self.size = None
            self.pos = None
            if self.data.size:
                self.sizes = ColOfInts(numrows)
                self.offsets = [self.data.pos]
                for s in self.sizes:
                    self.offsets.append(self.offsets[-1] + s)
            self.memos = Column()
        def __repr__(self):
            return 'BytesCol %s, %s ' % (self.data, self.sizes)
        def __len__(self):
            return self.sizes.numrows
        def __getitem__(self,index):
            i1 = self.offsets[index]
            i2 = self.offsets[index+1]
            return "%10d-%-4d = %s" % (i1,i2,`reader.data[i1:i2]`)

    class View:
        "A view is a columnar version of a table"
        def __init__(self, parent=None, fields=None):
            self.parent = parent
            self.columns = []
            self.sias = reader.pull()
            assert self.sias == 0 # not yet
            if fields is None:
                reader.descriptor = reader.read(reader.pull())
                fields = reader.parseDesc()
            self.fields = fields
            self.numrows = reader.pull()
            for (name,code) in fields:
                if type(code) == type([]):
                    col = Column()
                    assert type(code) == type([])
                    savepos = reader.pos
                    reader.pos = col.pos
                    col = []
                    for r in xrange(self.numrows):
                        v = View(self, code)
                        col.append(v)
                    reader.pos = savepos
                elif code in "IFD":
                    col = ColOfInts(self.numrows)
                elif code in "BS":
                    col =
Re: [Metakit] Investigating a corrupt metakit file.
Pat,

> The problem is that out of 56 _B subviews all but one are as expected. However, one block is damaged. The values for the date and size columns have got swapped about.

If the data file is wrong, but readable, then my first hunch would be a bug. Setting the wrong column could point to a property cache bug, either in the core or in the Tcl binding (the latter is more likely, IMO).

I have had one report in the past (at least a year ago) of a mixup, also from Tcl. I was not able to reproduce it, and it seemed to be related to byte-code compilation, i.e. whether the Tcl script was inside a proc or not. The problem went away (well, that's what I like to think) by using distinct property names, I think it was related to having two props named the same but with a different type (:S vs :I or some such).

>     dirs[name:S,parent:I,ctime:I,atime:I,mtime:I,clsid:S,state:I,
>       files[
>         _B[name:S,size:I,date:I,state:I,contents:B]
>       ]
>     ]

In your case, I see only state in two different views and in two different column positions, and you're not listing problems with that one, so probably this whole hunch is irrelevant.

Did you restructure the view at any point in time? I.e. did the layout change once you started adding the first data?

If you can create a test set which fails (big if, I know), then I could investigate or write a Python test to see whether this is Tcl-specific. You can probably leave out the contents to create a much smaller test set.

I can't rule out anything at this stage, but would not expect an SMB mount to cause problems which only alter the column choice of data items, I'd expect it to create an unreadable datafile by messing up things at a much lower level: one or more disk blocks in the file.

Much larger datasets than yours, and with lots of blocked views, have been in use for some time. So my first suspicion goes to the Tcl wrapper (blocked views from Tcl have not been used much). Oh wait - *are* you using Tcl? I'm jumping to conclusions a bit too quickly...

-jcw
Re: [Metakit] Mk4py
On Aug 4, 2005, at 18:37, Brian Kelley wrote:
> Yeah the spaces kill me as well sometimes, and then I think that the spaces are okay sometimes. The real issue is that a metakit column name can include any printable character except a comma ",".

Nor "[", "]", ":", and a few more such as parentheses and braces which I'd like to reserve for new uses. Best to stick with alphanumerics only, even though MK does not enforce it. Best also to be consistent in the use of upper/lower case.

> So, now you know :)
>
> Here is another gotcha for you. Never, ever delete a column and then add a column with the same name and a different type. This will drive you bananas, I guarantee. To safely do this, delete the column, write out the db to a new file, delete the database, reopen it and then add the new column.

The key is the commit - that is the moment when a deleted column really goes away. The new-file/delete/reopen approach is fine too, but not strictly necessary.

-jcw
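Since MK does not enforce the naming advice above, a script could check property names itself before defining a view. The helpers below are a hypothetical Python sketch of such a check; nothing like them exists in Mk4py as far as this thread shows, and "alphanumerics" is taken here to mean ASCII letters and digits.

```python
import re


def is_plain_name(name):
    """True if the name consists of ASCII letters and digits only."""
    return re.fullmatch(r"[A-Za-z0-9]+", name) is not None


def case_clashes(names):
    """Return pairs of names that differ only in upper/lower case."""
    seen = {}
    clashes = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            clashes.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return clashes
```

Running both checks over the property names of a layout before calling getas would catch stray spaces, commas and inconsistent capitalisation early.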
Re: [Metakit] trouble installing 2.4.9.4
Jack Diederich wrote:

I was upgrading from 2.4.9.3 to 2.4.9.4 and got this error when I tried to load it.

sprat:~/src/metakit-2.4.9.4/builds# python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import metakit
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/metakit.py", line 22, in ?
    from Mk4py import *
ImportError: ./Mk4py.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE

Some C++ compiler name munging. I've been away from C++ so long I don't know how to track this down.

It looks like the .so file links to C++ runtime routines which haven't been loaded, presumably because neither Python nor the .so has been linked with -lstdc++. Could it be that the last link step of the .so uses gcc instead of g++?

-jcw
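To see why the missing symbol points at the C++ runtime, it helps to demangle it. The usual tool for that is c++filt from binutils; the sketch below is only a hand-rolled illustration in plain Python that handles the one simple shape seen here (a special-name prefix followed by a nested name made of length-prefixed parts). `demangle_simple` is a made-up helper and is nowhere near a full demangler.

```python
# Special-name prefixes from the Itanium C++ ABI mangling scheme.
_PREFIXES = {
    "_ZTV": "vtable for ",
    "_ZTI": "typeinfo for ",
    "_ZTS": "typeinfo name for ",
}


def demangle_simple(symbol):
    """Demangle symbols of the form <prefix>N<len><name>...E; raise ValueError otherwise."""
    for prefix, label in _PREFIXES.items():
        if symbol.startswith(prefix):
            rest = symbol[len(prefix):]
            break
    else:
        raise ValueError("unsupported prefix in %r" % symbol)
    if not (rest.startswith("N") and rest.endswith("E")):
        raise ValueError("only nested names (N...E) are handled")
    body = rest[1:-1]
    parts = []
    pos = 0
    while pos < len(body):
        start = pos
        while pos < len(body) and body[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length prefix at offset %d" % start)
        length = int(body[start:pos])
        part = body[pos:pos + length]
        if len(part) != length:
            raise ValueError("truncated name component")
        parts.append(part)
        pos += length
    return label + "::".join(parts)


print(demangle_simple("_ZTVN10__cxxabiv117__class_type_infoE"))
# prints: vtable for __cxxabiv1::__class_type_info
```

The `__cxxabiv1` namespace belongs to the C++ ABI support code that ships with the C++ runtime library, which fits the diagnosis that the shared object was linked without it.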
Re: [Metakit] trouble installing 2.4.9.4
Jack Diederich wrote:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/metakit.py", line 22, in ?
    from Mk4py import *
ImportError: ./Mk4py.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[...]

I changed the Makefile from

SHLIB_LD = gcc -shared

to

SHLIB_LD = g++ -shared

and now it works fine, thanks. It used g++ for all the other compiling steps but not the final linking.

Ah, that explains it. I've changed configure.in and configure as well:

Index: unix/configure
===
RCS file: /home/cvs/metakit/unix/configure,v
retrieving revision 1.45
diff -u -r1.45 configure
--- unix/configure 10 Jun 2005 16:02:22 - 1.45
+++ unix/configure 26 Sep 2005 21:45:49 -
@@ -1482,7 +1482,7 @@
 if test $SHARED_BUILD = 1; then
   SHLIB_FLAGS=-shared
   SHLIB_CFLAGS=-fPIC
-  SHLIB_LD=gcc -shared
+  SHLIB_LD=g++ -shared
 else
   SHLIB_FLAGS=
   SHLIB_CFLAGS=
Index: unix/configure.in
===
RCS file: /home/cvs/metakit/unix/configure.in,v
retrieving revision 1.36
diff -u -r1.36 configure.in
--- unix/configure.in 10 Jun 2005 16:02:22 - 1.36
+++ unix/configure.in 26 Sep 2005 21:45:49 -
@@ -117,7 +117,7 @@
 if test $SHARED_BUILD = 1; then
   SHLIB_FLAGS=-shared
   SHLIB_CFLAGS=-fPIC
-  SHLIB_LD=gcc -shared
+  SHLIB_LD=g++ -shared
 else
   SHLIB_FLAGS=
   SHLIB_CFLAGS=

This has been checked into CVS and should solve it for good now.

-jcw