Changeset: d4a362c539db for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=d4a362c539db
Modified Files:
        gdk/Makefile.ag
        gdk/gdk.h
        gdk/gdk.mx
Branch: headless
Log Message:

This is the start of the branch to get rid of BATs.
BATs are *binary* association tables.  We want to move towards single
column structures called COLs.  Basically, these are BATs whose HEAD
column is always of type VOID-with-seqbase.

Work on this is done on a branch called "headless".

In addition, we're going to move away from the use of Mx.

This changeset is a (small) step on both fronts.

BATstore, BAT, COLrec, BATrec, and BUNrec are all gone and have been
replaced by a single structure called COL.  I have also gone through
the macro definitions and function declarations and removed references
to head/tail (although I may have missed a few).

There is still a lot of work to be done.  I can guarantee, the current
version cannot get through even the most lenient compiler.


diffs (truncated from 5310 to 300 lines):

diff --git a/gdk/Makefile.ag b/gdk/Makefile.ag
--- a/gdk/Makefile.ag
+++ b/gdk/Makefile.ag
@@ -38,7 +38,7 @@
                gdk_scanselect_defs_str.mx \
                gdk_scanselect_defs_fix.mx \
                gdk_scanselect_defs_var.mx \
-               gdk_scanselect.mx gdk.mx gdk_batop.mx \
+               gdk_scanselect.mx gdk_batop.mx \
                gdk_search.mx gdk_tm.mx gdk_align.mx gdk_bbp.mx \
                gdk_heap.mx gdk_setop.mx gdk_utils.mx gdk_atoms.mx \
                gdk_qsort.mx gdk_ssort.mx gdk_storage.mx gdk_bat.mx \
@@ -52,3 +52,5 @@
                $(SOCKET_LIBS) $(zlib_LIBS) $(BZ_LIBS) \
                $(MALLOC_LIBS) $(PTHREAD_LIBS) $(DL_LIBS)
 }
+
+EXTRA_DIST = gdk.h
diff --git a/gdk/gdk.mx b/gdk/gdk.h
rename from gdk/gdk.mx
rename to gdk/gdk.h
--- a/gdk/gdk.mx
+++ b/gdk/gdk.h
@@ -1,392 +1,393 @@
-@/
-The contents of this file are subject to the MonetDB Public License
-Version 1.1 (the "License"); you may not use this file except in
-compliance with the License. You may obtain a copy of the License at
-http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html
+/*
+ * The contents of this file are subject to the MonetDB Public License
+ * Version 1.1 (the "License"); you may not use this file except in
+ * compliance with the License. You may obtain a copy of the License at
+ * http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html
+ *
+ * Software distributed under the License is distributed on an "AS IS"
+ * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
+ * License for the specific language governing rights and limitations
+ * under the License.
+ *
+ * The Original Code is the MonetDB Database System.
+ *
+ * The Initial Developer of the Original Code is CWI.
+ * Portions created by CWI are Copyright (C) 1997-July 2008 CWI.
+ * Copyright August 2008-2011 MonetDB B.V.
+ * All Rights Reserved.
+ */
 
-Software distributed under the License is distributed on an "AS IS"
-basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
-License for the specific language governing rights and limitations
-under the License.
+/*
+ * @f gdk
+ * @t The Goblin Database Kernel
+ * @v Version 3.05
+ * @a Martin L. Kersten, Peter Boncz, Niels Nes
+ *
+ * @+ The Inner Core
+ * The innermost library of the MonetDB database system is formed by
+ * the library called GDK, an abbreviation of Goblin Database Kernel.
+ * Its development was originally rooted in the design of a pure
+ * active-object-oriented programming language, before development
+ * was shifted towards a re-usable database kernel engine.
+ *
+ * GDK is a C library that provides ACID properties on a DSM model
+ * @tex
+ * [@cite{Copeland85}]
+ * @end tex
+ * , using main-memory
+ * database algorithms
+ * @tex
+ * [@cite{Garcia-Molina92}]
+ * @end tex
+ *  built on virtual-memory
+ * OS primitives and multi-threaded parallelism.
+ * Its implementation has undergone various changes over its decade
+ * of development, many of which were driven by external needs to
+ * obtain a robust and fast database system.
+ *
+ * The coding scheme explored in GDK has also laid a foundation to
+ * communicate over time experiences and to provide (hopefully)
+ * helpful advice near to the place where the code-reader needs it.
+ * Of course, over such a long time the documentation diverges from
+ * reality. Especially in areas where the environment of this package
+ * is being described.
+ * Consider such deviations as historic landmarks, e.g. crystallization
+ * of brave ideas and mistakes rectified at a later stage.
+ *
+ * @+ Short Outline
+ * The facilities provided in this implementation are:
+ * @itemize
+ * @item
+ * GDK or Goblin Database Kernel routines for session management
+ * @item
+ *  BAT routines that define the primitive operations on the
+ * database tables (BATs).
+ * @item
+ *  BBP routines to manage the BAT Buffer Pool (BBP).
+ * @item
+ *  ATOM routines to manipulate primitive types, define new types
+ * using an ADT interface.
+ * @item
+ *  HEAP routines for manipulating heaps: linear spaces of memory
+ * that are GDK's vehicle of mass storage (on which BATs are built).
+ * @item
+ *  DELTA routines to access inserted/deleted elements within a
+ * transaction.
+ * @item
+ *  HASH routines for manipulating GDK's built-in linear-chained
+ * hash tables, for accelerating lookup searches on BATs.
+ * @item
+ *  TM routines that provide basic transaction management primitives.
+ * @item
+ *  TRG routines that provided active database support. [DEPRECATED]
+ * @item
+ *  ALIGN routines that implement BAT alignment management.
+ * @end itemize
+ *
+ * The Binary Association Table (BAT) is the lowest level of storage
+ * considered in the Goblin runtime system
+ * @tex
+ * [@cite{Goblin}]
+ * @end tex
+ * .  A BAT is a
+ * self-descriptive main-memory structure that represents the @strong{binary
+ * relationship} between two atomic types.
+ * The association can be defined over:
+ * @table @code
+ * @item void:
+ *  virtual-OIDs: a densely ascending column of OIDs (takes zero-storage).
+ * @item bit:
+ *  Booleans, implemented as one byte values.
+ * @item chr:
+ * A single character (8 bits @strong{integer}s).
+ * DEPRECATED for storing text (Unicode not supported).
+ * @item bte:
+ *  Tiny (1-byte) integers (8-bit @strong{integer}s).
+ * @item sht:
+ *  Short integers (16-bit @strong{integer}s).
+ * @item int:
+ *  This is the C @strong{int} type (32-bit).
+ * @item oid:
+ *  Unique @strong{long int} values uses as object identifier. Highest bit 
cleared always.
+ *         Thus, oids-s are 31-bit numbers on 32-bit systems, and 63-bit 
numbers on 64-bit systems.
+ * @item wrd:
+ *  Machine-word sized integers
+ *  (32-bit on 32-bit systems, 64-bit on 64-bit systems).
+ * @item ptr:
+ * Memory pointer values. DEPRECATED.  Can only be stored in transient BATs.
+ * @item flt:
+ *  The IEEE @strong{float} type.
+ * @item dbl:
+ *  The IEEE @strong{double} type.
+ * @item lng:
+ *  Longs: the C @strong{long long} type (64-bit integers).
+ * @item str:
+ *  UTF-8 strings (Unicode). A zero-terminated byte sequence.
+ * @item bat:
+ *  Bat descriptor. This allows for recursive adminstered tables, but
+ * severely complicates transaction management. Therefore, they
+ * CAN ONLY BE STORED IN TRANSIENT BATs.
+ * @end table
+ *
+ * This model can be used as a back-end model underlying other -higher
+ * level- models, in order to achieve @strong{better performance} and
+ * @strong{data independence} in one go. The relational model and
+ * the object-oriented model can be mapped on BATs by vertically
+ * splitting every table (or class) for each attribute. Each such a
+ * column is then stored in a BAT with type @strong{bat[oid,attribute]}, where
+ * the unique object identifiers link tuples in the different BATs.
+ * Relationship attributes in the object-oriented model hence are
+ * mapped to @strong{bat[oid,oid]} tables, being equivalent to the concept of
+ * @emph{join indexes}
+ * @tex
+ * [@cite{Valduriez87}]
+ * @end tex
+ * .
+ *
+ * The set of built-in types can be extended with user-defined types
+ * through an ADT interface.  They are linked with the kernel to obtain
+ * an enhanced library, or they are dynamically loaded upon request.
+ *
+ * Types can be derived from other types. They represent something different
+ * than that from which they are derived, but their internal storage management
+ * is equal. This feature facilitates the work of extension programmers, by
+ * enabling reuse of implementation code, but is also used to keep the GDK code
+ * portable from 32-bits to 64-bits machines: the @strong{oid} and 
@strong{ptr} types
+ * are derived from @strong{int} on 32-bits machines, but is derived from 
@strong{lng}
+ * on 64 bits machines. This requires changes in only two lines of code each.
+ *
+ * To accelerate lookup and search in BATs, GDK supports one built-in
+ * search accelerator: hash tables. We choose an implementation efficient
+ * for main-memory: bucket chained hash
+ * @tex
+ * [@cite{LehCar86,Analyti92}]
+ * @end tex
+ * . Alternatively, when the table is sorted, it will resort to merge-scan
+ * operations or binary lookups.
+ *
+ * BATs are built on the concept of heaps, which are large pieces of main
+ * memory. They can also consist of virtual memory, in case the working
+ * set exceeds main-memory. In this case, GDK supports operations that
+ * cluster the heaps of a BAT, in order to improve performance of its
+ * main-memory.
+ *
+ *
+ * @- Rationale
+ * The rationale for choosing a BAT as the building block for both
+ * relational and object-oriented system is based on the following
+ * observations:
+ *
+ * @itemize
+ * @item -
+ * Given the fact that CPU speed and main-memory increase in
+ * current workstation hardware for the last years has been exceeding
+ * IO access speed increase, traditional disk-page oriented algorithms
+ * do no longer take best advantage of hardware, in most database operations.
+ *
+ * Instead of having a disk-block oriented kernel with a large memory
+ * cache, we choose to build a main-memory kernel, that only under large data
+ * volumes slowly degrades to IO-bound performance, comparable to
+ * traditional systems
+ * @tex
+ * [@cite{boncz95,boncz96}]
+ * @end tex
+ * .
+ *
+ * @item -
+ * Traditional (disk-based) relational systems move too much data
+ * around to save on (main-memory) join operations.
+ *
+ * The fully decomposed store (DSM
+ * @tex
+ * [@cite{Copeland85})]
+ * @end tex
+ * assures that only those attributes of a relation that are needed,
+ * will have to be accessed.
+ *
+ * @item -
+ * The data management issues for a binary association is much
+ * easier to deal with than traditional @emph{struct}-based approaches
+ * encountered in relational systems.
+ *
+ * @item -
+ * Object-oriented systems often maintain a double cache, one with the
+ * disk-based representation and a C pointer-based main-memory structure.
+ * This causes expensive conversions and replicated storage management.
+ * GDK does not do such `pointer swizzling'. It used virtual-memory
+ * (@strong{mmap()}) and buffer management advice (@strong{madvise()}) OS 
primitives to
+ * cache only once. Tables take the same form in memory as on disk,
+ * making the use of this technique transparent
+ * @tex
+ * [@cite{oo7}]
+ * @end tex
+ * .
+ * @end itemize
+ *
+ * A RDBMS or OODBMS based on BATs strongly depends on our ability
+ * to efficiently support tuples and to handle small joins, respectively.
+ *
+ * The remainder of this document describes the Goblin Database kernel
+ * implementation at greater detail. It is organized as follows:
+ * @table @code
+ * @item @strong{GDK Interface}:
+ *
+ * It describes the global interface with which GDK sessions can be
+ * started and ended, and environment variables used.
+ *
+ * @item @strong{Binary Association Tables}:
+ *
+ * As already mentioned, these are the primary data structure of GDK.
+ * This chapter describes the kernel operations for creation, destruction
+ * and basic manipulation of BATs and BUNs (i.e. tuples: Binary UNits).
+ *
+ * @item @strong{BAT Buffer Pool:}
+ *
+ * All BATs are registered in the BAT Buffer Pool. This directory is used
+ * to guide swapping in and out of BATs. Here we find routines that guide
+ * this swapping process.
+ *
+ * @item @strong{GDK Extensibility:}
+ *
+ * Atoms can be defined using a unified ADT interface.
+ * There is also an interface to extend the GDK library with
+ * dynamically linked object code.
+ *
+ * @item @strong{GDK Utilities:}
+ *
+ * Memory allocation and error handling primitives are provided. Layers
+ * built on top of GDK should use them, for proper system monitoring.
+ * Thread management is also included here.
+ *
+ * @item @strong{Transaction Management:}
+ *
+ * For the time being, we just provide BAT-grained concurrency and global
+ * transactions. Work is needed here.
+ *
+ * @item @strong{BAT Alignment:}
+ * Due to the mapping of multi-ary datamodels onto the BAT model,
_______________________________________________
Checkin-list mailing list
[email protected]
http://mail.monetdb.org/mailman/listinfo/checkin-list

Reply via email to