Changeset: a00a44651f90 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=a00a44651f90
Modified Files:
monetdb5/extras/compiler/mal_compiler.c
monetdb5/extras/compiler/mal_compiler.mx
monetdb5/extras/crackers/crackers_pq.c
monetdb5/extras/crackers/crackers_pq.mx
monetdb5/extras/rdf/rdf.h
monetdb5/extras/rdf/rdf.mx
monetdb5/extras/sphinx/sphinx.c
monetdb5/extras/sphinx/sphinx.mx
monetdb5/mal/mal.c
monetdb5/mal/mal.mx
monetdb5/mal/mal_authorize.c
monetdb5/mal/mal_authorize.mx
monetdb5/mal/mal_builder.c
monetdb5/mal/mal_builder.mx
monetdb5/mal/mal_debugger.mx
monetdb5/mal/mal_errors.h
monetdb5/mal/mal_errors.mx
monetdb5/mal/mal_exception.c
monetdb5/mal/mal_exception.mx
monetdb5/mal/mal_factory.c
monetdb5/mal/mal_factory.mx
monetdb5/mal/mal_function.c
monetdb5/mal/mal_function.mx
monetdb5/mal/mal_instruction.mx
monetdb5/mal/mal_interpreter.mx
monetdb5/mal/mal_linker.c
monetdb5/mal/mal_linker.mx
monetdb5/mal/mal_namespace.c
monetdb5/mal/mal_namespace.mx
monetdb5/mal/mal_properties.c
monetdb5/mal/mal_properties.mx
monetdb5/mal/mal_recycle.c
monetdb5/mal/mal_recycle.mx
monetdb5/mal/mal_sabaoth.c
monetdb5/mal/mal_sabaoth.mx
monetdb5/mal/mal_scenario.c
monetdb5/mal/mal_scenario.mx
monetdb5/mal/mal_session.c
monetdb5/mal/mal_session.mx
monetdb5/mal/mal_stack.c
monetdb5/mal/mal_stack.mx
monetdb5/mal/mal_type.c
monetdb5/mal/mal_type.mx
monetdb5/mal/mal_utils.c
monetdb5/mal/mal_utils.mx
monetdb5/mal/mal_xml.c
monetdb5/mal/mal_xml.mx
monetdb5/modules/atoms/blob.c
monetdb5/modules/atoms/blob.mx
monetdb5/modules/atoms/color.c
monetdb5/modules/atoms/color.mx
monetdb5/modules/atoms/identifier.c
monetdb5/modules/atoms/identifier.mx
monetdb5/modules/atoms/inet.c
monetdb5/modules/atoms/inet.mx
monetdb5/modules/atoms/xml.c
monetdb5/modules/atoms/xml.mx
monetdb5/modules/kernel/alarm.c
monetdb5/modules/kernel/alarm.mx
monetdb5/modules/kernel/counters.c
monetdb5/modules/kernel/counters.mx
monetdb5/modules/kernel/lock.c
monetdb5/modules/kernel/lock.mx
monetdb5/modules/kernel/logger.c
monetdb5/modules/kernel/logger.mx
monetdb5/modules/kernel/microbenchmark.c
monetdb5/modules/kernel/microbenchmark.mx
monetdb5/modules/kernel/sema.c
monetdb5/modules/kernel/sema.mx
monetdb5/modules/kernel/unix.c
monetdb5/modules/kernel/unix.mx
monetdb5/modules/mal/Makefile.ag
monetdb5/modules/mal/algebraExtensions.c
monetdb5/modules/mal/algebraExtensions.mx
monetdb5/modules/mal/attach.c
monetdb5/modules/mal/attach.mx
monetdb5/modules/mal/batExtensions.c
monetdb5/modules/mal/batExtensions.mx
monetdb5/modules/mal/chopper.c
monetdb5/modules/mal/chopper.mx
monetdb5/modules/mal/constraints.c
monetdb5/modules/mal/constraints.mx
monetdb5/modules/mal/groupby.c
monetdb5/modules/mal/groupby.mx
monetdb5/modules/mal/histogram.c
monetdb5/modules/mal/histogram.mx
monetdb5/modules/mal/language.mx
monetdb5/modules/mal/mal_init.mal
monetdb5/modules/mal/mal_init.mx
monetdb5/modules/mal/manual.c
monetdb5/modules/mal/manual.mx
monetdb5/modules/mal/mat.mx
monetdb5/modules/mal/mdb.mx
monetdb5/modules/mal/mkey.c
monetdb5/modules/mal/mkey.mx
monetdb5/modules/mal/pcre.c
monetdb5/modules/mal/pcre.mx
monetdb5/modules/mal/profiler.c
monetdb5/modules/mal/profiler.mx
monetdb5/modules/mal/recycle.c
monetdb5/modules/mal/recycle.mx
monetdb5/modules/mal/remote.c
monetdb5/modules/mal/remote.mx
monetdb5/modules/mal/sabaoth.c
monetdb5/modules/mal/sabaoth.mx
monetdb5/modules/mal/tablet_mk.c
monetdb5/modules/mal/tablet_mk.mx
monetdb5/modules/mal/tablet_si.c
monetdb5/modules/mal/tablet_si.mx
monetdb5/modules/mal/tablet_sql.c
monetdb5/modules/mal/tablet_sql.mx
monetdb5/modules/mal/trader.c
monetdb5/modules/mal/trader.mx
monetdb5/modules/mal/transaction.c
monetdb5/modules/mal/transaction.mx
monetdb5/modules/mal/txtsim.c
monetdb5/modules/mal/txtsim.mx
monetdb5/optimizer/Makefile.ag
monetdb5/optimizer/opt_dataflow.mx
monetdb5/optimizer/opt_deadcode.mx
monetdb5/optimizer/opt_mergetable.mx
monetdb5/optimizer/opt_origin.mx
monetdb5/optimizer/opt_partition.mx
monetdb5/optimizer/opt_pipes.c
monetdb5/optimizer/opt_pipes.mx
monetdb5/optimizer/opt_prelude.c
monetdb5/optimizer/opt_prelude.h
monetdb5/optimizer/opt_prelude.mx
monetdb5/optimizer/opt_statistics.c
monetdb5/optimizer/opt_statistics.mx
monetdb5/optimizer/opt_support.c
monetdb5/optimizer/opt_support.h
monetdb5/optimizer/opt_support.mx
monetdb5/scheduler/Makefile.ag
monetdb5/scheduler/run_adder.c
monetdb5/scheduler/run_adder.mx
monetdb5/scheduler/run_centipede.c
monetdb5/scheduler/run_centipede.h
monetdb5/scheduler/run_centipede.mal
monetdb5/scheduler/run_centipede.mx
monetdb5/scheduler/run_isolate.c
monetdb5/scheduler/run_isolate.mx
monetdb5/scheduler/run_memo.c
monetdb5/scheduler/run_memo.mx
monetdb5/scheduler/run_octopus.c
monetdb5/scheduler/run_octopus.mx
Branch: default
Log Message:
Merge with Aug2011 branch.
diffs (truncated from 74101 to 300 lines):
diff --git a/gdk/gdk.mx b/gdk/gdk.mx
--- a/gdk/gdk.mx
+++ b/gdk/gdk.mx
@@ -17,303 +17,304 @@
All Rights Reserved.
@
-@f gdk
-@t The Goblin Database Kernel
-@v Version 3.05
-@a Martin L. Kersten, Peter Boncz, Niels Nes
-
-@+ The Inner Core
-The innermost library of the MonetDB database system is formed by
-the library called GDK, an abbreviation of Goblin Database Kernel.
-Its development was originally rooted in the design of a pure
-active-object-oriented programming language, before development
-was shifted towards a re-usable database kernel engine.
-
-GDK is a C library that provides ACID properties on a DSM model
-@tex
-[@cite{Copeland85}]
-@end tex
-, using main-memory
-database algorithms
-@tex
-[@cite{Garcia-Molina92}]
-@end tex
- built on virtual-memory
-OS primitives and multi-threaded parallelism.
-Its implementation has undergone various changes over its decade
-of development, many of which were driven by external needs to
-obtain a robust and fast database system.
-
-The coding scheme explored in GDK has also laid a foundation to
-communicate over time experiences and to provide (hopefully)
-helpful advice near to the place where the code-reader needs it.
-Of course, over such a long time the documentation diverges from
-reality. Especially in areas where the environment of this package
-is being described.
-Consider such deviations as historic landmarks, e.g. crystallization
-of brave ideas and mistakes rectified at a later stage.
-
-@+ Short Outline
-The facilities provided in this implementation are:
-@itemize
-@item
-GDK or Goblin Database Kernel routines for session management
-@item
- BAT routines that define the primitive operations on the
-database tables (BATs).
-@item
- BBP routines to manage the BAT Buffer Pool (BBP).
-@item
- ATOM routines to manipulate primitive types, define new types
-using an ADT interface.
-@item
- HEAP routines for manipulating heaps: linear spaces of memory
-that are GDK's vehicle of mass storage (on which BATs are built).
-@item
- DELTA routines to access inserted/deleted elements within a
-transaction.
-@item
- HASH routines for manipulating GDK's built-in linear-chained
-hash tables, for accelerating lookup searches on BATs.
-@item
- TM routines that provide basic transaction management primitives.
-@item
- TRG routines that provided active database support. [DEPRECATED]
-@item
- ALIGN routines that implement BAT alignment management.
-@end itemize
-
-The Binary Association Table (BAT) is the lowest level of storage
-considered in the Goblin runtime system
-@tex
-[@cite{Goblin}]
-@end tex
-. A BAT is a
-self-descriptive main-memory structure that represents the @strong{binary
-relationship} between two atomic types.
-The association can be defined over:
-@table @code
-@item void:
- virtual-OIDs: a densely ascending column of OIDs (takes zero-storage).
-@item bit:
- Booleans, implemented as one byte values.
-@item chr:
-A single character (8 bits @strong{integer}s).
-DEPRECATED for storing text (Unicode not supported).
-@item bte:
- Tiny (1-byte) integers (8-bit @strong{integer}s).
-@item sht:
- Short integers (16-bit @strong{integer}s).
-@item int:
- This is the C @strong{int} type (32-bit).
-@item oid:
- Unique @strong{long int} values uses as object identifier. Highest bit cleared always.
- Thus, oids-s are 31-bit numbers on 32-bit systems, and 63-bit numbers on 64-bit systems.
-@item wrd:
- Machine-word sized integers
- (32-bit on 32-bit systems, 64-bit on 64-bit systems).
-@item ptr:
-Memory pointer values. DEPRECATED. Can only be stored in transient BATs.
-@item flt:
- The IEEE @strong{float} type.
-@item dbl:
- The IEEE @strong{double} type.
-@item lng:
- Longs: the C @strong{long long} type (64-bit integers).
-@item str:
- UTF-8 strings (Unicode). A zero-terminated byte sequence.
-@item bat:
- Bat descriptor. This allows for recursive adminstered tables, but
-severely complicates transaction management. Therefore, they
-CAN ONLY BE STORED IN TRANSIENT BATs.
-@end table
-
-This model can be used as a back-end model underlying other -higher
-level- models, in order to achieve @strong{better performance} and
-@strong{data independence} in one go. The relational model and
-the object-oriented model can be mapped on BATs by vertically
-splitting every table (or class) for each attribute. Each such a
-column is then stored in a BAT with type @strong{bat[oid,attribute]}, where
-the unique object identifiers link tuples in the different BATs.
-Relationship attributes in the object-oriented model hence are
-mapped to @strong{bat[oid,oid]} tables, being equivalent to the concept of
-@emph{join indexes}
-@tex
-[@cite{Valduriez87}]
-@end tex
-.
-
-The set of built-in types can be extended with user-defined types
-through an ADT interface. They are linked with the kernel to obtain
-an enhanced library, or they are dynamically loaded upon request.
-
-Types can be derived from other types. They represent something different
-than that from which they are derived, but their internal storage management
-is equal. This feature facilitates the work of extension programmers, by
-enabling reuse of implementation code, but is also used to keep the GDK code
-portable from 32-bits to 64-bits machines: the @strong{oid} and @strong{ptr} types
-are derived from @strong{int} on 32-bits machines, but is derived from @strong{lng}
-on 64 bits machines. This requires changes in only two lines of code each.
-
-To accelerate lookup and search in BATs, GDK supports one built-in
-search accelerator: hash tables. We choose an implementation efficient
-for main-memory: bucket chained hash
-@tex
-[@cite{LehCar86,Analyti92}]
-@end tex
-. Alternatively, when the table is sorted, it will resort to merge-scan
-operations or binary lookups.
-
-BATs are built on the concept of heaps, which are large pieces of main
-memory. They can also consist of virtual memory, in case the working
-set exceeds main-memory. In this case, GDK supports operations that
-cluster the heaps of a BAT, in order to improve performance of its
-main-memory.
-
-
-@- Rationale
-The rationale for choosing a BAT as the building block for both
-relational and object-oriented system is based on the following
-observations:
-
-@itemize
-@item -
-Given the fact that CPU speed and main-memory increase in
-current workstation hardware for the last years has been exceeding
-IO access speed increase, traditional disk-page oriented algorithms
-do no longer take best advantage of hardware, in most database operations.
-
-Instead of having a disk-block oriented kernel with a large memory
-cache, we choose to build a main-memory kernel, that only under large data
-volumes slowly degrades to IO-bound performance, comparable to
-traditional systems
-@tex
-[@cite{boncz95,boncz96}]
-@end tex
-.
-
-@item -
-Traditional (disk-based) relational systems move too much data
-around to save on (main-memory) join operations.
-
-The fully decomposed store (DSM
-@tex
-[@cite{Copeland85})]
-@end tex
-assures that only those attributes of a relation that are needed,
-will have to be accessed.
-
-@item -
-The data management issues for a binary association is much
-easier to deal with than traditional @emph{struct}-based approaches
-encountered in relational systems.
-
-@item -
-Object-oriented systems often maintain a double cache, one with the
-disk-based representation and a C pointer-based main-memory structure.
-This causes expensive conversions and replicated storage management.
-GDK does not do such `pointer swizzling'. It used virtual-memory
-(@strong{mmap()}) and buffer management advice (@strong{madvise()}) OS primitives to
-cache only once. Tables take the same form in memory as on disk,
-making the use of this technique transparent
-@tex
-[@cite{oo7}]
-@end tex
-.
-@end itemize
-
-A RDBMS or OODBMS based on BATs strongly depends on our ability
-to efficiently support tuples and to handle small joins, respectively.
-
-The remainder of this document describes the Goblin Database kernel
-implementation at greater detail. It is organized as follows:
-@table @code
-@item @strong{GDK Interface}:
-
-It describes the global interface with which GDK sessions can be
-started and ended, and environment variables used.
-
-@item @strong{Binary Association Tables}:
-
-As already mentioned, these are the primary data structure of GDK.
-This chapter describes the kernel operations for creation, destruction
-and basic manipulation of BATs and BUNs (i.e. tuples: Binary UNits).
-
-@item @strong{BAT Buffer Pool:}
-
-All BATs are registered in the BAT Buffer Pool. This directory is used
-to guide swapping in and out of BATs. Here we find routines that guide
-this swapping process.
-
-@item @strong{GDK Extensibility:}
-
-Atoms can be defined using a unified ADT interface.
-There is also an interface to extend the GDK library with
-dynamically linked object code.
-
-@item @strong{GDK Utilities:}
-
-Memory allocation and error handling primitives are provided. Layers
-built on top of GDK should use them, for proper system monitoring.
-Thread management is also included here.
-
-@item @strong{Transaction Management:}
-
-For the time being, we just provide BAT-grained concurrency and global
-transactions. Work is needed here.
-
-@item @strong{BAT Alignment:}
-Due to the mapping of multi-ary datamodels onto the BAT model,
-we expect many correspondences among BATs, e.g. @emph{bat(oid,attr1),..
-bat(oid,attrN)} vertical decompositions. Frequent activities will be
-to jump from one attribute to the other (`bunhopping'). If the head
-columns are equal lists in two BATs, merge or even array lookups
-can be used instead of hash lookups. The alignment interface makes
-these relations explicitly manageable.
-
-In GDK, complex data models are mapped with DSM on binary tables.
-Usually, one decomposes @emph{N}-ary relations into @emph{N} BATs with
-an @strong{oid} in the head column, and the attribute in the tail column.
-There may well be groups of tables that have the same sets of
-@strong{oid}s, equally ordered. The alignment interface is intended to make
-this explicit. Implementations can use this interface to detect this
-situation, and use cheaper algorithms (like merge-join, or even array
-lookup) instead.
-
-@item @strong{BAT Iterators:}
-
-Iterators are C macros that generally encapsulate a complex for-loop.
-They would be the equivalent of cursors in the SQL model. The macro
-interface (instead of a function call interface) is chosen to achieve
-speed when iterating main-memory tables.
-
-@item @strong{Common BAT Operations:}
-
-These are much used operations on BATs, such as aggregate functions
-and relational operators. They are implemented in terms of BAT- and
-BUN-manipulation GDK primitives.
-@end table
-
-@+ Interface Files
-In this section we summarize the user interface to the GDK library.
-It consist of a header file (gdk.h) and an object library (gdklib.a),
-which implements the required functionality. The header file must be
-included in any program that uses the library. The library must be
-linked with such a program.
-
-@- Database Context
-
-The MonetDB environment settings are collected in a configuration
-file. Amongst others it contains the location of the database
-directory.
-First, the database
-directory is closed for other servers running at the same time.
-Second, performance enhancements may take effect, such as locking
-the code into memory (if the OS permits) and preloading the