Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "TopLevelPackages" page has been changed by daniels: https://wiki.apache.org/cassandra/TopLevelPackages?action=diff&rev1=1&rev2=2 === io === - This large package talks to the file system on behalf of C*. The bulk of this work consists of creating and using SSTables, which is the format that C* uses to store data. Other responsibilities include on-disk compression as well as some general-purpose I/O functionality for other code to use, including facilities for custom memory management. The main class here is SSTable, which represents an abstract persistent container of sorted data. SSTableWriter and SSTableReader are derived from SSTable and expose additional read and write functionality, and are used quite extensively by other packages (primarily db). + This large package talks to the file system on behalf of C*. The bulk of this work consists of creating and using `SSTables`, which is the format that C* uses to store data. Other responsibilities include on-disk compression as well as some general-purpose I/O functionality for other code to use, including facilities for custom memory management. The main class here is `SSTable`, which represents an abstract persistent container of sorted data. `SSTableWriter` and `SSTableReader` are derived from `SSTable` and expose additional read and write functionality, and are used quite extensively by other packages (primarily `db`). === db === - This huge package takes up almost a quarter of the entire codebase and implements the database engine. It operates in familiar database terms including Cells, Rows, ColumnFamilies (tables) and Keyspaces (databases). db heavily relies on io.SSTables for data persistence, but also reaches into many other packages for various tasks. Internally, db can be broken down into its own sublayers and also contains multiple subpackages for things such as data marshalling, commit log management, storage compaction and others. Overall, db is large and complicated enough to deserve an architectural study of its own. + This huge package takes up almost a quarter of the entire codebase and implements the database engine. It operates in familiar database terms including `Cells`, `Rows`, `ColumnFamilies` (tables) and `Keyspaces` (databases). `db` heavily relies on `io.SSTables` for data persistence, but also reaches into many other packages for various tasks. Internally, `db` can be broken down into its own sublayers and also contains multiple subpackages for things such as data marshalling, commit log management, storage compaction and others. Overall, `db` is large and complicated enough to deserve an architectural study of its own. === serializers === - This small package is subordinate to db, and contains utility methods for converting primitive types to byte buffers. + This small package is subordinate to `db`, and contains utility methods for converting primitive types to byte buffers. === notifications === - This tiny package contains interfaces and classes that allow other code to hook certain internal db events with custom code. It can be used for things such as unit tests, but may also be hooked into with other external functionality. + This tiny package contains interfaces and classes that allow other code to hook certain internal `db` events with custom code. It can be used for things such as unit tests, but may also be hooked into with other external functionality. === cache === - cache is a smaller package containing primitives used to implement key and row caching. The class that actually orchestrates all that caching activity (CacheService) lives in service, not cache. cache is primarily used for the benefit of db, but db almost never uses cache directly, instead proxying through service.CacheService. Without this extra indirection, cache could easily be structured as a subpackage under db. + `cache` is a smaller package containing primitives used to implement key and row caching. The class that actually orchestrates all that caching activity (`CacheService`) lives in service, not `cache`. `cache` is primarily used for the benefit of `db`, but `db` almost never uses `cache` directly, instead proxying through `service.CacheService`. Without this extra indirection, `cache` could easily be structured as a subpackage under `db`. === cql3 === - Read and write APIs provided by db are difficult to use directly, so C* provides a query language for easier access to the underlying data–Cassandra Query Language or CQL. The language is implemented in its own package cql3 (the third major release of the language). cql3 defines the language grammar and implements the QueryProcessor as well as all the Statements and related functionality available in CQL. It is interesting that cql3 is not a purely externally facing API; some internal code actually leverages it to store and retrieve system state information. In that respect, CQL is becoming a core component. + Read and write APIs provided by `db` are difficult to use directly, so C* provides a query language for easier access to the underlying data–Cassandra Query Language or CQL. The language is implemented in its own package `cql3` (the third major release of the language). `cql3` defines the language grammar and implements the `QueryProcessor` as well as all the `Statements` and related functionality available in CQL. It is interesting that `cql3` is not a purely externally facing API; some internal code actually leverages it to store and retrieve system state information. In that respect, CQL is becoming a core component. === net === - This package implements MessagingService, which abstracts away most networking machinery from the rest of the codebase. Other packages can then set up communication protocols represented by different Verbs and send custom Messages carrying those verbs to remote nodes. Verbs are handled on the receiving side with VerbHandlers. net provides only base types and functionality common to all messages; specialized implementation live in various other packages. + This package implements `MessagingService`, which abstracts away most networking machinery from the rest of the codebase. Other packages can then set up communication protocols represented by different `Verbs` and send custom `Messages` carrying those verbs to remote nodes. `Verbs` are handled on the receiving side with `VerbHandlers`. `net` provides only base types and functionality common to all messages; specialized implementation live in various other packages. === sink === - Used by net and service, this tiny package is used to hook into messaging events. This is primarily useful for unit tests. + Used by `net` and `service`, this tiny package is used to hook into messaging events. This is primarily useful for unit tests. === security === - This tiny package, currently containing only one class SSLFactory, is used to encrypt communication over the network. + This tiny package, currently containing only one class `SSLFactory`, is used to encrypt communication over the network. === gms === - gms (possibly standing for Gossip Message Service) implements the Gossiper. Gossiper is a peer-to-peer service that deals with disseminating cluster state information among member nodes. Gossiping consists of detecting unresponsive nodes using heart beat messages, and sharing liveness data among peers. + `gms` (possibly standing for Gossip Message Service) implements the `Gossiper`. `Gossiper` is a peer-to-peer service that deals with disseminating cluster state information among member nodes. Gossiping consists of detecting unresponsive nodes using heart beat messages, and sharing liveness data among peers. === locator === - locator is responsible for two separate tasks. One is discovering cluster topology through a pluggable component called Snitch. A few snitches are available out of the box, and some dynamic implementations heavily rely on Gossiper to detect up-to-date cluster topology. The second responsibility is deciding how to optimally distribute replicas based on discovered topology (handled by a class called ReplicationStrategy and its subclasses). + `locator` is responsible for two separate tasks. One is discovering cluster topology through a pluggable component called `Snitch`. A few snitches are available out of the box, and some dynamic implementations heavily rely on `Gossiper` to detect up-to-date cluster topology. The second responsibility is deciding how to optimally distribute replicas based on discovered topology (handled by a class called `ReplicationStrategy` and its subclasses). === streaming === - Another core networking package, streaming is responsible for moving bulk data between the nodes in the cluster. + Another core networking package, `streaming` is responsible for moving bulk data between the nodes in the cluster. === repair === - This smaller package deals with running RepairSessions, which redistribute data after a change in the cluster, or when corruption is detected in one of the existing nodes. Repair events are just one example where streaming is used. + This smaller package deals with running `RepairSessions`, which redistribute data after a change in the cluster, or when corruption is detected in one of the existing nodes. Repair events are just one example where `streaming` is used. === service === - service, although not the largest package, can be thought of as the skeleton upon which all other functionality builds. service consists of an executable class CassandraDaemon (this class contains the main() function of the C* daemon), along with a set of core services, including StorageProxy and StorageService. A lot of, if not most, of inter-package communication within the codebase is brokered through one of those two. StorageService is more involved in orchestrating Dynamo-level activities in the cluster, whereas StorageProxy is more focused on handling data transfer. + `service`, although not the largest package, can be thought of as the skeleton upon which all other functionality builds. `service` consists of an executable class `CassandraDaemon` (this class contains the `main()` function of the C* daemon), along with a set of core services, including `StorageProxy` and `StorageService`. A lot of, if not most, of inter-package communication within the codebase is brokered through one of those two. `StorageService` is more involved in orchestrating Dynamo-level activities in the cluster, whereas `StorageProxy` is more focused on handling data transfer. - service is critical to most other modules, many of which are free to call into it from arbitrary places. A traditional weakness of such omnipresent uberpackages is that they attract all sort of miscellaneous functionality that doesn’t seem to belong anywhere else, and service is no exception: expect to see a lot of random bits and pieces residing here. + `service` is critical to most other modules, many of which are free to call into it from arbitrary places. A traditional weakness of such omnipresent uberpackages is that they attract all sort of miscellaneous functionality that doesn’t seem to belong anywhere else, and `service` is no exception: expect to see a lot of random bits and pieces residing here. === config === - Another omnipackage seemingly accessible from anywhere, config is a repository for configurable settings as well as a static entry point into the data store (through a class Schema which contains a reference to all keyspaces residing in the local cluster). + Another omnipackage seemingly accessible from anywhere, `config` is a repository for configurable settings as well as a static entry point into the data store (through a class `Schema` which contains a reference to all keyspaces residing in the local cluster). === transport === - This is one of external API providers. transport implements a Server that listens for connecting clients that want to use C* Native protocol, which as of Cassandra 2.0 is the primary communication protocol both for external clients and within the cluster. + This is one of external API providers. `transport` implements a Server that listens for connecting clients that want to use C* Native protocol, which as of Cassandra 2.0 is the primary communication protocol both for external clients and within the cluster. === thrift === @@ -98, +98 @@ === tools === - This package contains the implementation of several administrative utilities shipped with C*, including the Node tool, tools for import and export, as well as several utilities for SSTable maintenance. All these tools are available under /bin in a typical distribution. + This package contains the implementation of several administrative utilities shipped with C*, including the `Node` tool, tools for import and export, as well as several utilities for `SSTable` maintenance. All these tools are available under `/bin` in a typical distribution. === dht === - dht (Distributed Hash Table) is a core support class that is responsible for partitioning data among the nodes in the cluster. It contains several pluggable implementations of AbstractPartitioner class that handles the mechanics of data partitioning. In addition it defines Range and Token, which are primitives used by other packages to work with partition key ranges. + `dht` (Distributed Hash Table) is a core support class that is responsible for partitioning data among the nodes in the cluster. It contains several pluggable implementations of `AbstractPartitioner` class that handles the mechanics of data partitioning. In addition it defines `Range` and `Token`, which are primitives used by other packages to work with partition key ranges. === utils === - This hefty package is a grab bag of miscellaneous classes typical to any software project, the proverbial "other" section. It is not the best place to look for architectural pillars, but it contains some clever code that is partially responsible for Cassandra’s impressive perf and reliability, including implementations of BloomFilter and MerkleTree among others. + This hefty package is a grab bag of miscellaneous classes typical to any software project, the proverbial "other" section. It is not the best place to look for architectural pillars, but it contains some clever code that is partially responsible for Cassandra’s impressive perf and reliability, including implementations of `BloomFilter` and `MerkleTree` among others. === concurrent === - concurrent deals with threading and thread pools. Interestingly, the few custom concurrency primitives that C* uses belong to a level 2 package under utils, and not to this package. + `concurrent` deals with threading and thread pools. Interestingly, the few custom concurrency primitives that C* uses belong to a level 2 package under `utils`, and not to this package. === exceptions === @@ -122, +122 @@ === tracing === - tracing implements support for request tracing, whereupon some or all requests to Cassandra will cause verbose logging to be output for the purposes of debugging or performance tuning. + `tracing` implements support for request tracing, whereupon some or all requests to Cassandra will cause verbose logging to be output for the purposes of debugging or performance tuning. === metrics === - This package allows collecting quantitative data about various aspects of C* operation. Metric data can be accessed through Node tool that ships with C*. Like tracing, this capability can be important for operational maintenance and troubleshooting. + This package allows collecting quantitative data about various aspects of C* operation. Metric data can be accessed through `Node` tool that ships with C*. Like tracing, this capability can be important for operational maintenance and troubleshooting. === auth === - auth enables support for authentication and authorization, providing a measure of access control for C* service. + `auth` enables support for authentication and authorization, providing a measure of access control for C* service. === cli === - cli implements a Command Line Interface client for interacting with a C* cluster from a remote node. + `cli` implements a Command Line Interface client for interacting with a C* cluster from a remote node. === hadoop === - hadoop exposes C* in terms of MapReduce/Pig primitives, allowing integration with Hadoop clients. This package is expected to be used as an adapter in external Hadoop applications; there is no code here that actually runs server side. + `hadoop` exposes C* in terms of MapReduce/Pig primitives, allowing integration with Hadoop clients. This package is expected to be used as an adapter in external Hadoop applications; there is no code here that actually runs server side. === client === - A tiny package that provides helper functionality to code that runs against C* on the client side. Only hadoop uses client out of the box. + A tiny package that provides helper functionality to code that runs against C* on the client side. Only `hadoop` uses `client` out of the box.
