[1/4] incubator-crail-website git commit: Updating documentation from the github README.md

atrivedi Mon, 22 Jan 2018 07:07:09 -0800

Repository: incubator-crail-website
Updated Branches:
  refs/heads/master 3621f1348 -> 3375ca7c2



Updating documentation from the github README.md

The old documentation that included many components
and old code references is now replaced by the current
documentation from the github README.md


Project: http://git-wip-us.apache.org/repos/asf/incubator-crail-website/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-crail-website/commit/d69b080c
Tree: 
http://git-wip-us.apache.org/repos/asf/incubator-crail-website/tree/d69b080c
Diff: 
http://git-wip-us.apache.org/repos/asf/incubator-crail-website/diff/d69b080c

Branch: refs/heads/master
Commit: d69b080cf6d2e773d24e7bf0a3ebf5ebeac762b3
Parents: 3621f13
Author: Animesh Trivedi <animesh.triv...@gmail.com>
Authored: Mon Jan 22 15:53:54 2018 +0100
Committer: Animesh Trivedi <animesh.triv...@gmail.com>
Committed: Mon Jan 22 15:53:54 2018 +0100

----------------------------------------------------------------------
 site/documentation/index.md | 137 ++++++++++++++++++---------------------
 1 file changed, 64 insertions(+), 73 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-crail-website/blob/d69b080c/site/documentation/index.md
----------------------------------------------------------------------
diff --git a/site/documentation/index.md b/site/documentation/index.md
index 524a316..79053d6 100644
--- a/site/documentation/index.md
+++ b/site/documentation/index.md
@@ -3,30 +3,35 @@ layout: default
 title: Documentation 
 ---
 
-The Crail I/O stack consists of a set of components. Typically only a subset 
of the components are required for a particular use case (e.g., Spark, Hadoop, 
Hive, etc.) or hardware setup (e.g., RDMA, TCP, Flash, etc.). Here is a list of 
the components together with their GitHub repository. 
+Apache Crail (Incubating) is a fast multi-tiered distributed storage system 
designed from the ground up for high-performance network and storage hardware. The 
unique features of Crail include:
 
-* <a href="{{ site.base }}/community/">Crail Store</a>: The backbone for all 
I/O operations across distributed storage resource. Includes both the RDMA/DRAM 
and the NVMf/Flash storage tier.
-* [Crail-Blkdev](https://github.com/zrlio/crail-blkdev): A Crail storage tier 
for shared volume storage.
-* [Crail-Netty](https://github.com/zrlio/crail-netty): A Crail TCP/DRAM 
storage tier built on top of Netty.
-* [Crail-Spark-IO](https://github.com/zrlio/crail-spark-io): A module 
including Crail-based Shuffle and Broadcast plugins for Spark.
-* [Crail-Spark-TeraSort](https://github.com/zrlio/crail-terasort): Currently 
only the sorting benchmark is available.
+* Zero-copy network access from userspace 
+* Integration of multiple storage tiers such as DRAM, flash and disaggregated 
shared storage
+* Ultra-low latencies for both metadata and data operations. For instance, 
opening, reading and closing a small file residing in the distributed DRAM tier 
takes less than 10 microseconds, which is in the same ballpark as some of the fastest 
RDMA-based key/value stores
+* High-performance sequential read/write operations. For instance, read 
operations on large files residing in the distributed DRAM tier are typically 
limited only by the performance of the network
+* Very low CPU consumption: a single core sharing both application and file 
system client can drive sequential read/write operations at speeds of 100 Gbps 
and more
+* Asynchronous API leveraging the asynchronous nature of RDMA-based networking 
hardware
+* Extensible plugin architecture: new storage tiers tailored to specific 
hardware can be added easily
+ 
+Crail is implemented in Java and offers a Java API that integrates directly 
with Java off-heap memory. Crail is designed for performance-critical 
temporary data within the scope of a rack or two. 
 
-We currently do not provide binary releases. This page describes how to build 
the Crail I/O stack from source, and how to configure and deploy it. 
+## Requirements
 
-<h2 id="crail">Building Crail Store</h2>
+* Java 8 or higher
+* RDMA-based network, e.g., InfiniBand, iWARP, RoCE. There are two options to 
run Crail without RDMA networking hardware: (a) use SoftiWARP, (b) use the 
TCP/DRAM storage tier
+* libdisni.so, available as part of [DiSNI](https://github.com/zrlio/disni)
 
-Building the source requires [Apache Maven](http://maven.apache.org/) and Java 
version 8 or higher.
-To build Crail execute the following steps:
+## Building 
 
-1. Obtain a copy of <a href="{{ site.base }}/community/">Crail Store</a>
-2. Make sure your local maven repo contains 
[DiSNI](https://github.com/zrlio/disni), if not build DiSNI from Github
-3. Make sure your local maven repo contains 
[DaRPC](https://github.com/zrlio/darpc), if not build DaRPC from Github
-4. Run: mvn -DskipTests install
-5. Copy tarball to the cluster and unpack it using tar xvfz 
crail-1.0-bin.tar.gz
+To build Crail from source using [Apache Maven](http://maven.apache.org/), 
execute the following steps:
+
+1. Obtain a copy of [Crail](https://github.com/apache/incubator-crail) from 
GitHub
+2. Run: `mvn -DskipTests install`
+3. Copy the tarball to the cluster and unpack it using `tar xvfz crail-1.0-bin.tar.gz`
 
 Note: later, when deploying Crail, make sure the directory containing libdisni.so is part of your 
LD_LIBRARY_PATH. The easiest way to make it work is to copy libdisni.so into 
crail-1.0/lib 
 
-### Configuration
+## Configuration
 
 To configure Crail, use crail-site.conf.template as a basis and modify it to 
match your environment. 
 
@@ -37,52 +42,53 @@ There are a general file system properties and specific 
properties for the diffe
 
     crail.namenode.address                crail://namenode:9060
     crail.storage.types                   
org.apache.crail.storage.rdma.RdmaStorageTier
-    crail.cachepath                       /memory/cache
+    crail.cachepath                       /dev/hugepages/cache
     crail.cachelimit                      12884901888
     crail.blocksize                       1048576
     crail.buffersize                      1048576
 
-In this configuration the namenode is configured to run using port 9060 on 
host 'namenode', which must be a valid host in the cluster. We further 
configure a single storage tier, in this case the RDMA-based DRAM tier. 
Cachepath points to a directory that is used by the file system to allocate 
memory for the client cache. Up to cachelimit size, all the memory that is used 
by Crail will be allocated via mmap from this location. Ideally, the directory 
specified in cachepath points to a hugetlbfs mountpoint. Aside from the general 
properties, each storage tier needs to be configured separately.
+In this configuration, the namenode is configured to run on port 9060 on 
host 'namenode', which must be a valid host in the cluster. We further 
configure a single storage tier, in this case the RDMA-based DRAM tier. The 
cachepath property needs to point to a directory that is used by the file 
system to allocate memory for the client cache. Up to the cachelimit size, all the 
memory that is used by Crail will be allocated via mmap from this location. 
Ideally, the directory specified in cachepath points to a hugetlbfs mountpoint. 
Aside from the general properties, each storage tier needs to be configured 
separately.
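To make the relationship between these general properties concrete, here is a minimal sketch, assuming only that crail-site.conf holds one whitespace-separated key/value pair per line as in the example configuration. `CrailSiteConf`, `parse()` and `hugePagesFor()` are hypothetical helpers, not part of Crail; treating lines that start with `#` as comments and using a 2 MiB huge page size (a common x86-64 default) are assumptions. The sketch reads the example configuration and estimates how many huge pages the hugetlbfs mount behind cachepath would need in order to hold cachelimit bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Crail): parses crail-site.conf style
 * "key<whitespace>value" lines and derives a huge page estimate.
 */
class CrailSiteConf {

    /** Parses "key value" lines; blank lines and lines starting with '#' are skipped (assumption). */
    static Map<String, String> parse(String text) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Split on the first run of whitespace: the key, then the rest as the value.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                props.put(parts[0], parts[1].trim());
            }
        }
        return props;
    }

    /** Number of huge pages of the given size needed to back 'bytes' (rounded up). */
    static long hugePagesFor(long bytes, long hugePageSize) {
        return (bytes + hugePageSize - 1) / hugePageSize;
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
            "crail.namenode.address                crail://namenode:9060",
            "crail.storage.types                   org.apache.crail.storage.rdma.RdmaStorageTier",
            "crail.cachepath                       /dev/hugepages/cache",
            "crail.cachelimit                      12884901888",
            "crail.blocksize                       1048576",
            "crail.buffersize                      1048576");
        Map<String, String> props = parse(conf);
        long cacheLimit = Long.parseLong(props.get("crail.cachelimit"));
        // Assumes 2 MiB huge pages.
        long pages = hugePagesFor(cacheLimit, 2L * 1024 * 1024);
        System.out.println("cachelimit = " + (cacheLimit >> 30) + " GiB, 2 MiB huge pages needed: " + pages);
    }
}
```

With the example values, cachelimit is 12 GiB, which corresponds to 6144 huge pages of 2 MiB each.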
 
-#### RDMA/DRAM Storage Tier
+### RDMA/DRAM Storage
 
 For the RDMA/DRAM tier we need to specify the interface that should be used by 
the storage nodes.
 
-    crail.storage.rdma.interface          eth0
+    crail.storage.rdma.interface         eth0
   
 The datapath property specifies a path from which the storage nodes will 
allocate blocks of memory via mmap. Again, this path should ideally point to a hugetlbfs 
mountpoint.
 
-    crail.storage.rdma.datapath           /memory/data
+    crail.storage.rdma.datapath          /memory/data
 
 Use the `storagelimit` property to specify how much DRAM each datanode should donate to the file 
system pool. DRAM is allocated in chunks of 
`allocationsize`, which needs to be a multiple of `crail.blocksize`.
 
-    crail.storage.rdma.allocationsize     1073741824
-    crail.storage.rdma.storagelimit       75161927680
+    crail.storage.rdma.allocationsize    1073741824
+    crail.storage.rdma.storagelimit      75161927680
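The requirement that `allocationsize` be a multiple of `crail.blocksize` can be checked with simple arithmetic. The following is a minimal sketch using the example values; `RdmaTierSizes` is a hypothetical helper, not part of Crail, and counting only whole chunks within `storagelimit` is an assumption about how the donated DRAM is split.

```java
/**
 * Hypothetical sanity check (not part of Crail) for the RDMA/DRAM tier sizes.
 */
class RdmaTierSizes {

    /** True if allocationSize is a positive multiple of blockSize. */
    static boolean isMultiple(long allocationSize, long blockSize) {
        return blockSize > 0 && allocationSize % blockSize == 0;
    }

    /** Number of whole allocationSize chunks that fit into storageLimit (assumption: partial chunks are ignored). */
    static long chunks(long storageLimit, long allocationSize) {
        return storageLimit / allocationSize;
    }

    public static void main(String[] args) {
        long blockSize = 1048576L;          // crail.blocksize (1 MiB)
        long allocationSize = 1073741824L;  // crail.storage.rdma.allocationsize (1 GiB)
        long storageLimit = 75161927680L;   // crail.storage.rdma.storagelimit

        if (!isMultiple(allocationSize, blockSize)) {
            throw new IllegalArgumentException("allocationsize must be a multiple of crail.blocksize");
        }
        System.out.println("blocks per chunk: " + allocationSize / blockSize);
        System.out.println("chunks per datanode: " + chunks(storageLimit, allocationSize));
    }
}
```

With these values, each 1 GiB chunk holds 1024 blocks of 1 MiB, and the 75161927680-byte storage limit corresponds to 70 chunks.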
 
 Crail supports optimized local operations via memcpy (instead of RDMA) in case 
a given file operation is backed by a local storage node. The indexpath 
specifies where Crail will store the necessary metadata that make these 
optimizations possible. Important: the indexpath must NOT point to a hugetlbfs 
mountpoint because index files will be updated, which is not possible in hugetlbfs.
 
-    crail.storage.rdma.localmap           true
-    crail.storage.rdma.indexpath          /index
+    crail.storage.rdma.localmap          true
+    crail.storage.rdma.indexpath         /index
     
-#### NVMf/Flash Storage Tier
+### NVMf/Flash Storage
 
 Crail is a multi-tiered storage system. Additional tiers can be enabled by 
adding them to the configuration as follows.
 
     crail.storage.types                  
org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
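Since `crail.storage.types` takes a comma-separated list of storage tier classes, a short sketch of splitting that value may help when generating or checking configurations. `StorageTypes` is a hypothetical helper, not part of Crail; it keeps the tiers in the order they are listed, and whether Crail attaches any meaning to that order is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Crail): splits the comma-separated
 * crail.storage.types value into storage tier class names, in listed order.
 */
class StorageTypes {

    static List<String> split(String value) {
        List<String> tiers = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {   // tolerate stray spaces and empty entries
                tiers.add(name);
            }
        }
        return tiers;
    }

    public static void main(String[] args) {
        String value = "org.apache.crail.storage.rdma.RdmaStorageTier,"
                     + "org.apache.crail.storage.nvmf.NvmfStorageTier";
        List<String> tiers = split(value);
        for (int i = 0; i < tiers.size(); i++) {
            System.out.println("tier " + i + ": " + tiers.get(i));
        }
    }
}
```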
 
-
 For the NVMf storage tier we need to configure the server IP that is used when 
listening for new connections. We also need to configure the PCI address of the 
flash device we want to use, as well as the huge page mount point to be used 
for allocating memory. 
 
-    crail.storage.nvmf.bindip          10.40.0.XX
-    crail.storage.nvmf.pcieaddr                0000:11:00.0
-    crail.storage.nvmf.hugedir         /dev/hugepages
-    crail.storage.nvmf.socketmem               512,512
+    crail.storage.nvmf.bindip           10.40.0.XX
+    crail.storage.nvmf.pcieaddr         0000:11:00.0
+    crail.storage.nvmf.hugedir          /dev/hugepages
+    crail.storage.nvmf.servermempool    512
+    crail.storage.nvmf.clientmempool    512
 
-### Deployment
+
+## Deploying
 
 For all deployments, make sure you define CRAIL_HOME on each machine to point 
to the top level Crail directory.
 
-#### Starting Crail manually
+### Starting Crail manually
 
 The simplest way to run Crail is to start it manually on just a handful of nodes. 
You will need to start the Crail namenode, plus at least one datanode. To start 
the namenode, execute the following command on the host that is configured to be 
the namenode:
 
@@ -99,7 +105,7 @@ Now you should have a small deployment up with just one 
datanode. In this case t
 
 This would start the shared storage datanode. Note that, for this to work, the 
configuration in crail-site.conf needs to have the specific properties of this type of 
datanode set. 
 
-#### Larger deployments
+### Larger deployments
 
 To run larger deployments, start Crail using 
 
@@ -117,7 +123,7 @@ For this to work include the list of machines to start 
datanodes in conf/slaves.
 
 In this example, we are configuring a Crail cluster with 2 physical hosts but 
3 datanodes and two different storage tiers.
 
-### Crail Shell
+## Crail Shell
 
 Crail contains an HDFS adaptor; thus, you can interact with Crail 
using the HDFS shell:
 
@@ -149,7 +155,7 @@ For the Crail shell to work properly, the HDFS 
configuration in crail-1.0/conf/c
 
 Note that the Crail HDFS interface currently cannot provide the full 
performance of Crail due to limitations of the HDFS API. In particular, the 
HDFS `FSDataOutputStream` API only supports heap-based `byte[]` arrays, which 
requires a data copy. Moreover, HDFS operations are synchronous, preventing 
efficient pipelining of operations. Applications that seek the best 
performance should therefore use the Crail interface directly, as shown next.
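The extra copy mentioned here can be illustrated with plain `java.nio`, independent of Crail or HDFS. In this sketch (the class `HeapVsDirect` is hypothetical), bytes held in a heap `byte[]` have to be copied into a direct, off-heap `ByteBuffer`, whereas data written straight into a direct buffer needs no intermediate array. It is an analogy for the cost the HDFS API imposes, not a description of how the Crail HDFS adaptor is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Illustration with plain java.nio (not the Crail API): heap byte[] data must be
 * copied into off-heap memory, while data produced in a direct buffer needs no copy.
 */
class HeapVsDirect {

    /** The byte[]-based path: copy a heap array into a freshly allocated direct buffer. */
    static ByteBuffer copyToDirect(byte[] heapData) {
        ByteBuffer direct = ByteBuffer.allocateDirect(heapData.length);
        direct.put(heapData);   // this is the extra data copy
        direct.flip();          // make the copied bytes readable from position 0
        return direct;
    }

    public static void main(String[] args) {
        byte[] heap = "hello crail".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer copied = copyToDirect(heap);
        System.out.println("direct=" + copied.isDirect() + ", bytes=" + copied.remaining());

        // The direct path: write into off-heap memory once, with no intermediate byte[].
        ByteBuffer offHeap = ByteBuffer.allocateDirect(16);
        offHeap.putLong(42L).putLong(7L);
        offHeap.flip();
        System.out.println("direct=" + offHeap.isDirect() + ", bytes=" + offHeap.remaining());
    }
}
```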
 
-### Programming against Crail
+## Programming against Crail
 
 The best way to program against Crail is to use Maven. Make sure you have the 
Crail dependency specified in your application pom.xml file:
 
@@ -159,20 +165,20 @@ The best way to program against Crail is to use Maven. 
Make sure you have the Cr
       <version>1.0</version>
     </dependency>
 
-Then, create a Crail file system instance as follows:
+Then, create a Crail client as follows:
 
     CrailConfiguration conf = new CrailConfiguration();
-    CrailFS fs = CrailFS.newInstance(conf);
+    CrailStore store = CrailStore.newInstance(conf);
 
 Make sure the crail-1.0/conf directory is part of the classpath. 
 
-The simplest way to create a file in Crail is as follows:
+Crail supports different file types. The simplest way to create a file in 
Crail is as follows:
 
-    CrailFile file = fs.create(filename, CrailNodeType.DATAFILE, 
CrailStorageClass.DEFAULT, CrailLocationClass.DEFAULT).get().syncDir();
+    CrailFile file = store.create(filename, CrailNodeType.DATAFILE, 
CrailStorageClass.DEFAULT, CrailLocationClass.DEFAULT).get().syncDir();
 
 Aside from the actual filename, the 'create()' call takes as input the node 
type (here CrailNodeType.DATAFILE) and the storage and location classes, which are 
preferences for the storage tier and physical location that this file should be 
created in. Crail tries to satisfy these preferences later when the file is 
written. In the example we do not request any particular storage or location affinity.
 
-The 'create()' call is non-blocking, calling 'get()' on the returning future 
object awaits the completion of the call. At that time, the file has been 
created, but its directory entry may not be visible. Therefore, the file may 
not yet show up in a file enumeration of the given parent directory. Calling 
'syncDir()' waits to for the directory entry to be completed. Both the 'get()' 
and the 'syncDir()' operation can be deffered to a later time at which they may 
become non-blocking operations. 
+The 'create()' call is non-blocking; calling 'get()' on the returned 
future object awaits the completion of the call. At that time, the file has 
been created, but its directory entry may not be visible. Therefore, the file 
may not yet show up in a file enumeration of the given parent directory. 
Calling 'syncDir()' waits for the directory entry to be completed. Both the 
'get()' and the 'syncDir()' operations can be deferred to a later time, at which 
point they may no longer need to block. 
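The benefit of deferring 'get()' can be sketched with standard `java.util.concurrent` types. In this analogy, `CompletableFuture` stands in for the future returned by Crail's 'create()', and `createAsync()` is a hypothetical placeholder, not the Crail API: all requests are issued first and the program waits afterwards, so the operations can overlap instead of running one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/**
 * Analogy only: CompletableFuture stands in for the future returned by
 * Crail's create(). createAsync() is hypothetical and NOT the Crail API.
 */
class DeferredGet {

    /** Hypothetical non-blocking "create" that completes on a background thread. */
    static Future<String> createAsync(String name) {
        return CompletableFuture.supplyAsync(() -> "created " + name);
    }

    /** Blocks until the future completes, analogous to calling get() on the returned future. */
    static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Issues all requests first, then waits for each one, so the operations overlap. */
    static List<String> createAll(List<String> names) {
        List<Future<String>> pending = new ArrayList<>();
        for (String name : names) {
            pending.add(createAsync(name));   // returns immediately
        }
        List<String> results = new ArrayList<>();
        for (Future<String> future : pending) {
            results.add(await(future));       // blocks only if not yet completed
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(createAll(Arrays.asList("/f1", "/f2", "/f3")));
    }
}
```

Waiting on the futures in the order they were issued keeps the results in request order, while the underlying operations still run concurrently.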
 
 Once the file is created, a file stream can be obtained for writing:
 
@@ -186,21 +192,24 @@ In both cases, we pass a write hint (1024 in the example) 
that indicates to Crai
 
 Once the stream has been obtained, there exist various ways to write a file. 
The code snippet below shows the use of the asynchronous interface:
 
-    ByteBuffer dataBuf = fs.allocateBuffer();
+    CrailBuffer dataBuf = store.allocateBuffer();
     Future<DataResult> future = outputStream.write(dataBuf);
     ...
     future.get();
 
 Reading files works very similarly to writing. There exist various examples in 
org.apache.crail.tools.CrailBenchmark.
 
-### Storage Tiers
+## TCP Storage Tiers and RPC binding
+
+Crail is designed for user-level networking and storage. It does, however, 
also provide plain TCP-based backends for storage and RPC and, thus, 
can easily be run on any machine without requiring special hardware support. 
The TCP storage backend can be enabled as follows:
 
-Crail ships with the RDMA/DRAM storage tier. Currently there are two 
additional storage tiers available in separate repos:
+    crail.storage.types                
org.apache.crail.storage.tcp.TcpStorageTier
 
-* [Crail-Blkdev](https://github.com/zrlio/crail-blkdev)  is a storage tier 
integrating shared volume block devices such as disaggregated flash. 
-* [Crail-Netty](https://github.com/zrlio/crail-netty) is a DRAM storage tier 
for Crail that uses TCP, you can use it to run Crail on non-RDMA hardware. 
Follow the instructions in these repos to build, deploy and use these storage 
tiers in your Crail environmnet. 
+The TCP RPC binding can be enabled as follows:
 
-### Benchmarks
+    crail.namenode.rpctype     org.apache.crail.namenode.rpc.tcp.TcpNameNode
+
+## Benchmarks
 
 Crail provides a set of benchmark tools to measure the performance. Type
 
@@ -220,35 +229,17 @@ This command issues 102400 read operations of 1MB each.
 
 The tool also contains benchmarks to read files randomly, or to measure the 
performance of opening files, etc.
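To relate benchmark runs to bandwidth figures such as the 100 Gbps mentioned in the feature list, the arithmetic is straightforward. This is a minimal sketch, assuming "1MB" means 1 MiB (1048576 bytes) and Gbit/s means 10^9 bits per second; `BenchMath` is a hypothetical helper, not part of CrailBenchmark, and the elapsed time must come from your own run.

```java
/**
 * Hypothetical helper (not part of CrailBenchmark): converts a benchmark
 * run (operation count, operation size, elapsed time) into Gbit/s.
 */
class BenchMath {

    /** Total bytes moved by 'ops' operations of 'opSize' bytes each. */
    static long totalBytes(long ops, long opSize) {
        return ops * opSize;
    }

    /** Bandwidth in Gbit/s (10^9 bits per second) for 'bytes' moved in 'seconds'. */
    static double gbps(long bytes, double seconds) {
        return bytes * 8.0 / seconds / 1e9;
    }

    public static void main(String[] args) {
        long bytes = totalBytes(102400, 1024 * 1024); // 102400 reads of 1 MiB each
        System.out.println(bytes + " bytes = " + (bytes >> 30) + " GiB");
        if (args.length > 0) {
            double elapsedSeconds = Double.parseDouble(args[0]); // measured runtime in seconds
            System.out.printf("%.2f Gbit/s%n", gbps(bytes, elapsedSeconds));
        }
    }
}
```

For the example command, 102400 reads of 1 MiB move 107374182400 bytes (100 GiB); passing the measured runtime in seconds as the first argument prints the resulting bandwidth.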
 
-<h2 id="spark">Building Crail Spark Modules</h2>
-
-Building the source requires [Apache Maven](http://maven.apache.org/) and Java 
version 8 or higher.
-To build Crail execute the following steps:
-
-1. Obtain a copy of [Crail-Spark-IO](https://github.com/zrlio/crail-spark-io) 
from Github
-2. Make sure your local maven repo contains crail store jars, if not build 
Crail from the <a href="{{ site.base }}/community/">source</a> 
-4. Run: mvn -DskipTests install
-5. Add crail-spark-1.0.jar as well as its Crail dependencies to the Spark 
extra class path, both for the driver and the executors
-
-```
-spark.driver.extraClassPath     $CRAIL_HOME/jars/*:<path>/crail-spark.jar:.
-spark.executor.extraClassPath   $CRAIL_HOME/jars/*:<path>/crail-spark.jar:.
-```
+## Applications
 
-### Configuration
+Crail is used by [Crail-Spark-IO](https://github.com/zrlio/crail-spark-io), a 
high-performance shuffle engine for Spark. 
[Crail-Terasort](https://github.com/zrlio/crail-terasort) is a fast sorting 
benchmark for Spark based on Crail. 
 
-To configure the crail shuffle plugin included in spark-io add the following 
line to spark-defaults.conf
-```
-spark.shuffle.manager          
org.apache.spark.shuffle.crail.CrailShuffleManager
-```
-Since spark version 2.0.0, broadcast is no longer an exchangeable plugin, 
unfortunately. To use the crail broadcast plugin in Spark it has to be manually 
added to Spark's BroadcastManager.scala.
+## Contributions
 
-### Running
+PRs are always welcome. Please fork the repository, make the modifications 
+you propose, and let us know. 
 
-For the Crail shuffler to perform best, applications are encouraged to provide 
an implementation of the `CrailShuffleSerializer` interface, as well as an 
implementation of the `CrailShuffleSorter` interface. Defining its own custom 
serializer and sorter for the shuffle phase not only allows the application to 
serialize and sort faster, but allows applications to directly leverage the 
functionality provided by the Crail input/output streams such as zero-copy or 
asynchronous operations. Custom serializer and sorter can be specified in 
spark-defaults.xml. For instance, 
[crail-terasort](https://github.com/zrlio/crail-terasort) defines the shuffle 
serializer and sorter as follows:
+## Contact 
 
-```
-spark.crail.shuffle.sorter     
com.ibm.crail.terasort.sorter.CrailShuffleNativeRadixSorter
-spark.crail.shuffle.serializer com.ibm.crail.terasort.serializer.F22Serializer
-```
+Please join the Crail developer mailing list for discussions and 
notifications. The list is at: 
 
+d...@crail.incubator.apache.org.
