[Cassandra Wiki] Update of "LargeDataSetConsiderations_JP" by MakiWatanabe

Apache Wiki Thu, 10 Mar 2011 16:07:54 -0800


The "LargeDataSetConsiderations_JP" page has been changed by MakiWatanabe.
The comment on this change is: Translation to Japanese completed.
http://wiki.apache.org/cassandra/LargeDataSetConsiderations_JP?action=diff&rev1=22&rev2=23

--------------------------------------------------

  
    * Potential future improvements are discussed in the following links: [[https://issues.apache.org/jira/browse/CASSANDRA-1876|CASSANDRA-1876]], [[https://issues.apache.org/jira/browse/CASSANDRA-1881|CASSANDRA-1881]]
  
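-  * On filesystem choice: deleting huge files is terribly slow and requires many seeks on, for example, ext2/ext3. Consider using xfs or ext4fs. This affects the background unlink of sstables, which happens constantly. It also affects start-up time.
+  * On filesystem choice: deleting huge files is terribly slow and requires many seeks on, for example, ext2/ext3. Consider using xfs or ext4fs. This affects the background unlink of sstables, which happens constantly. It also affects start-up time. (If there are sstables pending removal at start-up, they are deleted as part of the start-up process. This can be harmful when terabytes of sstables have to be deleted.)
- (If there are sstables pending removal at start-up, they are deleted as part of the start-up process. It can therefore become harmful when terabytes of sstables have to be deleted.)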
  
-  * When each node holds a large amount of data, adding nodes takes time.
+  * When each node holds a large amount of data, adding nodes takes time. It is better to add nodes before the system is pushed to its limits.
  
- Plan for this; do not try to throw additional hardware at a cluster at the last minute.
- It is better to avoid adding nodes only after the system has been pushed to its limits.
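+  * Cassandra reads through sstable index files on start-up. This is known as "index sampling". Index sampling keeps a subset of keys (1 out of 100 by default) and their on-disk locations in an in-memory index (see [[ArchitectureInternals]]). This means that the larger the index files grow, the longer the sampling takes. Thus, when there are huge indexes containing a very large number of keys, index sampling on start-up may become a problem.
+  * A drawback of a large row cache is start-up time. When the row cache information is periodically saved, only the cached keys are saved; the data corresponding to those keys has to be pre-fetched on start-up. On a large data set this requires many seeks, and the time until the row cache becomes usable is proportional to the row cache size (assuming the data set is large enough that the seek-bound I/O is not subject to optimization by disks).
+   * Potential future improvements are discussed in the following link: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]]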
  
-  * Cassandra will read through sstable index files on start-up, doing what is known as "index sampling". This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
-  * A negative side-effect of a large row-cache is start-up time. The periodic saving of the row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set, this is probably going to be seek-bound and the time it takes to warm up the row cache will be linear with respect to the row cache size (assuming sufficiently large amounts of data that the seek bound I/O is not subject to optimization by disks).
-   * Potential future improvement: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]].
-
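
The index sampling described in the diff above can be illustrated with a rough sketch. This is not Cassandra's actual implementation: the function names, the `(key, position)` tuple layout, and the plain string ordering of keys are all assumptions made for illustration. The only detail taken from the page is the "1 out of 100" sampling ratio. The sketch shows why start-up cost grows with index size: every index entry has to be read once, even though only a small fraction is kept in memory.

```python
import bisect

# Assumed value, taken from the "1 out of 100" default mentioned in the page.
SAMPLING_INTERVAL = 100


def build_index_sample(index_entries, interval=SAMPLING_INTERVAL):
    """Walk every (key, position) entry of a sorted index and keep every
    interval-th one. The full pass over the index is what makes start-up
    slower as the index files grow."""
    return [entry for i, entry in enumerate(index_entries) if i % interval == 0]


def find_scan_start(sample, key):
    """Return the on-disk position of the closest sampled key that is <= key,
    or None if the key sorts before the first sampled key. A reader would
    start scanning the on-disk index from that position."""
    sampled_keys = [k for k, _ in sample]
    i = bisect.bisect_right(sampled_keys, key) - 1
    return sample[i][1] if i >= 0 else None


# Hypothetical index: 1000 keys, each entry 32 bytes apart on disk.
entries = [(f"key{i:06d}", i * 32) for i in range(1000)]
sample = build_index_sample(entries)
```

With these assumed inputs, `sample` holds 10 of the 1000 entries, and `find_scan_start(sample, "key000250")` returns the position of `key000200`, from which a short scan of the on-disk index would locate the requested key.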
