[Cassandra Wiki] Update of "MemtableThresholds_JP" by MakiWatanabe

Apache Wiki Fri, 11 Mar 2011 07:02:16 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.


The "MemtableThresholds_JP" page has been changed by MakiWatanabe.
The comment on this change is: Update Translation for 0.7+.
http://wiki.apache.org/cassandra/MemtableThresholds_JP?action=diff&rev1=13&rev2=14

--------------------------------------------------

  Roughly speaking, the amount of memory required by Cassandra's internal data structures can be estimated with the following formula:
  {{{memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches}}}
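As a rough illustration, the formula above can be turned into a small calculator. This is only a sketch: the function name and parameters are hypothetical, "1G" is taken to mean 1024 MB, and the size of the internal caches is something you have to supply yourself.

```python
def estimate_heap_mb(memtable_throughput_in_mb: float,
                     hot_column_families: int,
                     internal_caches_mb: float = 0.0) -> float:
    """Rough heap estimate in MB: throughput * 3 * hot CFs + 1G + internal caches.

    Hypothetical helper; "1G" is assumed to be 1024 MB.
    """
    memtables_mb = memtable_throughput_in_mb * 3 * hot_column_families
    return float(memtables_mb + 1024 + internal_caches_mb)


if __name__ == "__main__":
    # Hypothetical inputs: 64 MB memtables, 3 hot CFs, 200 MB of caches.
    print(estimate_heap_mb(64, 3, 200))  # 576 + 1024 + 200 = 1800.0
```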
  
- Also know that if you're running up against the heap limit under load that's probably a symptom of other problems. Diagnose those first.
  Also, if you run up against the heap limit under load, that may well be a symptom of some other underlying problem. Analyze the cause carefully first.
  
  == Virtual Memory and Swap ==
@@ -31, +30 @@

  
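When a write is performed, Cassandra stores the data in column-family-specific, in-memory data structures called Memtables. A Memtable is flushed to disk as soon as it exceeds any one of several configurable thresholds.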
  
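The default settings (64 MB / 0.3) are intentionally conservative. To make good use of the available memory and to keep nodes from going down because they run out of memory, it is important to tune these parameters appropriately.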
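The flush rule described here (flush as soon as any one threshold is exceeded) can be sketched as follows. The class and function names are hypothetical, only the size and operation-count thresholds are modeled, and the defaults simply mirror the 64 MB / 0.3 million figures above.

```python
from dataclasses import dataclass


@dataclass
class MemtableThresholds:
    throughput_mb: float = 64.0          # size threshold, in MB
    operations_in_millions: float = 0.3  # column-count threshold, in millions


def should_flush(data_size_bytes: int, columns_written: int,
                 t: MemtableThresholds) -> bool:
    """Return True once ANY configured threshold is exceeded (hypothetical helper)."""
    size_exceeded = data_size_bytes > t.throughput_mb * 1024 * 1024
    ops_exceeded = columns_written > t.operations_in_millions * 1_000_000
    return size_exceeded or ops_exceeded


defaults = MemtableThresholds()
print(should_flush(10 * 1024 * 1024, 350_000, defaults))  # True: column count exceeded
print(should_flush(10 * 1024 * 1024, 100_000, defaults))  # False: neither exceeded
```

Whichever threshold is crossed first triggers the flush, which is why both values have to be chosen with your actual data shape in mind.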
  
- == Configuring Thresholds ==
+ == Configuring Thresholds ==
  
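- '''Larger Memtables take memory away from caches:'''
+ '''Larger Memtables take memory away from caches:''' Because Memtables store the actual column values, they consume at least as much memory as the data inserted. In practice, the data structures used to index this data need memory as well. When the number of columns and rows is large compared to the size of the values, this index overhead can become significant, and may even exceed the data itself.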
  
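+ In other words, which thresholds to use and what values to give them cannot be derived simply from how much memory you have; you also need to consider the number of column families, the number of columns per column family, and the size of the values being stored.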
- Since Memtables are storing actual column values, they consume at least as much memory as the size of data inserted. However, there is also overhead associated with the structures used to index this data. When the number of columns and rows is high compared to the size of values, this overhead can become quite significant (possibly greater than the data itself).
-   In other words, which threshold(s) to use, and what to set them to is not just a function of how much memory you have, but of how many column families, how many columns per column-family, and the size of values being stored.
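A toy model shows why many small columns let the indexing overhead dominate. The per-column overhead constant below is purely hypothetical, not a measured Cassandra figure, and the model ignores column names; only the shape of the result matters: the smaller the values, the larger the overhead relative to the data.

```python
# Hypothetical per-column indexing overhead in bytes; NOT a measured Cassandra value.
ASSUMED_OVERHEAD_PER_COLUMN = 64


def memtable_footprint(num_columns: int, value_size: int,
                       overhead: int = ASSUMED_OVERHEAD_PER_COLUMN) -> tuple[int, float]:
    """Return (estimated bytes used, overhead as a fraction of the raw value bytes)."""
    raw = num_columns * value_size
    index = num_columns * overhead
    return raw + index, index / raw


# The same 1,000,000 bytes of raw values, split two different ways.
print(memtable_footprint(1_000, 1_000))  # few large values: overhead is ~6% of the data
print(memtable_footprint(100_000, 10))   # many tiny values: overhead is ~6.4x the data
```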
  
- '''Larger Memtables don't improve write performance:''' Increasing the memtable capacity will cause less-frequent flushes but doesn't improve write performance directly: writes go directly to memory regardless. (Actually, if your commitlog and sstables share a volume they might contend, so if at all possible, put them on separate volumes.)
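+ '''Larger Memtables don't improve write performance:''' Increasing the memtable size lowers the frequency of flushes to disk, but does not directly improve write performance: written data goes straight into memory regardless of the memtable size. (If the commitlog and sstables share a volume, however, they may contend for I/O, so put them on separate volumes if at all possible.)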
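Back-of-the-envelope arithmetic makes the trade-off concrete: a larger memtable only stretches the time between flushes, while every write still goes into memory either way. The ingest rate is a hypothetical figure, and only the size threshold is considered.

```python
def seconds_between_flushes(memtable_mb: float, ingest_mb_per_s: float) -> float:
    """Time to fill one memtable at a steady ingest rate (size threshold only)."""
    return memtable_mb / ingest_mb_per_s


ingest = 8.0  # hypothetical steady write rate into one column family, in MB/s
for size_mb in (64, 128, 256):
    print(size_mb, "MB ->", seconds_between_flushes(size_mb, ingest), "s between flushes")
```

Doubling the memtable halves how often it is flushed; it does not change what happens on each individual write.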
  
- '''Larger memtables do absorb more overwrites''': If your write load sees some rows written more often than others (e.g. upvotes of a front-page story) a larger memtable will absorb those overwrites, creating more efficient sstables and thus better read performance. If your write load is batch oriented or if you have a massive row set, rows are not likely to be rewritten for a long time, and so this benefit will pay a smaller dividend.
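+ '''Larger Memtables absorb more overwrites:''' If your write load hits a small number of rows frequently (e.g. votes on a web article), a larger memtable can absorb more of those overwrites, so sstables are generated more efficiently, which helps read performance. If your write load is mostly batch-oriented, or your data set contains a very large number of rows, most rows are rarely updated and this benefit is much smaller.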
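A small simulation illustrates the overwrite effect. It models a memtable as a dict keyed by row that is flushed after a fixed number of writes (a simplification of the operation-count threshold), and counts how many row entries end up in sstables. Both write patterns are made up for illustration.

```python
def rows_flushed(writes: list[str], memtable_ops: int) -> int:
    """Count row entries written to sstables when flushing every `memtable_ops` writes.

    Overwrites of the same row within one memtable collapse into a single entry.
    """
    flushed = 0
    memtable: dict[str, int] = {}
    for i, row in enumerate(writes, start=1):
        memtable[row] = i  # an overwrite replaces the previous value in memory
        if i % memtable_ops == 0:
            flushed += len(memtable)  # one entry per distinct row
            memtable.clear()
    flushed += len(memtable)  # flush whatever is left at the end
    return flushed


hot_writes = ["story"] * 8 + ["a", "story", "b", "story"]  # one "hot" row
batch_writes = [f"row{n}" for n in range(12)]               # every row written once
for ops in (2, 12):
    print(ops, rows_flushed(hot_writes, ops), rows_flushed(batch_writes, ops))
```

With the hot-row pattern the larger memtable writes far fewer entries to disk, while the batch pattern writes the same number of entries no matter how large the memtable is.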
  
- '''Larger memtables do lead to more effective compaction''': Since compaction is tiered, large sstables are preferable: turning over tons of tiny memtables is bad. Again, this impacts read performance (by improving the overall io-contention weather), but not writes.
+ '''Larger Memtables make compaction more efficient:''' Because compaction is tiered, larger sstables are preferable; in other words, flushing many small memtables is undesirable. This, too, helps read performance, but has no effect on writes.
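A deliberately simplified model of tiered compaction shows why lots of tiny sstables cost more. It assumes every compaction merges exactly four equal-sized sstables into one (the fan-in of 4 is an assumption chosen for illustration) and counts how many tiers of merging the flushed data goes through. It ignores how a real size-tiered strategy picks its buckets.

```python
import math


def compaction_rounds(total_mb: float, memtable_mb: float, fan_in: int = 4) -> int:
    """Number of merge tiers needed to collapse all flushed sstables into one.

    Simplified model: each tier merges `fan_in` equal-sized sstables, so every
    tier rewrites roughly all of the data once.
    """
    sstables = math.ceil(total_mb / memtable_mb)
    rounds = 0
    while sstables > 1:
        sstables = math.ceil(sstables / fan_in)
        rounds += 1
    return rounds


for memtable_mb in (16, 128):
    n = compaction_rounds(1024, memtable_mb)
    print(f"{memtable_mb} MB memtables: {n} tiers, ~{n * 1024} MB rewritten")
```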
  
- Listed below are the thresholds found in `storage-conf.xml`, along with a description.
+ Below are the thresholds found in `storage-conf.xml`, with a description of each.
  
  === MemtableThroughputInMB ===
- As the name indicates, this sets the max size in megabytes that the Memtable will store before triggering a threshold violation and causing it to be flushed to disk. It corresponds to the size of the values inserted (plus the size of the containing column).
  
- If left unconfigured (missing from the config), this defaults to 128MB.
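+ As the name indicates, this parameter specifies, in MB, the maximum amount of data a Memtable can hold before it is flushed to disk.
+ This corresponds to the size of the inserted values plus the size of the columns that contain them.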
  
- ''Note: This was referred to as MemtableSizeInMB in versions of Cassandra before 0.6.0. In version 0.7b2+, the value will be applied on a [[https://issues.apache.org/jira/browse/CASSANDRA-1007|per column-family basis]].''
+ If this value is not set, it defaults to 128MB.
+ 
+ ''Note: This parameter was called MemtableSizeInMB in versions before 0.6.0. In version 0.7b2 and later, the setting is applied per column family ([[https://issues.apache.org/jira/browse/CASSANDRA-1007|CASSANDRA-1007]]).''
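Because the threshold falls back to 128MB when it is missing, and from 0.7b2 on it is set per column family, working out the effective value can be sketched as below. The dict layout and the column family names are hypothetical; this is not Cassandra's configuration format.

```python
DEFAULT_THROUGHPUT_MB = 128  # value used when the setting is left unconfigured


def effective_throughput_mb(cf_settings: dict[str, dict[str, float]], cf: str) -> float:
    """Per-column-family MemtableThroughputInMB, falling back to the 128MB default."""
    return cf_settings.get(cf, {}).get("MemtableThroughputInMB", DEFAULT_THROUGHPUT_MB)


# Hypothetical per-CF settings: only "Users" overrides the threshold.
settings = {"Users": {"MemtableThroughputInMB": 256}, "Events": {}}
for name in ("Users", "Events", "Unlisted"):
    print(name, effective_throughput_mb(settings, name))
```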
  
  === MemtableOperationsInMillions ===
- This directive sets a threshold on the number of columns stored.
+ This parameter sets a threshold on the number of columns stored.
  
- Left unconfigured (missing from the config), this defaults to 0.1 (or 100,000 objects). The config file's initial setting of 0.3 (or 300,000 objects) is a conservative starting point.
+ If this value is not set, it defaults to 0.1 (100,000 objects). The config file's initial value of 0.3 (300,000 objects) can be considered a conservative setting.
  
+ ''Note: This parameter was called MemtableObjectCountInMillions in versions before 0.6.0. In version 0.7b2 and later, the setting is applied per column family ([[https://issues.apache.org/jira/browse/CASSANDRA-1007|CASSANDRA-1007]]).''
- ''Note: This was referred to as MemtableObjectCountInMillions in versions of Cassandra before 0.6.0. In version 0.7b2+, the value will be applied on a [[https://issues.apache.org/jira/browse/CASSANDRA-1007|per column-family basis]].''
- 
- == Using JConsole to Optimize Thresholds ==
- Cassandra's column-family MBeans have a number of attributes that can prove invaluable in determining optimal thresholds. One way to access this instrumentation is by using JConsole, a graphical monitoring and management application that ships with your JDK.
- 
- Launching JConsole with no arguments will display the "New Connection" dialog box. If you are running JConsole on the same machine that Cassandra is running on, then you can connect using the PID; otherwise you will need to connect remotely. The default startup scripts for Cassandra cause the VM to listen on port 8080 using the JVM option:
- 
- . -Dcom.sun.management.jmxremote.port=8080
- 
- The remote JMX url is then:
- 
- service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi
- 
- This is used internally by: bin/nodetool src/java/org/apache/cassandra/tools/nodetool.java
- 
- {{attachment:jconsole_connect.png}}
- 
- Once connected, select the ''MBeans'' tab, expand the ''org.apache.cassandra.db'' section, and finally one of your column families.
- 
- There are three interesting attributes here.
- 
- 1. ''!MemtableColumnsCount'', representing the total number of column entries in this table. If you store 100 rows that each have 100 columns, expect to see this value increase by 10,000. This attribute is useful in setting the [[#MemtableObjectCountInMillions|MemtableObjectCountInMillions]] threshold.
- 1. ''!MemtableDataSize'', which is used to determine the total size of stored data. This is the sum of all the values stored and does not account for Memtable overhead (i.e. it's not indicative of the actual memory used by the Memtable). Use this value when adjusting [[#MemtableSizeInMB|MemtableSizeInMB]].
- 1. Finally there is ''!MemtableSwitchCount'' which increases by one each time a column family flushes its Memtable to disk.
- 
- ''Note: You'll need to manually mash the `Refresh` button to update these values.''
- 
- {{attachment:jconsole_attributes.png}}
- 
- It is also possible to schedule an immediate flush using the `forceFlush()` operation.
- 
- {{attachment:jconsole_operations.png}}
  
  
- 
- 
- 
- 
- 
- 
- ## old translation ##
- = Previous Translation =
- 
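- When you perform a write, Cassandra stores the value in a Memtable, an in-memory !ColumnFamily data structure. A Memtable is flushed to disk when it exceeds the configured thresholds. It is important to tune the thresholds correctly so that the available system memory is used effectively and nodes do not go down from running out of memory.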
- 
- (With the default settings in bin/cassandra.in.sh, the maximum JVM heap size is 1GB (-Xmx1G), which is too small for production. You should consider increasing it.)
- 
- == Configuring Thresholds ==
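- Because Memtables hold the actual column values, they consume at least as much memory as the size of the inserted data. However, there is also overhead tied to the structures used to index this data. When the number of columns and rows is large compared to the size of the values, this overhead becomes too significant to ignore (possibly larger than the data itself).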
- 
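- In other words, which thresholds to use and what values to give them depends not just on how much memory you have, but also on how many !ColumnFamilies there are, how many columns each !ColumnFamily has, and the size of the stored values.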
- 
- Shown below are the threshold-related settings in `storage-conf.xml` and their descriptions.
- 
- === MemtableSizeInMB ===
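- As the name indicates, this setting specifies, in MB, the maximum size a Memtable reaches before it is flushed to disk. It corresponds to the size of the inserted values (plus the size of the containing columns).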
- 
- If not set (missing from the config file), it defaults to 128MB.
- 
- ''Note: Configured per !ColumnFamily.''
- 
- === MemtableObjectCountInMillions ===
- This setting sets a threshold on the number of columns stored.
- 
- If not set (missing from the config file), it defaults to 1 (1,000,000 objects).
- 
- ''Note: Configured per !ColumnFamily.''
- 
  == Using JConsole to Optimize Thresholds ==
- Cassandra's !ColumnFamily MBeans have a number of attributes. ... for determining optimal thresholds. One way to access this instrumentation is to use JConsole, a graphical monitoring and management application included in the JDK.
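+ Cassandra's !ColumnFamily MBeans have a number of attributes that provide valuable information for determining optimal thresholds. One way to access this instrumentation is to use JConsole, a graphical monitoring and management application included in the JDK.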
  
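  Launching JConsole with no arguments displays the "New Connection" dialog box. If you run JConsole on the same machine as Cassandra, you can connect using the PID; otherwise, you need to connect remotely. Cassandra's default startup scripts make the Java VM listen on port 8080 with the following JVM option: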
  
-  -Dcom.sun.management.jmxremote.port=8080
+ . -Dcom.sun.management.jmxremote.port=8080
  
  The JMX URL for a remote connection is then:
  
  service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi
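The URL follows a fixed pattern, so for a remote node only the host and port change. A small helper that builds it; the function name and the remote host name are hypothetical.

```python
def jmx_service_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the RMI-based JMX service URL in the form shown above."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


print(jmx_service_url())                         # the local default
print(jmx_service_url("cassandra-node1", 8080))  # hypothetical remote node
```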
  
- This is used internally by the following tool:
- 
- bin/nodetool src/java/org/apache/cassandra/tools/nodetool.java
+ This is used internally by: bin/nodetool src/java/org/apache/cassandra/tools/nodetool.java
  
  {{attachment:jconsole_connect.png}}
  
  
Once connected, select the ''MBeans'' tab and expand the ''org.apache.cassandra.db'' section; you should see the !ColumnFamilies you have defined.
  
- There are three attributes worth noting there.
+ There are three interesting attributes here.
  
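   1. ''!MemtableColumnsCount'' represents the total number of columns in this table. If you store 100 rows that each have 100 columns, this value increases by 10,000. This attribute is useful in setting the [[#MemtableOperationsInMillions|MemtableOperationsInMillions]] threshold.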
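-  1. ''!MemtableDataSize'' determines the total size of the stored data. This is the sum of all the stored values and does not account for Memtable overhead (i.e. it does not indicate how much memory the Memtable actually uses). Use this value when adjusting [[#MemtableSizeInMB|MemtableSizeInMB]].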
-  1. ''!MemtableSwitchCount'' is counted up each time the !ColumnFamily flushes its Memtable to disk.
  
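+  1. ''!MemtableDataSize'' is used to determine the total size of the stored data. This is the sum of all the stored values and does not account for Memtable overhead (i.e. it does not indicate how much memory the Memtable actually uses). Use this value when adjusting [[#MemtableThroughputInMB|MemtableThroughputInMB]].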
+ 
+  1. ''!MemtableSwitchCount'' is incremented each time the !ColumnFamily flushes its Memtable to disk.
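Once you have read these attributes (for example, by noting the values JConsole shows at two points in time), a little arithmetic turns them into tuning inputs. The helper names and the sample readings below are hypothetical.

```python
def columns_added(rows: int, columns_per_row: int) -> int:
    """Expected increase in MemtableColumnsCount for `rows` rows of `columns_per_row` columns."""
    return rows * columns_per_row


def flushes_per_hour(switch_count_before: int, switch_count_after: int,
                     interval_s: float) -> float:
    """Flush rate derived from two MemtableSwitchCount readings taken interval_s apart."""
    return (switch_count_after - switch_count_before) * 3600 / interval_s


print(columns_added(100, 100))          # 10000, matching the example above
print(flushes_per_hour(12, 15, 600.0))  # hypothetical readings taken 10 minutes apart
```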
+ 
- ''Note: You need to press the `Refresh` button for the values to be reflected.''
+ ''Note: You need to press the `Refresh` button to update the displayed values.''
  
  {{attachment:jconsole_attributes.png}}
