Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44100456
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
    -To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
    +You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited.
    +If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
    +Also, catalog configurations are found here :doc:`catalog_configuration`.
    +
    +=========================
    +Join Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.auto-broadcast`
    +""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable or disable the use of broadcast join.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.auto-broadcast</name>
    +    <value>true</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.non-cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 5120
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
    +    <value>5120</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 1024
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.cross-join.threshold-kb</name>
    +    <value>1024</value>
    +  </property>
    +
    +.. warning::
    +  In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.partition-volume-mb`
    +"""""""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.partition-volume-mb</name>
    +    <value>128</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.common.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform a join in a task.
    +If the input data is smaller than this value, join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.inner.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an inner join in a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.inner.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.outer.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an outer join in a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.outer.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +"""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.hash-table.size`
    +"""""""""""""""""""""""""""""""""""""
    +
    +The initial size of hash table for in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.hash-table.size</name>
    +    <value>100000</value>
    +  </property>
     
     ======================
    -System Config
    +Sort Query Settings
     ======================
     
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.sort.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.sort.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.external-sort.buffer-mb`
    +""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.external-sort.buffer-mb</name>
    +    <value>200</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.sort.list.size`
    +""""""""""""""""""""""""""""""""""""""
     
    +The initial size of list for in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.sort.list.size</name>
    +    <value>100000</value>
    +  </property>
    +
    +=========================
    +Group by Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.multi-level-aggr`
    +""""""""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
    +Otherwise, 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.multi-level-aggr</name>
    +    <value>true</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.partition-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>256</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.groupby.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
    +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
    +Otherwise, the sort-based aggregation is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.groupby.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.aggregate.hash-table.size`
    +""""""""""""""""""""""""""""""""""""""""""
    +
    +The initial size of list for in-memory sort.
    --- End diff --
    
    The description talks about a list size for in-memory sort, but the property name suggests that it sets a hash table size.
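
    A possible rewording is sketched below. It assumes the property sets the initial size of the hash table used for in-memory hash aggregation. That is an inference from the property name and from the parallel ``tajo.executor.join.hash-table.size`` entry earlier in this diff, not something confirmed against the Tajo source, so the wording is only a suggestion:

    ```rst
    The initial size of hash table for in-memory aggregation.
    ```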

