(hive-site) branch main updated: Fix some "Raw HTML omitted" warnings and formatting issues (part 2) (#99)

zabetak Tue, 09 Jun 2026 00:33:05 -0700

This is an automated email from the ASF dual-hosted git repository.

zabetak pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hive-site.git



The following commit(s) were added to refs/heads/main by this push:
     new 971efd7d Fix some "Raw HTML omitted" warnings and formatting issues (part 2) (#99)
971efd7d is described below

commit 971efd7db933161a7b05aa5624ddf2a0d94dc579
Author: Thomas Rebele <[email protected]>
AuthorDate: Tue Jun 9 09:32:48 2026 +0200

    Fix some "Raw HTML omitted" warnings and formatting issues (part 2) (#99)
---
 .../hive-across-multiple-data-centers.md           | 14 +++++-----
 .../desingdocs/hive-metadata-caching-proposal.md   | 31 +++++++---------------
 .../desingdocs/hivereplicationv2development.md     | 26 +++++++++---------
 content/Development/desingdocs/indexdev.md         |  2 +-
 .../Development/desingdocs/subqueries-in-select.md |  2 +-
 .../support-saml-2-0-authentication-mode.md        | 12 +++++++++
 .../desingdocs/type-qualifiers-in-hive.md          |  6 ++---
 content/Development/gettingstarted-latest.md       |  2 +-
 .../docs/latest/admin/adminmanual-configuration.md | 14 +++++-----
 .../adminmanual-metastore-3-0-administration.md    |  2 +-
 .../admin/adminmanual-metastore-administration.md  | 16 +++++------
 .../latest/admin/hive-on-spark-getting-started.md  |  6 ++---
 .../docs/latest/admin/setting-up-hiveserver2.md    |  4 +--
 13 files changed, 67 insertions(+), 70 deletions(-)
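
The "Raw HTML omitted" warnings appear because the Markdown renderer parses bare placeholders such as `<ClusterName>` or `<cluster name>` as inline HTML tags and drops them. Wrapping the placeholder in backticks, or escaping `<` as `&lt;`, keeps it as literal text. The following minimal Python sketch finds the remaining cases. It makes three assumptions. First, the site is built with Hugo's Goldmark renderer with raw HTML disabled (the pages use `{{< ref >}}` shortcodes). Second, only the CommonMark open-tag rule is approximated: attribute values, comments and closing tags are ignored. Third, code spans use single backticks only. `raw_html_candidates` is a hypothetical helper. It is not part of this commit or of Hive.

```python
import re

# Simplified CommonMark open-tag rule (assumption): an ASCII letter followed by
# letters, digits or hyphens, then optional value-less attributes, e.g.
# <ClusterName> or <cluster name>. Attribute values and closing tags are ignored.
OPEN_TAG = re.compile(r"<[A-Za-z][A-Za-z0-9-]*(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*)*\s*/?>")
# Single-backtick code spans such as `<ClusterName>` are rendered literally.
CODE_SPAN = re.compile(r"`[^`]*`")


def raw_html_candidates(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs that may be parsed as raw HTML."""
    hits = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening/closing fence
            continue
        if in_fence:
            continue  # fenced code is not parsed as HTML
        outside_code = CODE_SPAN.sub("", line)
        hits.extend((lineno, tok) for tok in OPEN_TAG.findall(outside_code))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "+ Use cluster <ClusterName>",
        "+ Use cluster `<ClusterName>`",
        "use cluster <cluster name> will fail",
        "```",
        "<property>",
        "```",
        "database property <db_name>",
    ])
    for lineno, token in raw_html_candidates(sample):
        print(lineno, token)
```

Run as a script, it prints `1 <ClusterName>` and `3 <cluster name>`. The sketch does not report three items: the backticked placeholder, the fenced `<property>` line, and `<db_name>`. The last one is skipped because an underscore is not allowed in a CommonMark tag name. Escaping such placeholders anyway, as this commit does, is harmless.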

diff --git a/content/Development/desingdocs/hive-across-multiple-data-centers.md b/content/Development/desingdocs/hive-across-multiple-data-centers.md
index 47ba7f8f..9354660e 100644
--- a/content/Development/desingdocs/hive-across-multiple-data-centers.md
+++ b/content/Development/desingdocs/hive-across-multiple-data-centers.md
@@ -84,10 +84,10 @@ been imposed to simplify the problem:
 The same idea can be extended for partitioned tables.
 
 * The user can also decide to run in a particular cluster.
-       + Use cluster <ClusterName>
+       + Use cluster `<ClusterName>`
 * The system will not make an attempt to choose the cluster for the user, but only try to figure out if the query can be run  
 
- in the <clusterName>. If the query can run in this cluster, it will succeed. Otherwise, it will fail.
+ in the `<clusterName>`. If the query can run in this cluster, it will succeed. Otherwise, it will fail.
 * The user can go back to the behavior to use the default cluster.
        + Use cluster
 
@@ -101,7 +101,7 @@ The same idea can be extended for partitioned tables.
 
  PrimaryCluster - ClusterStorageDescriptor  
 
- and SecondaryClusters - Set<ClusterStorageDescriptor>
+ and SecondaryClusters - Set&lt;ClusterStorageDescriptor>&gt;
 
  The ClusterStorageDescriptor contains the following:  
 
@@ -128,12 +128,12 @@ The existing thrift API's will continue to work as if the user is trying to acce
 
 New APIs will be added which take the cluster as a new parameter. Almost all the existing APIs will be   
 
-enhanced to support this. The behavior will be the same as if, the user issued the command 'USE CLUSTER <CLUSTERNAME>
+enhanced to support this. The behavior will be the same as if, the user issued the command `USE CLUSTER <CLUSTERNAME>`
 
 * A new parameter will be added to keep the filesystem and jobtrackers for a cluster
-       + hive.cluster.properties: This will be json - ClusterName -> <FileSystem, JobTracker>
-       + use cluster <cluster name> will fail if <cluster name> is not present hive.cluster.properties
-       + The other option was to support create cluster <> etc. but that would have required storing the cluster information in the  
+       + hive.cluster.properties: This will be json - ClusterName -&gt; &lt;FileSystem, JobTracker&gt;
+       + use cluster `<cluster name>` will fail if `<cluster name>` is not present hive.cluster.properties
+       + The other option was to support create cluster `<>` etc. but that would have required storing the cluster information in the  
        
         metastore including jobtracker etc. which would be difficult to change per session.
 
diff --git a/content/Development/desingdocs/hive-metadata-caching-proposal.md b/content/Development/desingdocs/hive-metadata-caching-proposal.md
index 712757e3..3deaf115 100644
--- a/content/Development/desingdocs/hive-metadata-caching-proposal.md
+++ b/content/Development/desingdocs/hive-metadata-caching-proposal.md
@@ -63,57 +63,44 @@ Presto has the following cache:
 + userTablePrivileges
 
 * Range scan cache
-+ databaseNamesCache: regex -> database names, facilitates database search
++ databaseNamesCache: regex -&gt; database names, facilitates database search
 + tableNamesCache
 + viewNamesCache
-+ partitionNamesCache: table name -> partition names
++ partitionNamesCache: table name -&gt; partition names
 
 * Other
-+ partitionFilterCache: PS -> partition names, facilitates partition pruning
++ partitionFilterCache: PS -&gt; partition names, facilitates partition pruning
 
 For every partition filter condition, Presto breaks it down into tupleDomain and remainder:
 
+```
 AddExchanges.planTableScan:
-
             DomainTranslator.ExtractionResult decomposedPredicate = DomainTranslator.fromPredicate(
-
                     metadata,
-
                     session,
-
                     deterministicPredicate,
-
                     types);
-
     public static class ExtractionResult
-
     {
-
         private final TupleDomain<Symbol> tupleDomain;
-
         private final Expression remainingExpression;
-
     }
+```
 
-tupleDomain is a mapping of column -> range or exact value. When converting to PS, any range will be converted into wildcard and only exact value will be considered:
+tupleDomain is a mapping of column -&gt; range or exact value. When converting to PS, any range will be converted into wildcard and only exact value will be considered:
 
+```
 HivePartitionManager.getFilteredPartitionNames:
-
         for (HiveColumnHandle partitionKey : partitionKeys) {
-
             if (domain != null && domain.isNullableSingleValue()) {
-
                     filter.add(((Slice) value).toStringUtf8());
-
             else {
-
                 filter.add(PARTITION_VALUE_WILDCARD);
-
             }
-
         }
+```
 
-For example, the expression “state = CA and date between ‘201612’ and ‘201701’ will be broken down to PS (state = CA) and remainder date between ‘201612’ and ‘201701’. Presto will retrieve the partitions with state = CA from the PS -> partition name cache and partition object cache, and evaluates “date between ‘201612’ and ‘201701’ for every partitions returned. This is a good balance compare to caching partition names for every expression.
+For example, the expression “state = CA and date between ‘201612’ and ‘201701’ will be broken down to PS (state = CA) and remainder date between ‘201612’ and ‘201701’. Presto will retrieve the partitions with state = CA from the PS -&gt; partition name cache and partition object cache, and evaluates “date between ‘201612’ and ‘201701’ for every partitions returned. This is a good balance compare to caching partition names for every expression.
 
 ## Our Approach
 
diff --git a/content/Development/desingdocs/hivereplicationv2development.md b/content/Development/desingdocs/hivereplicationv2development.md
index 936198a9..9380e6ac 100644
--- a/content/Development/desingdocs/hivereplicationv2development.md
+++ b/content/Development/desingdocs/hivereplicationv2development.md
@@ -168,7 +168,7 @@ Event 100: ALTER TABLE tbl ADD PARTITION (p=1) SET LOCATION <location>;
 Event 110: ALTER TABLE tbl DROP PARTITION (p=1);  
 Event 120: ALTER TABLE tbl ADD PARTITION (p=1) SET LOCATION <location>;
 ```
-When loading the dump on the destination side (at a much later point), when the event 100 is replayed, the load task on the destination will try to pull the files from the <location> (the _files contains the path of <location>), which may contain new or different data. To replicate the exact state of the source at the time event 100 occurred at the source, we do the following:
+When loading the dump on the destination side (at a much later point), when the event 100 is replayed, the load task on the destination will try to pull the files from the `<location>` (the _files contains the path of `<location>`), which may contain new or different data. To replicate the exact state of the source at the time event 100 occurred at the source, we do the following:
 
 1. When Event 100 occurs at the source, in the notification event, we store the checksum of the file(s) in the newly added partition along with the file path(s).
 2. When Event 110 occurs at the source, we move the files of the dropped partition to $cmroot/database/tbl/p=1 instead of purging them.
@@ -212,7 +212,9 @@ The current implementation of replication is built upon existing commands EXPORT
 This is better described via various examples of each of the pieces of the command syntax, as follows:
 
   
-(a) REPL DUMP sales;       REPL DUMP sales.['.*?']Replicates out sales database for bootstrap, from <init-evid>=0 (bootstrap case) to <end-evid>=<CURR-EVID> with a batch size of 0, i.e. no batching.
+(a) REPL DUMP sales;       REPL DUMP sales.['.*?']
+
+Replicates out sales database for bootstrap, from `<init-evid>=0` (bootstrap case) to `<end-evid>=<CURR-EVID>` with a batch size of 0, i.e. no batching.`
 
 (b) REPL DUMP sales.['T3', '[a-z]+'];
 
@@ -228,15 +230,15 @@ This sets up db-level replication that excludes all the tables/views but include
 
 (e) REPL DUMP sales FROM 200 TO 1400;
 
-The presence of a FROM <init-evid> tag makes this dump not a bootstrap, but a dump which looks at the event log to produce a delta dump. FROM 200 TO 1400 is self-evident in that it will go through event ids 200 to 1400 looking for events from the relevant db.
+The presence of a FROM `<init-evid>` tag makes this dump not a bootstrap, but a dump which looks at the event log to produce a delta dump. FROM 200 TO 1400 is self-evident in that it will go through event ids 200 to 1400 looking for events from the relevant db.
 
 (f) REPL DUMP sales FROM 200;
 
-Similar to above, but with an implicit assumed <end-evid> as being the current event id at the time the command is run.
+Similar to above, but with an implicit assumed `<end-evid>` as being the current event id at the time the command is run.
 
 (g) REPL DUMP sales FROM 200 to 1400 LIMIT 100;REPL DUMP sales FROM 200 LIMIT 100;
 
-Similar to cases (d) & (e), with the addition of a batch size of <num-evids>=100. This causes us to stop processing if we reach 100 events, and return at that point. Note that this does not mean that we stop processing at event id = 300, since we began at 200 - it means that we will stop processing events when we have processed 100 events in the event stream (that has unrelated events) belonging to this replication-definition, i.e. of a relevant db or db.table, then we stop.
+Similar to cases (d) & (e), with the addition of a batch size of `<num-evids>=100`. This causes us to stop processing if we reach 100 events, and return at that point. Note that this does not mean that we stop processing at event id = 300, since we began at 200 - it means that we will stop processing events when we have processed 100 events in the event stream (that has unrelated events) belonging to this replication-definition, i.e. of a relevant db or db.table, then we stop.
 
 (h) REPL DUMP sales.['[a-z]+'] REPLACE sales FROM 200;
 
@@ -258,8 +260,8 @@ The REPL DUMP command has an optional WITH clause to set command-specific confi
 
 1. Error codes returned as return error codes (and over jdbc if with HS2)
 2. Returns 2 columns in the ResultSet:
-       1. <dir-name> - the directory to which it has dumped info.
-       2. <last-evid> - the last event-id associated with this dump, which might be the end-evid, or the curr-evid, as the case may be.
+       1. `<dir-name>` - the directory to which it has dumped info.
+       2. `<last-evid>` - the last event-id associated with this dump, which might be the end-evid, or the curr-evid, as the case may be.
 
 #### Note:
 
@@ -275,20 +277,18 @@ When bootstrap dump is in progress, it blocks rename table/partition operations
 
 Look up the HiveServer logs for below pair of log messages.
 
-> REPL DUMP:: Set property for Database: <db_name>, Property: <bootstrap.dump.state.xxxx>, Value: ACTIVE
-> 
-> REPL DUMP:: Reset property for Database: <db_name>, Property: <bootstrap.dump.state.xxxx>
-> 
+> REPL DUMP:: Set property for Database: `<db_name>`, Property: `<bootstrap.dump.state.xxxx>`, Value: ACTIVE
 > 
+> REPL DUMP:: Reset property for Database: `<db_name>`, Property: `<bootstrap.dump.state.xxxx>`
 
-If Reset property log is not found for the corresponding Set property log, then user need to manually reset the database property <bootstrap.dump.state.xxxx> with value as "IDLE" using ALTER DATABASE command.
+If Reset property log is not found for the corresponding Set property log, then user need to manually reset the database property `<bootstrap.dump.state.xxxx>` with value as "IDLE" using ALTER DATABASE command.
 
 ## REPL LOAD
 
 `REPL LOAD {<dbname>} FROM <dirname> {WITH ('key1'='value1', 'key2'='value2')};`
 
   
-This causes a REPL DUMP present in <dirname> (which is to be a fully qualified HDFS URL) to be pulled and loaded. If <dbname> is specified, and the original dump was a database-level dump, this allows Hive to do db-rename-mapping on import. If dbname is not specified, the original dbname as recorded in the dump would be used.The REPL LOAD command has an optional WITH clause to set command-specific configurations to be used when trying to copy from the source cluster. These configurations [...]
+This causes a REPL DUMP present in `<dirname>` (which is to be a fully qualified HDFS URL) to be pulled and loaded. If `<dbname>` is specified, and the original dump was a database-level dump, this allows Hive to do db-rename-mapping on import. If dbname is not specified, the original dbname as recorded in the dump would be used.The REPL LOAD command has an optional WITH clause to set command-specific configurations to be used when trying to copy from the source cluster. These configurat [...]
 
 #### Return values:
 
diff --git a/content/Development/desingdocs/indexdev.md b/content/Development/desingdocs/indexdev.md
index a24afa97..5379e13c 100644
--- a/content/Development/desingdocs/indexdev.md
+++ b/content/Development/desingdocs/indexdev.md
@@ -281,7 +281,7 @@ TBD: we will be adding methods for calling the handler when an index is dropped
 
 The reference implementation creates what is referred to as a "compact" index. This means that rather than storing the HDFS location of each occurrence of a particular value, it only stores the addresses of HDFS blocks containing that value. This is optimized for point-lookups in the case where a value typically occurs more than once in nearby rows; the index size is kept small since there are many fewer blocks than rows. The tradeoff is that extra work is required during queries in orde [...]
 
-The compact index is stored in an index table. The index table columns consist of the indexed columns from the base table followed by a _bucketname string column (indicating the name of the file containing the indexed block) followed by an _offsets array<string> column (indicating the block offsets within the corresponding file). The index table is stored as sorted on the indexed columns (but not on the generated columns).
+The compact index is stored in an index table. The index table columns consist of the indexed columns from the base table followed by a _bucketname string column (indicating the name of the file containing the indexed block) followed by an `_offsets array<string>` column (indicating the block offsets within the corresponding file). The index table is stored as sorted on the indexed columns (but not on the generated columns).
 
 The reference implementation can be plugged in with
 
diff --git a/content/Development/desingdocs/subqueries-in-select.md b/content/Development/desingdocs/subqueries-in-select.md
index 9c7d50ab..d1cee53e 100644
--- a/content/Development/desingdocs/subqueries-in-select.md
+++ b/content/Development/desingdocs/subqueries-in-select.md
@@ -79,7 +79,7 @@ SELECT customer.customer_num,
        ) AS total_ship_chg
 FROM customer 
 ```
-* Subqueries with DISTINCT are not allowed. Since DISTINCT <expression> will be evaluated as GROUP BY <expression>, subqueries with DISTINCT are disallowed for now.
+* Subqueries with DISTINCT are not allowed. Since `DISTINCT <expression>` will be evaluated as `GROUP BY <expression>`, subqueries with `DISTINCT` are disallowed for now.
 
 # Design
 
diff --git a/content/Development/desingdocs/support-saml-2-0-authentication-mode.md b/content/Development/desingdocs/support-saml-2-0-authentication-mode.md
index 5bd9c3ea..29b23f33 100644
--- a/content/Development/desingdocs/support-saml-2-0-authentication-mode.md
+++ b/content/Development/desingdocs/support-saml-2-0-authentication-mode.md
@@ -50,45 +50,57 @@ In order to make sure that the SAML assertions received by HiveServer2 are valid
 
 Following new configurations will be added to the hive-site.xml which would need to be configured by the clients.
 
+```
 <property>  
   <name>hive.server2.authentication</name>  
   <value>SAML</value>  
 </property>
+```
 
 This configuration will be set to SAML to indicate that the server will use SAML 2.0 protocol to authenticate the user. 
 
+```
 <property>  
   <name>hive.server2.saml2.idp.metadata</name>  
   <value>path_to_idp_metadata.xml</value>  
 </property>
+```
 
 This configuration will provide a path to the IDP metadata xml file.
 
+```
 <property>  
   <name>hive.server2.saml2.sp.entity.id</name>  
   <value>test_sp_entity_id</value>  
 </property>  
+```
   
 This configuration should be same the service provider entity id as configured in the IDP. Some identity providers require this to be same as the ACS URL.
 
+```
 <property>  
   <name>hive.server2.saml2.group.attribute.name</name>  
   <value>group_attribute_name</value>  
 </property>
+```
 
 This configuration will be used to map the SAML attribute in the response to the groups of the user. This should be configured in the identity provider as the attribute name for the group information.
 
+```
 <property>  
   <name>hive.server2.saml2.group.filter</name>  
   <value>comma_separated_group_names</value>  
 </property>
+```
 
 This configuration will be used to configure the allowed group names.
 
+```
 <property>  
   <name>hive.server2.saml2.sp.callback.url</name>  
   <value>callback_url_of_hiveserver2</value>  
 </property>
+```
 
 The http URL endpoint where the SAML assertion is posted back by the IDP. Currently this must be on the same port as HiveServer2’s http endpoint and must be TLS enabled (https) on secure setups.
 
diff --git a/content/Development/desingdocs/type-qualifiers-in-hive.md b/content/Development/desingdocs/type-qualifiers-in-hive.md
index eed01d3a..ddd0cdd3 100644
--- a/content/Development/desingdocs/type-qualifiers-in-hive.md
+++ b/content/Development/desingdocs/type-qualifiers-in-hive.md
@@ -39,16 +39,14 @@ The type qualifiers could simply be stored as part of the type string for a colu
 
 This approach would be similar to the attributes in the INFORMATION_SCHEMA.COLUMNS that some DBMS catalog tables have, such as those listed below:
 
-<pre>
-
+```
 |  CHARACTER_MAXIMUM_LENGTH  |  bigint(21) unsigned  |  YES  |   |  NULL  |   |
 |  CHARACTER_OCTET_LENGTH  |  bigint(21) unsigned  |  YES  |   |  NULL  |   |
 |  NUMERIC_PRECISION  |  bigint(21) unsigned  |  YES  |   |  NULL  |   |
 |  NUMERIC_SCALE  |  bigint(21) unsigned  |  YES  |   |  NULL  |   |
 |  CHARACTER_SET_NAME  |  varchar(32)  |  YES  |   |  NULL  |   |
 |  COLLATION_NAME  |  varchar(32)  |  YES  |   |  NULL  |   |
-
-</pre>
+```
 
 We could add new columns to the COLUMNS_V2 table for any type qualifiers we are trying to support (initially looks like CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE). Advantages to this would be that it is easier to query these parameters than the first approach, though types with no parameters would still have these columns (set to null). 
 
diff --git a/content/Development/gettingstarted-latest.md b/content/Development/gettingstarted-latest.md
index 72b2184a..e0712ef4 100644
--- a/content/Development/gettingstarted-latest.md
+++ b/content/Development/gettingstarted-latest.md
@@ -77,7 +77,7 @@ To build the current Hive code from the master branch:
 
 Here, {version} refers to the current Hive version.
 
-If building Hive source using Maven (mvn), we will refer to the directory "/packaging/target/apache-hive-{version}-SNAPSHOT-bin/apache-hive-{version}-SNAPSHOT-bin" as <install-dir> for the rest of the page.
+If building Hive source using Maven (mvn), we will refer to the directory "/packaging/target/apache-hive-{version}-SNAPSHOT-bin/apache-hive-{version}-SNAPSHOT-bin" as `<install-dir>` for the rest of the page.
 
 #### Compile Hive on branch-1
 
diff --git a/content/docs/latest/admin/adminmanual-configuration.md b/content/docs/latest/admin/adminmanual-configuration.md
index b7dd3a55..732a8816 100644
--- a/content/docs/latest/admin/adminmanual-configuration.md
+++ b/content/docs/latest/admin/adminmanual-configuration.md
@@ -43,7 +43,7 @@ The server-specific configuration file is useful in two situations:
        If HiveServer2 is using the metastore in embedded mode, hivemetastore-site.xml also is loaded.
        
        The order of precedence of the config files is as follows (later one has higher precedence) –  
-       hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> '`-hiveconf`' commandline parameters.
+       hive-site.xml -&gt; hivemetastore-site.xml -&gt; hiveserver2-site.xml -&gt; '`-hiveconf`' commandline parameters.
 
 ### hive-site.xml and hive-default.xml.template
 
@@ -61,8 +61,8 @@ The administrative configuration variables are listed [below]({{< ref "#below" >
 
 Hive uses temporary folders both on the machine running the Hive client and the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the hive client when the query is finished. However, in cases of abnormal hive client termination, some data may be left behind. The configuration details are as follows:
 
-* On the HDFS cluster this is set to */tmp/hive-<username>* by default and is controlled by the configuration variable *hive.exec.scratchdir*
-* On the client machine, this is hardcoded to */tmp/<username>*
+* On the HDFS cluster this is set to `*/tmp/hive-<username>*` by default and is controlled by the configuration variable *hive.exec.scratchdir*
+* On the client machine, this is hardcoded to `*/tmp/<username>*`
 
 Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.
 
@@ -98,9 +98,9 @@ Version information: Metrics
 | hive.ddl.output.format | The data format to use for DDL output (e.g. `DESCRIBE table`). One of "text" (for human readable text) or "json" (for a json object). (As of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2822).) | text |
 | hive.exec.script.wrapper | Wrapper around any invocations to script operator e.g. if this is set to python, the script passed to the script operator will be invoked as `python <script command>`. If the value is null or not set, the script is invoked as `<script command>`. | null |
 | hive.exec.plan |   | null |
-| hive.exec.scratchdir | This directory is used by Hive to store the plans for different map/reduce stages for the query as well as to stored the intermediate outputs of these stages.*Hive 0.14.0 and later:* HDFS root scratch directory for Hive jobs, which gets created with write all ([733](https://issues.apache.org/jira/browse/HIVE-8143)) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/<username> is created with ${hive.scratch.dir.permission}. | / [...]
+| hive.exec.scratchdir | This directory is used by Hive to store the plans for different map/reduce stages for the query as well as to stored the intermediate outputs of these stages.*Hive 0.14.0 and later:* HDFS root scratch directory for Hive jobs, which gets created with write all ([733](https://issues.apache.org/jira/browse/HIVE-8143)) permission. For each connecting user, an HDFS scratch directory `${hive.exec.scratchdir}/<username>` is created with ${hive.scratch.dir.permission}. | [...]
 | hive.scratch.dir.permission | The permission for the user-specific scratch directories that get created in the root scratch directory ${hive.exec.scratchdir}. (As of Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-4487).) | 700 (Hive 0.12.0 and later) |
-| hive.exec.local.scratchdir | This directory is used for temporary files when Hive runs in local mode. (As of Hive [0.10.0](https://issues.apache.org/jira/browse/HIVE-1577).) | /tmp/<user.name> |
+| hive.exec.local.scratchdir | This directory is used for temporary files when Hive runs in local mode. (As of Hive [0.10.0](https://issues.apache.org/jira/browse/HIVE-1577).) | `/tmp/<user.name>` |
 | hive.exec.submitviachild | Determines whether the map/reduce jobs should be submitted through a separate jvm in the non local mode. | false - By default jobs are submitted through the same jvm as the compiler |
 | hive.exec.script.maxerrsize | Maximum number of serialization errors allowed in a user script invoked through `TRANSFORM` or `MAP` or `REDUCE` constructs. | 100000 |
 | hive.exec.compress.output | Determines whether the output of the final map/reduce job in a query is compressed or not. | false |
@@ -119,7 +119,7 @@ Version information: Metrics
 | hive.merge.size.per.task | Size of merged files at the end of the job. | 256000000 |
 | hive.merge.smallfiles.avgsize | When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true. | 16000000 |
 | hive.querylog.enable.plan.progress | Whether to log the plan's progress every time a job's progress is checked. These logs are written to the location specified by `hive.querylog.location`. (As of Hive [0.10](https://issues.apache.org/jira/browse/HIVE-3230).) | true |
-| hive.querylog.location | Directory where structured hive query logs are created. One file per session is created in this directory. If this variable set to empty string structured log will not be created. | /tmp/<user.name> |
+| hive.querylog.location | Directory where structured hive query logs are created. One file per session is created in this directory. If this variable set to empty string structured log will not be created. | `/tmp/<user.name>` |
 | hive.querylog.plan.progress.interval | The interval to wait between logging the plan's progress in milliseconds. If there is a whole number percentage change in the progress of the mappers or the reducers, the progress is logged regardless of this value. The actual interval will be the ceiling of (this value divided by the value of `hive.exec.counters.pull.interval`) multiplied by the value of `hive.exec.counters.pull.interval` i.e. if it is not divide evenly by the value of `hive.exec [...]
 | hive.stats.autogather | A flag to gather statistics automatically during the INSERT OVERWRITE command. (As of Hive [0.7.0](https://issues.apache.org/jira/browse/HIVE-1361).) | true |
 | hive.stats.dbclass | The default database that stores temporary hive statistics. Valid values are `hbase` and `jdbc` while `jdbc` should have a specification of the Database to use, separated by a colon (e.g. `jdbc:mysql`). (As of Hive [0.7.0](https://issues.apache.org/jira/browse/HIVE-1361).) | jdbc:derby |
@@ -142,7 +142,7 @@ For security configuration (Hive 0.10 and later), see the [Hive Metastore Securi
 | --- | --- | --- |
 | hadoop.bin.path | The location of the Hadoop script which is used to submit jobs to Hadoop when submitting through a separate JVM. | $HADOOP_HOME/bin/hadoop |
 | hadoop.config.dir | The location of the configuration directory of the Hadoop installation. | $HADOOP_HOME/conf |
-| fs.default.name | The default name of the filesystem (for example, localhost for hdfs://<clustername>:8020).For YARN this configuration variable is called fs.defaultFS. | file:/// |
+| fs.default.name | The default name of the filesystem (for example, localhost for `hdfs://<clustername>:8020`).For YARN this configuration variable is called fs.defaultFS. | file:/// |
 | map.input.file | The filename the map is reading from. | null |
 | mapred.job.tracker | The URL to the jobtracker. If this is set to local then map/reduce is run in the local mode. | local |
 | mapred.reduce.tasks | The number of reducers for each map/reduce stage in the query plan. | 1 |
diff --git a/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md b/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md
index 3f103bc4..886b812b 100644
--- a/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md
+++ b/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md
@@ -103,7 +103,7 @@ To run the Metastore as a service, you must first configure it with a URL.
 
 | Configured On | Parameter | Hive 2 Parameter | Format | Default Value | Comment |
 | --- | --- | --- | --- | --- | --- |
-| Client | metastore.thrift.uris | hive.metastore.uris | thrift://<HOST>:<PORT>[, thrift://<HOST>:<PORT>...] | none | HOST = hostname, PORT = should be set to match metastore.thrift.port on the server (which defaults to 9083. You can provide multiple servers in a comma separate list. |
+| Client | metastore.thrift.uris | hive.metastore.uris | `thrift://<HOST>:<PORT>[, thrift://<HOST>:<PORT>...]` | none | HOST = hostname, PORT = should be set to match metastore.thrift.port on the server (which defaults to 9083. You can provide multiple servers in a comma separate list. |
 | Server | metastore.thrift.port | hive.metastore.port | integer | 9083 | Port Thrift will listen on. |
 
 Once you have configured your clients, you can start the Metastore on a server using the `start-metastore` utility.  See the `-help` option of that utility for available options.  There is no stop-metastore script.  You must locate the process id for the metastore and kill that process.
diff --git a/content/docs/latest/admin/adminmanual-metastore-administration.md b/content/docs/latest/admin/adminmanual-metastore-administration.md
index c965b66b..a42845df 100644
--- a/content/docs/latest/admin/adminmanual-metastore-administration.md
+++ b/content/docs/latest/admin/adminmanual-metastore-administration.md
@@ -141,7 +141,7 @@ The following example uses a[Remote Metastore Database]({{< ref "#remote-metasto
 | javax.jdo.option.ConnectionUserName | `<user name>` | user name for connecting to MySQL server |
 | javax.jdo.option.ConnectionPassword | `<password>` | password for connecting to MySQL server |
 | hive.metastore.warehouse.dir | `<base hdfs path>` | default location for Hive tables. |
-| hive.metastore.thrift.bind.host | <host_name> | Host name to bind the metastore service to. When empty, "localhost" is used. This configuration is available Hive 4.0.0 onwards. |
+| hive.metastore.thrift.bind.host | `<host_name>` | Host name to bind the metastore service to. When empty, "localhost" is used. This configuration is available Hive 4.0.0 onwards. |
 
 From Hive 3.0.0 
([HIVE-16452](https://issues.apache.org/jira/browse/HIVE-16452)) onwards the 
metastore database stores a GUID which can be queried using the Thrift API 
get_metastore_db_uuid by metastore clients in order to identify the backend 
database instance. This API can be accessed by the HiveMetaStoreClient using 
the method getMetastoreDbUuid().
 
@@ -162,13 +162,13 @@ From Hive 4.0.0 ([HIVE-20794](https://issues.apache.org/jira/browse/HIVE-20794))
 | Config Param | Config Value | Comment |
 | --- | --- | --- |
 | hive.metastore.service.discovery.mode | service discovery mode | When it is set to "zookeeper", ZooKeeper is used for dynamic service discovery of a remote metastore. In that case, a metastore adds itself to the ZooKeeper when it is started and removes itself when it shuts down. By default it is empty. Both the client and server should have same value for this parameter. |
-| hive.metastore.uris | <host_name>:<port>, <host_name>:<port>, ... | One or more host and port pairs of ZooKeeper servers forming a ZooKeeper ensemble. Used when hive.metastore.service.discovery.mode is set to "zookeeper". The configuration is not used by server otherwise. If all the servers are using the same port you may specify the port using hive.metastore.zookeeper.client.port instead of specifying it with every server separately. Both the client and server should have same value f [...]
-| hive.metastore.zookeeper.client.port | <port> | Port number when same port number is used by all the ZooKeeper servers in the ensemble. Both the client and server should have same value for this parameter. |
-| hive.metastore.zookeeper.namespace | <namespace name> | The parent node under which all ZooKeeper nodes for metastores are created. |
-| hive.metastore.zookeeper.session.timeout | <time in milliseconds> | ZooKeeper client's session timeout (in milliseconds). The client is disconnected if a heartbeat is not sent in the timeout. |
-| hive.metastore.zookeeper.connection.timeout | <time in seconds> | ZooKeeper client's connection timeout in seconds. Connection timeout * hive.metastore.zookeeper.connection.max.retries with exponential backoff is when curator client deems connection is lost to zookeeper. |
-| hive.metastore.zookeeper.connection.max.retries | <number> | Max number of times to retry when connecting to the ZooKeeper server. |
-| hive.metastore.zookeeper.connection.basesleeptime | <time in milliseconds> | Initial amount of time (in milliseconds) to wait between retries when connecting to the ZooKeeper server when using ExponentialBackoffRetry policy. |
+| hive.metastore.uris | `<host_name>:<port>, <host_name>:<port>, ...` | One or more host and port pairs of ZooKeeper servers forming a ZooKeeper ensemble. Used when hive.metastore.service.discovery.mode is set to "zookeeper". The configuration is not used by server otherwise. If all the servers are using the same port you may specify the port using hive.metastore.zookeeper.client.port instead of specifying it with every server separately. Both the client and server should have same value [...]
+| hive.metastore.zookeeper.client.port | `<port>` | Port number when same port number is used by all the ZooKeeper servers in the ensemble. Both the client and server should have same value for this parameter. |
+| hive.metastore.zookeeper.namespace | `<namespace name>` | The parent node under which all ZooKeeper nodes for metastores are created. |
+| hive.metastore.zookeeper.session.timeout | `<time in milliseconds>` | ZooKeeper client's session timeout (in milliseconds). The client is disconnected if a heartbeat is not sent in the timeout. |
+| hive.metastore.zookeeper.connection.timeout | `<time in seconds>` | ZooKeeper client's connection timeout in seconds. Connection timeout * hive.metastore.zookeeper.connection.max.retries with exponential backoff is when curator client deems connection is lost to zookeeper. |
+| hive.metastore.zookeeper.connection.max.retries | `<number>` | Max number of times to retry when connecting to the ZooKeeper server. |
+| hive.metastore.zookeeper.connection.basesleeptime | `<time in milliseconds>` | Initial amount of time (in milliseconds) to wait between retries when connecting to the ZooKeeper server when using ExponentialBackoffRetry policy. |
 
   
 
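In ZooKeeper discovery mode, the table above says that `hive.metastore.uris` holds ZooKeeper `host:port` pairs. It also says that a shared port can be given once through `hive.metastore.zookeeper.client.port`. Below is a rough sketch of how those two settings could combine into a single ensemble string. The helper `zk_connect_string` is hypothetical. It also makes two assumptions the source does not state: an explicit per-entry port is kept as written, and entries are hostnames or IPv4 addresses, not IPv6.

```python
from typing import Optional


def zk_connect_string(uris: str, client_port: Optional[int] = None) -> str:
    """Normalize a ZooKeeper-mode hive.metastore.uris value to "host:port,...".

    Hypothetical helper: entries without a port take client_port, which
    stands in for hive.metastore.zookeeper.client.port.
    """
    servers = []
    for entry in uris.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:  # no ":" in the entry, so the port must come from client_port
            if client_port is None:
                raise ValueError(f"no port for {entry!r} and no client port set")
            host, port = entry, str(client_port)
        servers.append(f"{host}:{int(port)}")  # int() rejects non-numeric ports
    return ",".join(servers)


print(zk_connect_string("zk1, zk2:2182", client_port=2181))  # zk1:2181,zk2:2182
```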
diff --git a/content/docs/latest/admin/hive-on-spark-getting-started.md b/content/docs/latest/admin/hive-on-spark-getting-started.md
index cee3ddb5..b0e4667b 100644
--- a/content/docs/latest/admin/hive-on-spark-getting-started.md
+++ b/content/docs/latest/admin/hive-on-spark-getting-started.md
@@ -41,7 +41,7 @@ For the installation perform the following tasks:
 1. Install Spark (either download pre-built Spark, or build assembly from source).
        * Install/build a compatible version.  Hive root `pom.xml`'s <spark.version> defines what version of Spark it was built/tested with.
        * Install/build a compatible distribution.  Each version of Spark has several distributions, corresponding with different versions of Hadoop.
-       * Once Spark is installed, find and keep note of the <spark-assembly-*.jar> location.
+       * Once Spark is installed, find and keep note of the `<spark-assembly-*.jar>` location.
        * Note that you must have a version of Spark which does **not** include the Hive jars. Meaning one which was not built with the Hive profile. If you will use Parquet tables, it's recommended to also enable the "parquet-provided" profile. Otherwise there could be conflicts in Parquet dependency. To remove Hive jars from the installation, simply use the following command under your Spark repository:
        
        Prior to Spark 2.0.0:
@@ -68,7 +68,7 @@ For the installation perform the following tasks:
        ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided"
        ```
 2. Start Spark cluster
-       * Keep note of the <Spark Master URL>.  This can be found in Spark master WebUI.
+       * Keep note of the `<Spark Master URL>`.  This can be found in Spark master WebUI.
 
 ## Configuring YARN
 
@@ -175,7 +175,7 @@ On this 9 node cluster we’ll have two executors per host. As such we can confi
 | org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to be org.apache.spark.serializer.KryoSerializer, see Step 3 [above]({{< ref "#above" >}}). |
 | [ERROR] Terminal initialization failed; falling back to unsupportedjava.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected | Hive has upgraded to Jline2 but jline 0.94 exists in the Hadoop lib. | 1. Delete jline from the Hadoop lib directory (it's only pulled in transitively from ZooKeeper). 2. export HADOOP_USER_CLASSPATH_FIRST=true 3. If this error occurs during mvn test, perform a mvn clean install on the root project and itests directory. |
 | Spark executor gets killed all the time and Spark keeps retrying the failed stage; you may find similar information in the YARN nodemanager log.WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=217989,containerID=container_1421717252700_0716_01_50767235] is running beyond physical memory limits. Current usage: 43.1 GB of 43 GB physical memory used; 43.9 GB of 90.3 GB virtual memory used. Killing container. | For Spark on YARN, [...]
-| Run query and get an error like:FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTaskIn Hive logs, it shows:java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy  at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) | Happens on Mac (not officially supported).This is a general Snappy issue with Mac and is not unique to Hive on Spark, but workaround is noted here because it is needed for startup of  [...]
+| Run query and get an error like:FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTaskIn Hive logs, it shows:java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy  at org.xerial.snappy.SnappyOutputStream.&lt;init&gt;(SnappyOutputStream.java:79) | Happens on Mac (not officially supported).This is a general Snappy issue with Mac and is not unique to Hive on Spark, but workaround is noted here because it is needed for start [...]
 | Stack trace: ExitCodeException exitCode=1: .../launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR.../usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__app__.jar:$PWD/*: bad substitution  | The key mapreduce.application.classpath in /etc/hadoop/conf/mapred-site.xml contains a variable which is invalid in bash. | From **mapreduce.application.classpath** remove ` :/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${h [...]
 | Exception in thread "Driver" scala.MatchError: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/TaskAttemptContext (of class java.lang.NoClassDefFoundError)  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:432) | MR is not on the YARN classpath. | If on HDP change from **/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework** to **/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz#mr-framework** |
 | java.lang.OutOfMemoryError: PermGen space with spark.master=local | By default ([SPARK-1879](https://issues.apache.org/jira/browse/SPARK-1879)), Spark's own launch scripts increase PermGen to 128 MB, so we need to increase PermGen in hive launch script. | If use JDK7, append following in conf/hive-env.sh: ` export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxPermSize=128m" ` If use JDK8, append following in Conf/hive-env.sh: ` export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxMetaspaceSize=512m" ` |
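The "bad substitution" row in the context above says that `mapreduce.application.classpath` contains a variable that is invalid in bash. The case shown is `${hdp.version}`: bash variable names may not contain a dot. Below is a small sketch for spotting such references in a classpath value before it reaches a launch script. The helper name is hypothetical. The check is deliberately simplified. It accepts only identifiers and positional numbers, so special parameters and bash operator forms such as `${name:-default}` are reported as well. Treat its output as candidates to review.

```python
import re

# A braced parameter reference such as ${hdp.version}.
BRACED_REF = re.compile(r"\$\{([^}]*)\}")
# Names accepted here: a shell identifier or a positional number.
# Simplification: special parameters (${@}, ${#}, ...) and operator forms
# (${name:-default}, ${#name}, ...) are not modeled and will be reported.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def bash_invalid_refs(classpath: str) -> list[str]:
    """Return ${...} names in a classpath string that fail the check above."""
    return [
        m.group(1)
        for m in BRACED_REF.finditer(classpath)
        if not VALID_NAME.fullmatch(m.group(1))
    ]


cp = "$PWD:$PWD/__spark__.jar:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar"
print(bash_invalid_refs(cp))  # ['hdp.version', 'hdp.version']
```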
diff --git a/content/docs/latest/admin/setting-up-hiveserver2.md b/content/docs/latest/admin/setting-up-hiveserver2.md
index eff94db0..aa05bc5a 100644
--- a/content/docs/latest/admin/setting-up-hiveserver2.md
+++ b/content/docs/latest/admin/setting-up-hiveserver2.md
@@ -166,7 +166,7 @@ Use the following steps to create and verify self-signed SSL certificates for us
 3. Export this certificate from keystore.jks to a certificate file: keytool -export -alias example.com -file example.com.crt -keystore keystore.jks
 4. Add this certificate to the client's truststore to establish trust: keytool -import -trustcacerts -alias example.com -file example.com.crt -keystore truststore.jks
 5. Verify that the certificate exists in truststore.jks: keytool -list -keystore truststore.jks
-6. Then start HiveServer2, and try to connect with beeline using: jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;trustStorePassword=<truststore-password>
+6. Then start HiveServer2, and try to connect with beeline using: `jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;trustStorePassword=<truststore-password>`
 
 ##### Selectively disabling SSL protocol versions
 
@@ -187,7 +187,7 @@ Warning
 Support is provided for PAM (Hive 0.13 onward, see [HIVE-6466](https://issues.apache.org/jira/browse/HIVE-6466)). To configure PAM:
 
 * Download the [JPAM](http://sourceforge.net/projects/jpam/files/jpam/jpam-1.1/) native library for the relevant architecture.
-* Unzip and copy libjpam.so to a directory (<libjmap-directory>) on the system.
+* Unzip and copy libjpam.so to a directory (`<libjmap-directory>`) on the system.
 * Add the directory to the LD_LIBRARY_PATH environment variable like so:`export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<libjmap-directory>`
 * For some PAM modules, you'll have to ensure that your `/etc/shadow` and `/etc/login.defs` files are readable by the user running the HiveServer2 process.
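Step 6 of the SSL instructions uses a JDBC URL of the form `jdbc:hive2://<host>:<port>/<database>;ssl=true;sslTrustStore=<path-to-truststore>;trustStorePassword=<truststore-password>`. Below is a minimal sketch that assembles that string from its parts. The helper `hive2_ssl_url` is hypothetical, and the host, port, paths and password are illustrative values only. The sketch assumes that `;` separates the `key=value` settings, so it rejects values that contain one rather than attempting any escaping.

```python
def hive2_ssl_url(host: str, port: int, database: str,
                  truststore: str, truststore_password: str) -> str:
    """Build a HiveServer2 JDBC URL with SSL and an explicit truststore.

    Hypothetical helper following the URL shape from step 6.
    """
    params = {
        "ssl": "true",
        "sslTrustStore": truststore,
        "trustStorePassword": truststore_password,
    }
    for key, value in params.items():
        # Assumption: ';' separates settings, so it is refused inside a value.
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    settings = ";".join(f"{k}={v}" for k, v in params.items())
    return f"jdbc:hive2://{host}:{port}/{database};{settings}"


url = hive2_ssl_url("hs2.example.com", 10000, "default",
                    "/path/to/truststore.jks", "example-password")
# The URL can then be handed to Beeline through its -u option.
print(["beeline", "-u", url])
```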

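This commit handles bare placeholders such as `<port>`, `<time in milliseconds>` or `<init>` in two ways: it wraps them in backticks, or, in one table cell, escapes them as `&lt;`/`&gt;`. Text like this looks like an HTML tag to a CommonMark renderer, which is presumably what triggers the "Raw HTML omitted" warnings. Below is a rough sketch of the escaping approach for a single line of Markdown. The helper is hypothetical and is not part of the hive-site tooling. It treats single backticks as code-span delimiters, so it ignores multi-backtick spans and fenced blocks. It would also escape genuine autolinks or intended HTML, so its output needs review.

```python
import re

# A "<...>" run with no nested angle brackets, backticks or newlines.
PLACEHOLDER = re.compile(r"<([^<>`\n]+)>")


def escape_placeholders(line: str) -> str:
    """Escape <placeholder> text outside `code spans` as &lt;...&gt;."""
    parts = line.split("`")
    for i in range(0, len(parts), 2):  # even indexes lie outside code spans
        parts[i] = PLACEHOLDER.sub(r"&lt;\1&gt;", parts[i])
    return "`".join(parts)


print(escape_placeholders("SnappyOutputStream.<init>(SnappyOutputStream.java:79)"))
# SnappyOutputStream.&lt;init&gt;(SnappyOutputStream.java:79)
```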