More precisely:

We have a Hive server (which is also the HDFS host); Hive on this machine is configured to use a MySQL database on another host. The server is launched with the "hive --service metastore" command.
While the server is running I see worrying log messages (which may be the cause of this problem):
10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line 4, column 6 : cvc-elt.1: Cannot find the declaration of element 'jdo'. - Please check your specification of DTD and the validity of the MetaData XML that you have specified.

10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line 282, column 13 : The content of element type "class" must match "(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*)". - Please check your specification of DTD and the validity of the MetaData XML that you have specified.

We also have a Hive client, which is configured to use the Hive server mentioned above.

When launching Hive on the client host:

query:
LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status PARTITION
(day='2010-10-02');

response:
Copying data from file: file
Loading data to table wh_im_status partition {day=2010-10-11}
Failed with exception org.apache.thrift.TApplicationException:
get_partition failed: unknown result
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

Server logs while executing the above query:
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_table :
db=default tbl=wh_im_status
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_partition :
db=default tbl=wh_im_status

The problem also appears when using the Python Hive library from the client host.
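
For reference, this is roughly how we call it from Python. A minimal sketch, assuming the Thrift bindings shipped under hive/lib/py in Hive 0.5 and a HiveServer-style endpoint; the host and port below are placeholders for our actual ones:

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive

# Connect to the Hive server over Thrift (host/port are placeholders).
transport = TTransport.TBufferedTransport(
    TSocket.TSocket('test-storage-1.atm', 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)

transport.open()
try:
    # The same statement as from the CLI; it fails the same way,
    # with a TApplicationException on get_partition.
    client.execute("LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status "
                   "PARTITION (day='2010-10-02')")
finally:
    transport.close()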


When I launch Hive on the Hive server host itself, the same query works fine. When I create a table wh_im_status2 without partitions, loading data also works fine, so this problem is strictly connected with pushing data into partitions through Thrift.


Our main goal is to load partitioned data from remote hosts into Hadoop Hive. Perhaps you are achieving this goal in some other way? One workaround we are considering is sketched below.
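
A sketch of that workaround, not verified against this bug: since loading into an unpartitioned table works over Thrift, stage the file there first and then copy the rows into the target partition with a static-partition INSERT. The staging table wh_im_status_stage and its column list are hypothetical placeholders:

def load_into_partition(client, local_path, day):
    # 'client' is a connected ThriftHive.Client as in the sketch above.
    # Staging table: same columns as wh_im_status, but without the
    # PARTITIONED BY clause; the column list here is a made-up placeholder.
    client.execute("CREATE TABLE IF NOT EXISTS wh_im_status_stage "
                   "(col1 STRING, col2 INT) ROW FORMAT DELIMITED")
    # This unpartitioned load already works from remote clients.
    client.execute("LOAD DATA LOCAL INPATH '%s' "
                   "INTO TABLE wh_im_status_stage" % local_path)
    # A static-partition insert moves the staged rows into the partition.
    client.execute("INSERT OVERWRITE TABLE wh_im_status "
                   "PARTITION (day='%s') SELECT * FROM wh_im_status_stage" % day)

Since the same query works when launched on the server host, our guess is that anything which avoids the client-side MoveTask/get_partition path might behave differently; but again, that is only a guess.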




Full configuration:

Server:

<configuration>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hive-test-db-1.test/hive_metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_test</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_test</value>
</property>

<property>
  <name>hive.metastore.metadb.dir</name>
  <value>file:///var/metastore/metadb/</value>
  <description>The location of filestore metadata base dir</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hivevar/metastore/metadb/</value>
<description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/var/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>false</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result. </description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling logs partitions to capacity </description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
<description> This controls whether the final outputs of a query (to a local/hdfs file or a hive table) is compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive-hwi-0.5.0.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

</configuration>



Client:

<configuration>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
<description>controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>

<property>
  <name>hive.metastore.metadb.dir</name>
  <value>file:///var/metastore/metadb/</value>
  <description>The location of filestore metadata base dir</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://test-storage-1.atm:9083</value>
<description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://test-storage-1.atm:54310/var/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>false</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result. </description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling logs partitions to capacity </description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
<description> This controls whether the final outputs of a query (to a local/hdfs file or a hive table) is compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive_hwi.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

</configuration>

--
Aleksander Siewierski
