More precisely:

We have a Hive server (which is also the HDFS host); Hive on this machine is configured to use a MySQL database on another host. The server is launched with the "hive --service metastore" command.
While the server is running I see worrying log messages (which may be the cause of this problem):
10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line 4, column 6 : cvc-elt.1: Cannot find the declaration of element 'jdo'. - Please check your specification of DTD and the validity of the MetaData XML that you have specified.

10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an error in file "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line 282, column 13 : The content of element type "class" must match "(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*)". - Please check your specification of DTD and the validity of the MetaData XML that you have specified.

We also have a Hive client, which is configured to use the Hive server mentioned above.

When launching Hive on the client host:

query:
LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status PARTITION
(day='2010-10-02');

response:
Copying data from file: file
Loading data to table wh_im_status partition {day=2010-10-11}
Failed with exception org.apache.thrift.TApplicationException:
get_partition failed: unknown result
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

Server logs while executing the above query:
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_table :
db=default tbl=wh_im_status
10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_partition :
db=default tbl=wh_im_status

The problem also appears when using the Python Hive library from the client host.
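
For reference, this is roughly how we call it from Python. A minimal sketch, assuming the Thrift bindings shipped under hive/lib/py in Hive 0.5 and a HiveServer-style endpoint; the host and port below are placeholders for our actual ones:

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive

# Connect to the Hive server over Thrift (host/port are placeholders).
transport = TTransport.TBufferedTransport(
    TSocket.TSocket('test-storage-1.atm', 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)

transport.open()
try:
    # The same statement as from the CLI; it fails the same way,
    # with a TApplicationException on get_partition.
    client.execute("LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status "
                   "PARTITION (day='2010-10-02')")
finally:
    transport.close()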


When I launch Hive on the Hive server host itself, the same query works fine. When I create a table wh_im_status2 without partitions, loading data also works fine, so this problem is strictly connected with pushing data into partitions through Thrift.


Our main goal is to load partitioned data from remote hosts into Hadoop Hive. Perhaps you are achieving this goal in some other way? One workaround we are considering is sketched below.
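
A sketch of that workaround, not verified against this bug: since loading into an unpartitioned table works over Thrift, stage the file there first and then copy the rows into the target partition with a static-partition INSERT. The staging table wh_im_status_stage and its column list are hypothetical placeholders:

def load_into_partition(client, local_path, day):
    # 'client' is a connected ThriftHive.Client as in the sketch above.
    # Staging table: same columns as wh_im_status, but without the
    # PARTITIONED BY clause; the column list here is a made-up placeholder.
    client.execute("CREATE TABLE IF NOT EXISTS wh_im_status_stage "
                   "(col1 STRING, col2 INT) ROW FORMAT DELIMITED")
    # This unpartitioned load already works from remote clients.
    client.execute("LOAD DATA LOCAL INPATH '%s' "
                   "INTO TABLE wh_im_status_stage" % local_path)
    # A static-partition insert moves the staged rows into the partition.
    client.execute("INSERT OVERWRITE TABLE wh_im_status "
                   "PARTITION (day='%s') SELECT * FROM wh_im_status_stage" % day)

Since the same query works when launched on the server host, our guess is that anything which avoids the client-side MoveTask/get_partition path might behave differently; but again, that is only a guess.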




Full configuration:

Server:

<configuration>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hive-test-db-1.test/hive_metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_test</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_test</value>
</property>

<property>
  <name>hive.metastore.metadb.dir</name>
  <value>file:///var/metastore/metadb/</value>
  <description>The location of filestore metadata base dir</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>file:///var/lib/hivevar/metastore/metadb/</value>
<description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/var/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>false</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result. </description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling logs partitions to capacity </description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
<description> This controls whether the final outputs of a query (to a local/hdfs file or a hive table) is compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive-hwi-0.5.0.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

</configuration>



Client:

<configuration>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
<description>controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>

<property>
  <name>hive.metastore.metadb.dir</name>
  <value>file:///var/metastore/metadb/</value>
  <description>The location of filestore metadata base dir</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://test-storage-1.atm:9083</value>
<description>Comma separated list of URIs of metastore servers. The first server that can be connected to will be used.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://test-storage-1.atm:54310/var/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
<description>Number of retries while opening a connection to metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
<description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
<description>Default file format for CREATE TABLE statement. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>false</value>
<description>Whether to use map-side aggregation in Hive Group By queries</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
<description>How many rows in the right-most join operand Hive should buffer before emitting the join result. </description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
<description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling logs partitions to capacity </description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
<description> This controls whether the final outputs of a query (to a local/hdfs file or a hive table) is compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
<description> This controls whether intermediate files produced by hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from hadoop config variables mapred.output.compress* </description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
<description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/lib/hive/lib/hive_hwi.war</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

</configuration>

--
Aleksander Siewierski
