On Tue, May 25, 2010 at 12:09 PM, Aleksander Siewierski / Gadu-Gadu <
[email protected]> wrote:
> More precisely:
>
> We have a Hive server (which is also the HDFS host); Hive on this machine is
> configured to use a MySQL database on another host. The server is launched with
> the "hive --service metastore" command.
> While the server is running I get worrying logs (which may be the cause of this
> problem):
> 10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an
> error in file
> "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line
> 4, column 6 : cvc-elt.1: Cannot find the declaration of element 'jdo'. -
> Please check your specification of DTD and the validity of the MetaData XML
> that you have specified.
> 10/05/25 16:38:26 WARN DataNucleus.MetaData: MetaData Parser encountered an
> error in file
> "jar:file:/usr/lib/hive/lib/hive-metastore-0.5.0.jar!/package.jdo" at line
> 282, column 13 : The content of element type "class" must match
> "(extension*,implements*,datastore-identity?,primary-key?,inheritance?,version?,join*,foreign-key*,index*,unique*,column*,field*,property*,query*,fetch-group*,extension*)".
> - Please check your specification of DTD and the validity of the MetaData
> XML that you have specified.
>
> We also have a Hive client, which is configured to use the Hive server
> mentioned above.
>
> When launching hive on hive client:
>
> query:
>
> LOAD DATA LOCAL INPATH 'file' INTO TABLE wh_im_status PARTITION
> (day='2010-10-02');
>
> response:
>
> Copying data from file: file
> Loading data to table wh_im_status partition {day=2010-10-11}
> Failed with exception org.apache.thrift.TApplicationException:
> get_partition failed: unknown result
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> Server logs while executing the above query:
>
> 10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_table :
> db=default tbl=wh_im_status
> 10/05/24 19:39:14 INFO metastore.HiveMetaStore: 1: get_partition :
> db=default tbl=wh_im_status
>
> This problem also appears when using the Python Hive library from the client host.
>
>
> When I launch Hive on the Hive server host, the same query as quoted above works
> fine.
> When I create a table wh_im_status2 without partitions, loading data works
> fine, so this problem is strictly connected with pushing data to partitions
> through Thrift.
>
>
> Our main goal is to load partitioned data from remote hosts into hadoop
> hive. Maybe you are reaching that goal in another way?
>
>
>
>
> Full configuration:
>
> Server:
>
> <configuration>
> <property>
> <name>hive.exec.scratchdir</name>
> <value>/tmp/hive-${user.name}</value>
> <description>Scratch space for Hive jobs</description>
> </property>
>
> <property>
> <name>javax.jdo.option.ConnectionURL</name>
>
>
> <value>jdbc:mysql://hive-test-db-1.test/hive_metastore?createDatabaseIfNotExist=true</value>
> <description>JDBC connect string for a JDBC metastore</description>
> </property>
>
> <property>
> <name>javax.jdo.option.ConnectionDriverName</name>
> <value>com.mysql.jdbc.Driver</value>
> <description>Driver class name for a JDBC metastore</description>
> </property>
>
> <property>
> <name>javax.jdo.option.ConnectionUserName</name>
> <value>hive_test</value>
> </property>
>
> <property>
> <name>javax.jdo.option.ConnectionPassword</name>
> <value>hive_test</value>
> </property>
>
> <property>
> <name>hive.metastore.metadb.dir</name>
> <value>file:///var/metastore/metadb/</value>
> <description>The location of filestore metadata base dir</description>
> </property>
>
> <property>
> <name>hive.metastore.uris</name>
> <value>file:///var/lib/hivevar/metastore/metadb/</value>
> <description>Comma separated list of URIs of metastore servers. The first
> server that can be connected to will be used.</description>
> </property>
>
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>/var/warehouse</value>
> <description>location of default database for the warehouse</description>
> </property>
>
> <property>
> <name>hive.metastore.connect.retries</name>
> <value>5</value>
> <description>Number of retries while opening a connection to
> metastore</description>
> </property>
>
> <property>
> <name>hive.metastore.rawstore.impl</name>
> <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
> <description>Name of the class that implements
> org.apache.hadoop.hive.metastore.rawstore interface. This class is used to
> store and retrieval of raw metadata objects such as table,
> database</description>
> </property>
>
> <property>
> <name>hive.default.fileformat</name>
> <value>TextFile</value>
> <description>Default file format for CREATE TABLE statement. Options are
> TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED
> AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
> </property>
>
> <property>
> <name>hive.map.aggr</name>
> <value>false</value>
> <description>Whether to use map-side aggregation in Hive Group By
> queries</description>
> </property>
>
> <property>
> <name>hive.join.emit.interval</name>
> <value>1000</value>
> <description>How many rows in the right-most join operand Hive should
> buffer before emitting the join result. </description>
> </property>
>
> <property>
> <name>hive.exec.script.maxerrsize</name>
> <value>100000</value>
> <description>Maximum number of bytes a script is allowed to emit to
> standard error (per map-reduce task). This prevents runaway scripts from
> filling logs partitions to capacity </description>
> </property>
>
> <property>
> <name>hive.exec.compress.output</name>
> <value>false</value>
> <description> This controls whether the final outputs of a query (to a
> local/hdfs file or a hive table) is compressed. The compression codec and
> other options are determined from hadoop config variables
> mapred.output.compress* </description>
> </property>
>
> <property>
> <name>hive.exec.compress.intermediate</name>
> <value>false</value>
> <description> This controls whether intermediate files produced by hive
> between multiple map-reduce jobs are compressed. The compression codec and
> other options are determined from hadoop config variables
> mapred.output.compress* </description>
> </property>
>
> <property>
> <name>hive.hwi.listen.host</name>
> <value>0.0.0.0</value>
> <description>This is the host address the Hive Web Interface will listen
> on</description>
> </property>
>
> <property>
> <name>hive.hwi.listen.port</name>
> <value>9999</value>
> <description>This is the port the Hive Web Interface will listen
> on</description>
> </property>
>
> <property>
> <name>hive.hwi.war.file</name>
> <value>/usr/lib/hive/lib/hive-hwi-0.5.0.war</value>
> <description>This is the WAR file with the jsp content for Hive Web
> Interface</description>
> </property>
>
> </configuration>
>
>
>
> Client:
>
> <configuration>
> <property>
> <name>hive.exec.scratchdir</name>
> <value>/tmp/hive-${user.name}</value>
> <description>Scratch space for Hive jobs</description>
> </property>
>
> <property>
> <name>hive.metastore.local</name>
> <value>false</value>
> <description>controls whether to connect to remote metastore server or
> open a new metastore server in Hive Client JVM</description>
> </property>
>
> <property>
> <name>hive.metastore.metadb.dir</name>
> <value>file:///var/metastore/metadb/</value>
> <description>The location of filestore metadata base dir</description>
> </property>
>
> <property>
> <name>hive.metastore.uris</name>
> <value>thrift://test-storage-1.atm:9083</value>
> <description>Comma separated list of URIs of metastore servers. The first
> server that can be connected to will be used.</description>
> </property>
>
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>hdfs://test-storage-1.atm:54310/var/warehouse</value>
> <description>location of default database for the warehouse</description>
> </property>
>
> <property>
> <name>hive.metastore.connect.retries</name>
> <value>5</value>
> <description>Number of retries while opening a connection to
> metastore</description>
> </property>
>
> <property>
> <name>hive.metastore.rawstore.impl</name>
> <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
> <description>Name of the class that implements
> org.apache.hadoop.hive.metastore.rawstore interface. This class is used to
> store and retrieval of raw metadata objects such as table,
> database</description>
> </property>
>
> <property>
> <name>hive.default.fileformat</name>
> <value>TextFile</value>
> <description>Default file format for CREATE TABLE statement. Options are
> TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED
> AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override</description>
> </property>
>
> <property>
> <name>hive.map.aggr</name>
> <value>false</value>
> <description>Whether to use map-side aggregation in Hive Group By
> queries</description>
> </property>
>
> <property>
> <name>hive.join.emit.interval</name>
> <value>1000</value>
> <description>How many rows in the right-most join operand Hive should
> buffer before emitting the join result. </description>
> </property>
>
> <property>
> <name>hive.exec.script.maxerrsize</name>
> <value>100000</value>
> <description>Maximum number of bytes a script is allowed to emit to
> standard error (per map-reduce task). This prevents runaway scripts from
> filling logs partitions to capacity </description>
> </property>
>
> <property>
> <name>hive.exec.compress.output</name>
> <value>false</value>
> <description> This controls whether the final outputs of a query (to a
> local/hdfs file or a hive table) is compressed. The compression codec and
> other options are determined from hadoop config variables
> mapred.output.compress* </description>
> </property>
>
> <property>
> <name>hive.exec.compress.intermediate</name>
> <value>false</value>
> <description> This controls whether intermediate files produced by hive
> between multiple map-reduce jobs are compressed. The compression codec and
> other options are determined from hadoop config variables
> mapred.output.compress* </description>
> </property>
>
> <property>
> <name>hive.hwi.listen.host</name>
> <value>0.0.0.0</value>
> <description>This is the host address the Hive Web Interface will listen
> on</description>
> </property>
>
> <property>
> <name>hive.hwi.listen.port</name>
> <value>9999</value>
> <description>This is the port the Hive Web Interface will listen
> on</description>
> </property>
>
> <property>
> <name>hive.hwi.war.file</name>
> <value>/usr/lib/hive/lib/hive_hwi.war</value>
> <description>This is the WAR file with the jsp content for Hive Web
> Interface</description>
> </property>
>
> </configuration>
>
> --
> Aleksander Siewierski
>
> Our main goal is to load partitioned data from remote hosts into hadoop
> hive. Maybe you are reaching that goal in another way?
You cannot load data via Hive like this.
LOAD DATA LOCAL INPATH 'XX' attempts to load data from the node launching
Hive.
If your client is the CLI, this works, as the CLI is running on the same node
as the data.
If your client goes through the hive-service, the file would have to be located
on the machine running the hive-service, not your current host.
Edward
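
A sketch of one workaround consistent with Edward's explanation, under assumptions: the remote client has a Hadoop client configured to reach the cluster, the hostname/port are taken from the quoted client config, and the staging path /tmp/staging is an arbitrary hypothetical choice. The idea is to push the file into HDFS from the client yourself, then issue a non-LOCAL LOAD DATA, so the file never needs to exist on the machine running the hive service:

```shell
# 1. Copy the local file into HDFS from the remote client host.
hadoop fs -mkdir /tmp/staging
hadoop fs -put file hdfs://test-storage-1.atm:54310/tmp/staging/file

# 2. Load from HDFS (no LOCAL keyword), so the server-side LOAD reads an
#    HDFS path instead of a path local to the hive-service machine.
hive -e "LOAD DATA INPATH '/tmp/staging/file' INTO TABLE wh_im_status PARTITION (day='2010-10-02');"
```

Note that LOAD DATA INPATH moves the file within HDFS rather than copying it, so the staging copy disappears after a successful load.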