hive> CREATE EXTERNAL TABLE collect_info (
>
> id string,
> t1 string,
> t2 string,
> t3 string,
> t4 string,
> t5 string,
> collector string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 0.234 seconds
hive> load data inpath
'/group/taobao/taobao/dw/stb/20100125/collect_info/coll_9.collect_info575'
overwrite into table collect_info;
Loading data to table collect_info
Failed with exception replaceFiles: error while moving files!!!
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
It doesn't work.
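
One workaround that might avoid the move entirely, since the table is
EXTERNAL anyway: instead of LOAD DATA (which moves the files even for an
external table), the table could be pointed at the existing directory with
LOCATION. Just a sketch, reusing the directory from the earlier attempt:

hive> CREATE EXTERNAL TABLE collect_info (
    > id string,
    > t1 string,
    > t2 string,
    > t3 string,
    > t4 string,
    > t5 string,
    > collector string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/group/taobao/taobao/dw/stb/20100125/collect_info/';

That way Hive only records the path and reads the files in place, so it
should need read access to the source directory rather than write access.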
2010/1/28 Fu Ecy <[email protected]>
> I think this is the problem: I don't have write permission on the
> source files/directories. Thank you, Shao :-)
>
> 2010/1/28 Zheng Shao <[email protected]>
>
>> When Hive loads data from HDFS, it moves the files instead of copying the
>> files.
>>
>> That means the current user should have write permissions to the
>> source files/directories as well.
>> Can you check that?
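>>
>> For example (just a sketch; the exact path is whatever LOAD DATA points
>> at), something like
>>
>>   hadoop fs -ls /group/taobao/taobao/dw/stb/20100125/collect_info
>>
>> would show the owner and permission bits of the source files, and the
>> owner (or an HDFS superuser) could grant write access with
>> hadoop fs -chmod if needed.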
>>
>> Zheng
>>
>> On Wed, Jan 27, 2010 at 11:18 PM, Fu Ecy <[email protected]> wrote:
>> > <property>
>> > <name>hive.metastore.warehouse.dir</name>
>> > <value>/group/tbdev/kunlun/henshao/hive/</value>
>> > <description>location of default database for the warehouse</description>
>> > </property>
>> >
>> > <property>
>> > <name>hive.exec.scratchdir</name>
>> > <value>/group/tbdev/kunlun/henshao/hive/temp</value>
>> > <description>Scratch space for Hive jobs</description>
>> > </property>
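>> >
>> > A quick way to check the ownership and permission bits on both of
>> > those directories (just for illustration) is:
>> >
>> >   hadoop fs -ls /group/tbdev/kunlun/henshao/
>> >   hadoop fs -ls /group/tbdev/kunlun/henshao/hive/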
>> >
>> > [kun...@gate2 ~]$ hive --config config/ -u root -p root
>> > Hive history file=/tmp/kunlun/hive_job_log_kunlun_201001281514_422659187.txt
>> > hive> create table pokes (foo int, bar string);
>> > OK
>> > Time taken: 0.825 seconds
>> >
>> > Yes, I have permission on Hive's warehouse directory and the tmp
>> > directory.
>> >
>> > 2010/1/28 김영우 <[email protected]>
>> >>
>> >> Hi Fu,
>> >>
>> >> Your query seems correct, but I think it's a problem related to HDFS
>> >> permissions.
>> >> Did you set the right permissions on Hive's warehouse directory and tmp
>> >> directory?
>> >> It seems user 'kunlun' does not have WRITE permission on the Hive
>> >> warehouse directory.
>> >>
>> >> Youngwoo
>> >>
>> >> 2010/1/28 Fu Ecy <[email protected]>
>> >>>
>> >>> 2010-01-27 12:58:22,182 ERROR ql.Driver (SessionState.java:printError(303)) - FAILED: Parse Error: line 2:10 cannot recognize input ',' in column type
>> >>>
>> >>> org.apache.hadoop.hive.ql.parse.ParseException: line 2:10 cannot recognize input ',' in column type
>> >>>
>> >>>         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:357)
>> >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
>> >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:290)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:163)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:221)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:335)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>> >>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> >>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> >>>
>> >>> 2010-01-27 12:58:40,394 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(570)) - Got exception: org.apache.hadoop.security.AccessControlException org.apache.hadoop.security.AccessControlException: Permission denied: user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
>> >>> 2010-01-27 12:58:40,395 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(571)) - org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
>> >>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> >>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> >>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>> >>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>> >>>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
>> >>>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
>> >>>         at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:831)
>> >>>         at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:257)
>> >>>         at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1118)
>> >>>         at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:123)
>> >>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:505)
>> >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:256)
>> >>>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:254)
>> >>>         at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:883)
>> >>>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:105)
>> >>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:388)
>> >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:294)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:163)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:221)
>> >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:335)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>> >>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> >>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>> >>> Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4400)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4370)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1771)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1740)
>> >>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:471)
>> >>>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>> >>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>> >>>
>> >>>         at org.apache.hadoop.ipc.Client.call(Client.java:697)
>> >>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>> >>>         at $Proxy4.mkdirs(Unknown Source)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> >>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> >>>         at $Proxy4.mkdirs(Unknown Source)
>> >>>         at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:829)
>> >>>         ... 22 more
>> >>>
>> >>> Is there any problem with the input data format?
>> >>>
>> >>> CREATE TABLE collect_info (
>> >>> id string,
>> >>> t1 string,
>> >>> t2 string,
>> >>> t3 string,
>> >>> t4 string,
>> >>> t5 string,
>> >>> collector string)
>> >>> ROW FORMAT DELIMITED
>> >>> FIELDS TERMINATED BY '\t'
>> >>> STORED AS TEXTFILE;
>> >>>
>> >>> 5290086045 330952255 1 2010-01-26 02:41:27 0 196050201 2010-01-26 02:41:27 2010-01-26 02:41:27 qijansher93771 0 1048
>> >>>
>> >>> Fields are separated by '\t'; I want to extract the fields marked in red.
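>> >>>
>> >>> For example (the column names here are only illustrative), the wanted
>> >>> fields could be pulled out with an ordinary projection once the data
>> >>> is loaded:
>> >>>
>> >>>   SELECT id, t2, collector
>> >>>   FROM collect_info;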
>> >>>
>> >>> 2010/1/28 Eric Sammer <[email protected]>
>> >>>>
>> >>>> On 1/27/10 10:59 PM, Fu Ecy wrote:
>> >>>> > I want to load some files on HDFS into a Hive table, but there is
>> >>>> > an exception as follows:
>> >>>> > hive> load data inpath
>> >>>> > '/group/taobao/taobao/dw/stb/20100125/collect_info/*' into table
>> >>>> > collect_info;
>> >>>> > Loading data to table collect_info
>> >>>> > Failed with exception addFiles: error while moving files!!!
>> >>>> > FAILED: Execution Error, return code 1 from
>> >>>> > org.apache.hadoop.hive.ql.exec.MoveTask
>> >>>> >
>> >>>> > But when I download the files from HDFS to the local machine and then
>> >>>> > load them into the table, it works.
>> >>>> > Data in '/group/taobao/taobao/dw/stb/20100125/collect_info/*' is a
>> >>>> > little more than 200GB.
>> >>>> >
>> >>>> > I need to use Hive to produce some statistics.
>> >>>> > Much thanks :-)
>> >>>>
>> >>>> The size of the files shouldn't really matter (move operations affect
>> >>>> metadata only - the blocks aren't rewritten or anything like that).
>> >>>> Check your Hive log files (by default in /tmp/<user>/hive.log on the
>> >>>> local machine you run Hive on, I believe) and you should see a stack
>> >>>> trace with additional information.
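>> >>>>
>> >>>> For instance, something along the lines of
>> >>>>
>> >>>>   tail -n 100 /tmp/kunlun/hive.log
>> >>>>
>> >>>> (adjusting the user name and path to your setup) should show the full
>> >>>> stack trace behind the MoveTask failure.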
>> >>>>
>> >>>> Regards.
>> >>>> --
>> >>>> Eric Sammer
>> >>>> [email protected]
>> >>>> http://esammer.blogspot.com
>> >>>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>