Many thanks for all the help :-)

2010/1/28 Zheng Shao <[email protected]>

> Please see http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL for
> how to use "External" tables.
> You don't need to "load" into an external table, because an external
> table can point directly at your data directory.
>
> Zheng
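A minimal sketch of what Zheng describes, kept in shell form so it can be run non-interactively. The path and the seven-column layout are simply taken from the messages below; the table name collect_info_ext is only an example chosen to avoid clashing with the collect_info table that already exists:

hive -e "
CREATE EXTERNAL TABLE collect_info_ext (
  id string,
  t1 string,
  t2 string,
  t3 string,
  t4 string,
  t5 string,
  collector string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/group/taobao/taobao/dw/stb/20100125/collect_info/';
"

Because the table is EXTERNAL and points at the directory itself, no LOAD DATA (and therefore no file move) is needed, and dropping the table later leaves the files in place.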
> On Wed, Jan 27, 2010 at 11:38 PM, Fu Ecy <[email protected]> wrote:
> > hive> CREATE EXTERNAL TABLE collect_info (
> >   id string,
> >   t1 string,
> >   t2 string,
> >   t3 string,
> >   t4 string,
> >   t5 string,
> >   collector string)
> > ROW FORMAT DELIMITED
> > FIELDS TERMINATED BY '\t'
> > STORED AS TEXTFILE;
> > OK
> > Time taken: 0.234 seconds
> >
> > hive> load data inpath
> > '/group/taobao/taobao/dw/stb/20100125/collect_info/coll_9.collect_info575'
> > overwrite into table collect_info;
> > Loading data to table collect_info
> > Failed with exception replaceFiles: error while moving files!!!
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.MoveTask
> >
> > It doesn't work.
> >
> > 2010/1/28 Fu Ecy <[email protected]>
> >>
> >> I think this is the problem: I don't have write permissions on the
> >> source files/directories. Thank you, Shao :-)
> >>
> >> 2010/1/28 Zheng Shao <[email protected]>
> >>>
> >>> When Hive loads data from HDFS, it moves the files instead of
> >>> copying the files.
> >>>
> >>> That means the current user should have write permissions on the
> >>> source files/directories as well.
> >>> Can you check that?
> >>>
> >>> Zheng
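One way to check that, assuming the source directory is the one from the failed LOAD above and that its owner (or an admin) is willing to open it up; the chmod is only one possible fix, not something from the thread:

# Look at the owner, group and mode of the files that LOAD DATA would move
hadoop fs -ls /group/taobao/taobao/dw/stb/20100125/collect_info

# LOAD DATA INPATH renames the files into the warehouse, so the user running
# Hive needs write access to this directory; granting group write is one option
hadoop fs -chmod -R g+w /group/taobao/taobao/dw/stb/20100125/collect_info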
> >>> On Wed, Jan 27, 2010 at 11:18 PM, Fu Ecy <[email protected]> wrote:
> >>> > <property>
> >>> >   <name>hive.metastore.warehouse.dir</name>
> >>> >   <value>/group/tbdev/kunlun/henshao/hive/</value>
> >>> >   <description>location of default database for the warehouse</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >   <name>hive.exec.scratchdir</name>
> >>> >   <value>/group/tbdev/kunlun/henshao/hive/temp</value>
> >>> >   <description>Scratch space for Hive jobs</description>
> >>> > </property>
> >>> >
> >>> > [kun...@gate2 ~]$ hive --config config/ -u root -p root
> >>> > Hive history file=/tmp/kunlun/hive_job_log_kunlun_201001281514_422659187.txt
> >>> > hive> create table pokes (foo int, bar string);
> >>> > OK
> >>> > Time taken: 0.825 seconds
> >>> >
> >>> > Yes, I have permission for Hive's warehouse directory and tmp directory.
> >>> >
> >>> > 2010/1/28 김영우 <[email protected]>
> >>> >>
> >>> >> Hi Fu,
> >>> >>
> >>> >> Your query seems correct, but I think it's a problem related to
> >>> >> HDFS permissions.
> >>> >> Did you set the right permissions on Hive's warehouse directory
> >>> >> and tmp directory?
> >>> >> It seems user 'kunlun' does not have WRITE permission on the Hive
> >>> >> warehouse directory.
> >>> >>
> >>> >> Youngwoo
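For the warehouse side of Youngwoo's point, a simple inspection plus a throwaway write test, using the directories from the hive-site.xml above (perm_test is just a scratch name):

# Ownership and permissions of the configured warehouse and scratch directories
hadoop fs -ls /group/tbdev/kunlun/henshao

# Write test as the user that actually runs the Hive CLI ('kunlun' here)
hadoop fs -mkdir /group/tbdev/kunlun/henshao/hive/perm_test
hadoop fs -rmr /group/tbdev/kunlun/henshao/hive/perm_test

The AccessControlException in the log below mentions inode "user", which suggests that at that point Hive was still creating directories under the default /user location rather than the configured warehouse path, so it is also worth confirming that the --config directory really contains this hive-site.xml.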
> >>> >> 2010/1/28 Fu Ecy <[email protected]>
> >>> >>>
> >>> >>> 2010-01-27 12:58:22,182 ERROR ql.Driver (SessionState.java:printError(303)) -
> >>> >>> FAILED: Parse Error: line 2:10 cannot recognize input ',' in column type
> >>> >>>
> >>> >>> org.apache.hadoop.hive.ql.parse.ParseException: line 2:10 cannot
> >>> >>> recognize input ',' in column type
> >>> >>>         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:357)
> >>> >>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
> >>> >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:290)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:163)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:221)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:335)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >>> >>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>> >>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >>> >>>
> >>> >>> 2010-01-27 12:58:40,394 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(570)) -
> >>> >>> Got exception: org.apache.hadoop.security.AccessControlException
> >>> >>> org.apache.hadoop.security.AccessControlException: Permission denied:
> >>> >>> user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
> >>> >>> 2010-01-27 12:58:40,395 ERROR hive.log (MetaStoreUtils.java:logAndThrowMetaException(571)) -
> >>> >>> org.apache.hadoop.security.AccessControlException:
> >>> >>> org.apache.hadoop.security.AccessControlException: Permission denied:
> >>> >>> user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
> >>> >>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >>> >>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >>> >>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >>> >>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >>> >>>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
> >>> >>>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
> >>> >>>         at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:831)
> >>> >>>         at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:257)
> >>> >>>         at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1118)
> >>> >>>         at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:123)
> >>> >>>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:505)
> >>> >>>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:256)
> >>> >>>         at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:254)
> >>> >>>         at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:883)
> >>> >>>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:105)
> >>> >>>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:388)
> >>> >>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:294)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:163)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:221)
> >>> >>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:335)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
> >>> >>>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
> >>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>> >>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>> >>>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> >>> >>> Caused by: org.apache.hadoop.ipc.RemoteException:
> >>> >>> org.apache.hadoop.security.AccessControlException: Permission denied:
> >>> >>> user=kunlun, access=WRITE, inode="user":hadoop:cug-admin:rwxr-xr-x
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4400)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4370)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1771)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1740)
> >>> >>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:471)
> >>> >>>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> >>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> >>> >>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
> >>> >>>
> >>> >>>         at org.apache.hadoop.ipc.Client.call(Client.java:697)
> >>> >>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> >>> >>>         at $Proxy4.mkdirs(Unknown Source)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >>>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >>> >>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >>> >>>         at $Proxy4.mkdirs(Unknown Source)
> >>> >>>         at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:829)
> >>> >>>         ... 22 more
> >>> >>>
> >>> >>> Is there any problem with the input data format?
> >>> >>>
> >>> >>> CREATE TABLE collect_info (
> >>> >>>   id string,
> >>> >>>   t1 string,
> >>> >>>   t2 string,
> >>> >>>   t3 string,
> >>> >>>   t4 string,
> >>> >>>   t5 string,
> >>> >>>   collector string)
> >>> >>> ROW FORMAT DELIMITED
> >>> >>> FIELDS TERMINATED BY '\t'
> >>> >>> STORED AS TEXTFILE;
> >>> >>>
> >>> >>> 5290086045  330952255  1  2010-01-26 02:41:27  0  196050201  2010-01-26 02:41:27  2010-01-26 02:41:27  qijansher93771  0  1048
> >>> >>>
> >>> >>> Fields are separated by '\t'; I want to get the fields marked in red.
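A direct way to answer the data-format question is to count the tab-separated fields in a few rows. With ROW FORMAT DELIMITED, Hive maps fields to columns by position, fills missing trailing columns with NULL, and silently ignores extra fields, so the count only needs to be consistent. The file name below is the one from the earlier LOAD attempt:

# Print the distribution of tab-separated field counts over the first rows
hadoop fs -cat /group/taobao/taobao/dw/stb/20100125/collect_info/coll_9.collect_info575 \
  | head -n 20 \
  | awk -F'\t' '{ print NF }' \
  | sort | uniq -c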
> >>> >>> 2010/1/28 Eric Sammer <[email protected]>
> >>> >>>>
> >>> >>>> On 1/27/10 10:59 PM, Fu Ecy wrote:
> >>> >>>> > I want to load some files on HDFS into a hive table, but there is
> >>> >>>> > an exception as follows:
> >>> >>>> > hive> load data inpath
> >>> >>>> > '/group/taobao/taobao/dw/stb/20100125/collect_info/*' into table
> >>> >>>> > collect_info;
> >>> >>>> > Loading data to table collect_info
> >>> >>>> > Failed with exception addFiles: error while moving files!!!
> >>> >>>> > FAILED: Execution Error, return code 1 from
> >>> >>>> > org.apache.hadoop.hive.ql.exec.MoveTask
> >>> >>>> >
> >>> >>>> > But when I download the files from HDFS to the local machine and
> >>> >>>> > then load them into the table, it works.
> >>> >>>> > Data in '/group/taobao/taobao/dw/stb/20100125/collect_info/*' is a
> >>> >>>> > little more than 200GB.
> >>> >>>> >
> >>> >>>> > I need to use Hive to make some statistics.
> >>> >>>> > Many thanks :-)
> >>> >>>>
> >>> >>>> The size of the files shouldn't really matter (move operations affect
> >>> >>>> metadata only - the blocks aren't rewritten or anything like that).
> >>> >>>> Check your Hive log files (by default in /tmp/<user>/hive.log on the
> >>> >>>> local machine you run Hive on, I believe) and you should see a stack
> >>> >>>> trace with additional information.
> >>> >>>>
> >>> >>>> Regards.
> >>> >>>> --
> >>> >>>> Eric Sammer
> >>> >>>> [email protected]
> >>> >>>> http://esammer.blogspot.com
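Following Eric's suggestion, something as small as this pulls the relevant stack trace out of that log, assuming the default location he mentions under /tmp/<user>:

# Show the most recent errors from the local Hive CLI log
tail -n 200 /tmp/$(whoami)/hive.log | grep -B 2 -A 20 ERROR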
> >>>
> >>> --
> >>> Yours,
> >>> Zheng
>
> --
> Yours,
> Zheng