Did you try asking Hive to merge these small files when running "insert
overwrite"? You can refer to
http://grokbase.com/t/hive/user/1086nyhx7q/how-to-merge-small-files
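
For example, a minimal sketch using Hive's standard merge settings (these are real Hive properties, but defaults vary by release, so please verify against your version; the table names below are placeholders for yours):

    -- Ask Hive to run an extra merge job after the insert, so the output
    -- ends up as a few large files instead of one small file per task.
    SET hive.merge.mapfiles=true;               -- merge files produced by map-only jobs
    SET hive.merge.mapredfiles=true;            -- merge files produced by map-reduce jobs
    SET hive.merge.size.per.task=256000000;     -- target size of each merged file, in bytes
    SET hive.merge.smallfiles.avgsize=16000000; -- merge when the average output file is smaller than this

    INSERT OVERWRITE TABLE your_dimension_table
    SELECT * FROM full_source_table;

Alternatively, forcing the insert through a single reducer (SET mapred.reduce.tasks=1; together with a DISTRIBUTE BY on some column so that a reduce stage exists) leaves exactly one output file, at the cost of serializing that step, which is usually acceptable for a small dimension table.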


On 7/3/15, 5:13 PM, "[email protected]" <[email protected]> wrote:

>Hi,
>        When I build a Kylin cube, it fails at step 3 with the following error logs on the web UI:
>        Start to execute command:
>        com.kylinolap.job.hadoop.dict.CreateDictionaryJob  -cubename kylin_test11 -segmentname FULL_BUILD -input /home/kylin/kylin-0175616e-50c9-44de-834d-404b1bca113a/kylin_test11/fact_distinct_columns
>        Command execute return code 2
>
>        Also, the logs from Tomcat are as follows:
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:18,378][INFO][com.kylinolap.common.persistence.HBaseResourceStore.getTableName(HBaseResourceStore.java:108)] - /job/7cd45ce1-b7cf-4c9d-a2fd-f8eecbbf900d getTableName Get Table name project_metadata_job
>java.io.IOException: Before,get putResource trace.
>        at com.kylinolap.common.persistence.ResourceStore.putResource(ResourceStore.java:171)
>        at com.kylinolap.job.JobDAO.writeJobResource(JobDAO.java:240)
>        at com.kylinolap.job.JobDAO.saveJob(JobDAO.java:161)
>        at com.kylinolap.job.JobDAO.updateJobInstance(JobDAO.java:199)
>        at com.kylinolap.job.flow.JobFlowNode.updateJobStep(JobFlowNode.java:134)
>        at com.kylinolap.job.flow.JobFlowNode.execute(JobFlowNode.java:71)
>        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:18,382][DEBUG][com.kylinolap.common.persistence.ResourceStore.putResource(ResourceStore.java:176)] - Saving resource /job/7cd45ce1-b7cf-4c9d-a2fd-f8eecbbf900d (Store project_metadata@hbase:182.118.45.55#182.118.45.56#182.118.45.57:2181:/hbase).oldTS:1435909715144,newTS:1435909758382
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:18,382][INFO][com.kylinolap.common.persistence.HBaseResourceStore.getTableName(HBaseResourceStore.java:108)] - /job/7cd45ce1-b7cf-4c9d-a2fd-f8eecbbf900d getTableName Get Table name project_metadata_job
>java.io.IOException: After,get putResource trace.
>        at com.kylinolap.common.persistence.ResourceStore.putResource(ResourceStore.java:190)
>        at com.kylinolap.job.JobDAO.writeJobResource(JobDAO.java:240)
>        at com.kylinolap.job.JobDAO.saveJob(JobDAO.java:161)
>        at com.kylinolap.job.JobDAO.updateJobInstance(JobDAO.java:199)
>        at com.kylinolap.job.flow.JobFlowNode.updateJobStep(JobFlowNode.java:134)
>        at com.kylinolap.job.flow.JobFlowNode.execute(JobFlowNode.java:71)
>        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:18,390][DEBUG][com.kylinolap.job.cmd.JavaHadoopCmdOutput.appendOutput(JavaHadoopCmdOutput.java:96)] - Start to execute command: com.kylinolap.job.hadoop.dict.CreateDictionaryJob  -cubename kylin_test9 -segmentname FULL_BUILD -input /home/kylin/kylin-7cd45ce1-b7cf-4c9d-a2fd-f8eecbbf900d/kylin_test9/fact_distinct_columns
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:18,398][INFO][com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:57)] - Building snapshot of KYLIN_DIMENSION_RECHARGE_CHANNEL
>usage: CreateDictionaryJob
> -cubename <name>      Cube name. For exmaple, flat_item_cube
> -input <path>         Input path
> -segmentname <name>   Cube segment name)
>java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://namenode:9000/home/xitong/test-logsget-cube/KYLIN_DIMENSION_RECHARGE_CHANNEL, but find 19
>        at com.kylinolap.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:119)
>        at com.kylinolap.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:104)
>        at com.kylinolap.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:79)
>        at com.kylinolap.dict.lookup.HiveTable.getFileTable(HiveTable.java:72)
>        at com.kylinolap.dict.lookup.HiveTable.getSignature(HiveTable.java:67)
>        at com.kylinolap.dict.lookup.SnapshotTable.<init>(SnapshotTable.java:53)
>        at com.kylinolap.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:85)
>        at com.kylinolap.cube.CubeManager.buildSnapshotTable(CubeManager.java:210)
>        at com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:58)
>        at com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:39)
>        at com.kylinolap.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:51)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at com.kylinolap.job.cmd.JavaHadoopCmd.execute(JavaHadoopCmd.java:53)
>        at com.kylinolap.job.flow.JobFlowNode.execute(JobFlowNode.java:77)
>        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:25,927][ERROR][com.kylinolap.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:55)] - Expect 1 and only 1 non-zero file under hdfs://w-namenodefd2v.qss.zzbc2.qihoo.net:9000/home/xitong/dongtingting/test-logsget-cube/KYLIN_DIMENSION_RECHARGE_CHANNEL, but find 19
>java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://namenode:9000/home/xitong/test-logsget-cube/KYLIN_DIMENSION_RECHARGE_CHANNEL, but find 19
>        at com.kylinolap.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:119)
>        at com.kylinolap.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:104)
>        at com.kylinolap.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:79)
>        at com.kylinolap.dict.lookup.HiveTable.getFileTable(HiveTable.java:72)
>        at com.kylinolap.dict.lookup.HiveTable.getSignature(HiveTable.java:67)
>        at com.kylinolap.dict.lookup.SnapshotTable.<init>(SnapshotTable.java:53)
>        at com.kylinolap.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:85)
>        at com.kylinolap.cube.CubeManager.buildSnapshotTable(CubeManager.java:210)
>        at com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:58)
>        at com.kylinolap.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:39)
>        at com.kylinolap.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:51)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at com.kylinolap.job.cmd.JavaHadoopCmd.execute(JavaHadoopCmd.java:53)
>        at com.kylinolap.job.flow.JobFlowNode.execute(JobFlowNode.java:77)
>        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:25,928][DEBUG][com.kylinolap.job.cmd.JavaHadoopCmdOutput.appendOutput(JavaHadoopCmdOutput.java:96)] - Command execute return code 2
>[QuartzScheduler_Worker-9]:[2015-07-03 15:49:25,928][INFO][com.kylinolap.common.persistence.HBaseResourceStore.getTableName(HBaseResourceStore.java:108)] - /job/7cd45ce1-b7cf-4c9d-a2fd-f8eecbbf900d getTableName Get Table name project_metadata_job
>
>        It seems I have too many files in that Hive table. My dimension tables were generated from another full_source_table by "create external table xxx location xxx" followed by "insert overwrite table xxx select from full_table", so many files were generated outside my control.
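>        Concretely, the flow looked roughly like this (a simplified sketch; the column names are made up, and only the table name and location are taken from the logs above):
>
>        CREATE EXTERNAL TABLE kylin_dimension_recharge_channel (id STRING, name STRING)
>        LOCATION 'hdfs://namenode:9000/home/xitong/test-logsget-cube/KYLIN_DIMENSION_RECHARGE_CHANNEL';
>
>        INSERT OVERWRITE TABLE kylin_dimension_recharge_channel
>        SELECT id, name FROM full_source_table;
>
>        Presumably each task of the insert job wrote its own output file, which is where the 19 files come from.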
>        Can you help me with this problem?
>        Thank you very much.
>
>
>
>[email protected]
