[
https://issues.apache.org/jira/browse/TEZ-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashish Kumar updated TEZ-3403:
------------------------------
Description:
Hi,
I'm experiencing a few failures with Tez related to Hive partitions. Even when
no partition column is used in the query, it still fails with a "partition
file path not found" error.
I'm trying to run the query below with Hive on Tez and hitting a partition
issue. The same query works fine with the MR engine. The table is external
and partitioned on the year and month columns. I've seen this a few times.
*Query:*
*select count(crn) as bookings,
month(to_date(from_utc_timestamp(pickup_date,'IST'))) as month from
bookings_table where year=2016 group by
month(to_date(from_utc_timestamp(pickup_date,'IST')));*
*Error:*
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: java.io.IOException: While processing file
s3n://<bucket>/warehouse/bookings_table/year=2016/month=1. null
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:78)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:292)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
    ... 14 more
Caused by: java.io.IOException: java.io.IOException: While processing file
s3n://<bucket>/warehouse/bookings_table/year=2016/month=1. null
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:372)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:118)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    ... 16 more
*Another error, from a different query:*
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:4
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1,
vertexId=vertex_1470240409111_2339_1_06, diagnostics=[Vertex
vertex_1470240409111_2339_1_06 [Map 1] killed/failed due
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: app_sessions initializer failed,
vertex=vertex_1470240409111_2339_1_06 [Map 1], java.io.FileNotFoundException:
No such file or directory:
s3n://<bucket>/warehouse/<table>/year=2015/month=02/day=14/hour=03
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1078)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:783)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1500)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1540)
    at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1704)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1703)
    at org.apache.hadoop.mapred.InputPathProcessor.perPathComputation(InputPathProcessor.java:235)
    at org.apache.hadoop.mapred.InputPathProcessor.access$000(InputPathProcessor.java:28)
    at org.apache.hadoop.mapred.InputPathProcessor$2.run(InputPathProcessor.java:338)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
> Empty partition issue with Hive on TEZ
> --------------------------------------
>
> Key: TEZ-3403
> URL: https://issues.apache.org/jira/browse/TEZ-3403
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Ashish Kumar