[ https://issues.apache.org/jira/browse/PIG-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tobias Schlottke updated PIG-3231:
----------------------------------

    Description: 
Hi there,

We've got a strange issue after switching to a new cluster running CDH4.2 (coming 
from CDH3):
Pig seems to create temporary Avro files for its MapReduce jobs and then either 
deletes them prematurely or never creates them in the first place.

Pig fails with the "no error returned by hadoop" message, but I found something 
interesting in the NameNode logs.
The actual exception from the NameNode log is:
{code}
2013-03-01 12:59:30,858 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 
on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
192.168.1.28:37814: error: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_000007_0/part-m-00007.avro
 File does not exist. Holder 
DFSClient_attempt_1362133122980_0017_m_000007_0_1992466008_1 does not have any 
open files.
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/user/metrigo/event_logger/compact_log/2013/01/14/_temporary/1/_temporary/attempt_1362133122980_0017_m_000007_0/part-m-00007.avro
 File does not exist. Holder 
DFSClient_attempt_1362133122980_0017_m_000007_0_1992466008_1 does not have any 
open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
{code}
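
For what it's worth, here is a minimal Java sketch of one common way this exact 
NameNode error arises. This is an assumption about the mechanism, not something 
I've verified for our job: a second client re-creating the same path with 
overwrite=true makes the NameNode delete the file and hand the lease to the new 
writer, so the first writer's next addBlock fails exactly as above (retried or 
speculative task attempts writing the same _temporary path could clash like 
this). The class name, NameNode address, and paths below are all hypothetical:

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseClashSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI nn = URI.create("hdfs://namenode:8020");                // hypothetical NN
        Path part = new Path("/tmp/lease-clash/part-m-00007.avro"); // hypothetical path

        // newInstance() bypasses the FileSystem cache, so these are two
        // independent DFSClients with distinct lease holders.
        FileSystem writerA = FileSystem.newInstance(nn, conf);
        FileSystem writerB = FileSystem.newInstance(nn, conf);

        FSDataOutputStream outA = writerA.create(part, true); // A takes the lease
        outA.write(new byte[1024]);
        outA.hflush();

        // B re-creates the same path with overwrite=true: the NameNode deletes
        // the file and grants the lease to B instead.
        FSDataOutputStream outB = writerB.create(part, true);

        // A's next block allocation now fails on the NameNode with
        // LeaseExpiredException: "No lease on ... File does not exist.
        // Holder DFSClient_... does not have any open files."
        outA.write(new byte[128 * 1024 * 1024]); // large enough to force an addBlock RPC
        outA.close();
    }
}
{code}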


Please note that we're analyzing a batch of ~200 files (matched via glob 
patterns), some of which are quite small.
We got the job to run through once after excluding the small files.
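
For context, a short Java sketch of the glob expansion involved (the pattern 
below is hypothetical, modeled on the paths in the NameNode log above). As far 
as I understand, Hadoop's FileSystem.globStatus does the matching that Pig's 
input handling relies on, and listing the matched sizes makes the small files 
easy to spot:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical pattern, modeled on the paths in the NN log above.
        FileStatus[] inputs = fs.globStatus(
                new Path("/user/metrigo/event_logger/compact_log/2013/01/*/part-*.avro"));

        if (inputs != null) {
            for (FileStatus st : inputs) {
                // Small inputs show up immediately in this listing.
                System.out.printf("%12d  %s%n", st.getLen(), st.getPath());
            }
        }
    }
}
{code}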

*Update*
Deeper in the logs I found the following exception, which seems to be what 
makes the job fail:

{code}
2013-03-03 19:51:06,169 ERROR [main] 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:metrigo (auth:SIMPLE) cause:java.io.IOException: 
org.apache.avro.AvroRuntimeException: java.io.IOException: Filesystem closed
2013-03-03 19:51:06,170 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.io.IOException: 
org.apache.avro.AvroRuntimeException: java.io.IOException: Filesystem closed
        at 
org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:357)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:526)
        at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: 
Filesystem closed
        at 
org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:275)
        at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:197)
        at 
org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.nextKeyValue(PigAvroRecordReader.java:180)
        at 
org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:352)
        ... 12 more
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:552)
        at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at 
org.apache.pig.piggybank.storage.avro.AvroStorageInputStream.read(AvroStorageInputStream.java:43)
        at 
org.apache.avro.file.DataFileReader$SeekableInputStream.read(DataFileReader.java:210)
        at 
org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:835)
        at org.apache.avro.io.BinaryDecoder.isEnd(BinaryDecoder.java:440)
        at 
org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:261)
        ... 15 more
{code}
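
In case it helps whoever looks at this: the second trace dies in 
DFSClient.checkOpen, and as far as I understand the usual mechanism behind 
"Filesystem closed" is Hadoop's JVM-wide FileSystem cache. FileSystem.get() 
hands out one shared instance per (scheme, authority, user), so if anything in 
the task JVM closes that instance (a record reader finishing early, a shutdown 
hook, ...), every other consumer of it, here apparently the Avro record reader, 
starts throwing. A minimal sketch, assuming default cache settings (class name, 
NameNode address, and path are hypothetical):

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsClosedSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI nn = URI.create("hdfs://namenode:8020"); // hypothetical NN

        // With the default cache, both calls return the *same* instance.
        FileSystem a = FileSystem.get(nn, conf);
        FileSystem b = FileSystem.get(nn, conf);
        System.out.println(a == b); // true

        a.close(); // closes the shared DFSClient

        // Any later use of the cached instance now fails with
        // java.io.IOException: Filesystem closed (from DFSClient.checkOpen).
        b.open(new Path("/some/input.avro"));
    }
}
{code}

If that shared client is really what gets closed in our tasks, then setting 
fs.hdfs.impl.disable.cache=true in the job configuration (which gives every 
FileSystem.get() caller its own client) should make the symptom disappear and 
would at least confirm the diagnosis.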

Any idea how we can track down the cause of this?

Best,

Tobias





> Problems with pig (TRUNK, 0.11) after upgrading to CDH4.2(yarn) using avro 
> input
> --------------------------------------------------------------------------------
>
>                 Key: PIG-3231
>                 URL: https://issues.apache.org/jira/browse/PIG-3231
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>         Environment: CDH4.2, yarn, avro
>            Reporter: Tobias Schlottke

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
