[ https://issues.apache.org/jira/browse/PIG-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laszlo Bodor updated PIG-5371: ------------------------------ Description: Attached [^simpleTest.out]. It seems like HDFS counter 'HDFS_BYTES_WRITTEN' returns the byte count not only for the result of pig store operator, but it includes the size of the jar files as well. The problem is this could change very easily, so in my opinion the best would be to remove these assertions from TestPigRunner as this is just causing intermittent and/or persistent failures. The test class is for basic testing of PigRunner, and this is achieved well enough without the asserts. {code} 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar ... 2018-11-23 10:14:52,735 [PacketResponder: BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 127.0.0.1:54943]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 57162859 {code} was: Attached [^simpleTest.out]. It seems like HDFS counter 'HDFS_BYTES_WRITTEN' returns the byte count not only for the result of pig store operator, but it includes the jar files as well. The problem is this could change very easily, so in my opinion the best would be to remove these assertions from TestPigRunner as this is just causing intermittent and/or persistent failures. The test class is for basic testing of PigRunner, and this is achieved well enough without the asserts. {code} 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar ... 2018-11-23 10:14:52,735 [PacketResponder: BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, 127.0.0.1:54943]] INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): 57162859 {code} > Hdfs bytes written assertions fail in TestPigRunner > --------------------------------------------------- > > Key: PIG-5371 > URL: https://issues.apache.org/jira/browse/PIG-5371 > Project: Pig > Issue Type: Bug > Reporter: Laszlo Bodor > Assignee: Laszlo Bodor > Priority: Major > Attachments: PIG-5371.01.patch, simpleTest.out > > > Attached [^simpleTest.out]. It seems like HDFS counter 'HDFS_BYTES_WRITTEN' > returns the byte count not only for the result of pig store operator, but it > includes the size of the jar files as well. The problem is this could change > very easily, so in my opinion the best would be to remove these assertions > from TestPigRunner as this is just causing intermittent and/or persistent > failures. > The test class is for basic testing of PigRunner, and this is achieved well > enough without the asserts. > {code} > 2018-11-23 10:14:52,661 [IPC Server handler 5 on 54929] INFO > org.apache.hadoop.hdfs.StateChange - BLOCK* allocate blk_1073741827_1003, > replicas=127.0.0.1:54934, 127.0.0.1:54930, 127.0.0.1:54943 for > /tmp/temp-157262781/tmp-1057655772/automaton-1.11-8.jar > ... > 2018-11-23 10:14:52,735 [PacketResponder: > BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, > type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[127.0.0.1:54930, > 127.0.0.1:54943]] INFO > org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace - src: > /127.0.0.1:54978, dest: /127.0.0.1:54934, bytes: 176285, op: HDFS_WRITE, > cliID: DFSClient_NONMAPREDUCE_-1959727442_1, offset: 0, srvID: > 108c4000-1ae0-402e-82cf-bf403629c0f7, blockid: > BP-26001448-10.200.50.195-1542964474138:blk_1073741827_1003, duration(ns): > 57162859 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)