[
https://issues.apache.org/jira/browse/PIG-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598365#comment-14598365
]
Rohini Palaniswamy commented on PIG-4443:
-----------------------------------------
That is odd. This patch is supposed to fix exactly that issue, and we have been
running fine with it for months now. Do you see the below message, emitted by
this patch, in your logs?
{code}
log.info("Writing input splits to " + inputSplitsDir
        + " for vertex " + vertex.getName()
        + " as the serialized size in memory is " + splitsSerializedSize
        + ". Configured " + PigConfiguration.PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD
        + " is " + spillThreshold);
{code}
If not, I suspect you still have the old pig.jar on your classpath and that it
is not Pig 0.15 that is actually running.
bq. I've built this version from its sources, packaged it and added its
dependencies to an unique zip file and uploaded to my HDFS.
This comment also seems to indicate that you have replaced the Pig jar in HDFS.
I am not sure why you need Pig in HDFS. You need to replace
pig-0.15.0-core-h2.jar in the Pig client installation on the node from which
you are running the script.
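The decision the log line above reports can be sketched as a simple threshold check. This is a hedged illustration, not the actual Pig implementation; the class and method names below are invented for the example, and only the numbers (the 305844060-byte payload from the reported exception and the spill threshold) come from this issue.
{code}
// Sketch of the spill decision: if the serialized input splits are larger
// than the configured threshold, write them to a directory on disk instead
// of embedding them in the vertex user payload.
public class SplitSpillSketch {

    // Returns "disk" when the serialized splits exceed the threshold,
    // "payload" otherwise. Mirrors the semantics of
    // pig.tez.input.splits.mem.threshold as described in this issue.
    static String decide(long splitsSerializedSize, long spillThreshold) {
        if (splitsSerializedSize > spillThreshold) {
            return "disk";
        }
        return "payload";
    }

    public static void main(String[] args) {
        // 305844060 bytes, as in the reported IOException, against a
        // hypothetical 32 MB threshold: spills to disk.
        System.out.println(decide(305844060L, 33554432L)); // prints "disk"
        // A small split payload stays in the vertex payload.
        System.out.println(decide(1048576L, 33554432L));   // prints "payload"
    }
}
{code}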
> Write inputsplits in Tez to disk if the size is huge and option to compress pig input splits
> --------------------------------------------------------------------------------------------
>
> Key: PIG-4443
> URL: https://issues.apache.org/jira/browse/PIG-4443
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4443-1.patch, PIG-4443-Fix-TEZ-2192-2.patch,
> PIG-4443-Fix-TEZ-2192.patch
>
>
> Pig sets the input split information in user payload and when running against
> a table with 10s of 1000s of partitions, DAG submission fails with
> java.io.IOException: Requested data length 305844060 is longer than maximum
> configured RPC length 67108864
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)