[ https://issues.apache.org/jira/browse/PIG-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599061#comment-14599061 ]
Ángel Álvarez commented on PIG-4443:
------------------------------------
I'm using Pig 0.15... or so it seems:
INFO org.apache.pig.Main - Apache Pig version 0.15.0-SNAPSHOT (r: unknown) compiled Jun 18 2015, 15:39:42
However, I don't see that log message, because in my case the if condition guarding it is false (splitsSerializedSize=1163342, spillThreshold=33554432):
{code:java}
if (splitsSerializedSize > spillThreshold) {
    // ...
    log.info("Writing input splits to " + inputSplitsDir
            + " for vertex " + vertex.getName()
            + " as the serialized size in memory is "
            + splitsSerializedSize + ". Configured "
            + PigConfiguration.PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD
            + " is " + spillThreshold);
    // ...
} else {
    // Send splits via RPC to AM
    userPayLoadBuilder.setSplits(splitsProto);
}
{code}
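To make the arithmetic explicit, here is a minimal sketch in plain Java of the comparison above, plugging in the numbers from my run and the sizes quoted in the issue description. The class SplitRouting and its methods are hypothetical (not part of Pig), and the sketch assumes the serialized split size is a fair proxy for the size of the RPC payload, which ignores whatever else the payload carries.

{code:java}
// Hypothetical helper, NOT part of Pig: it only reproduces the comparison in the
// quoted Pig code to show which branch a given split size takes.
public class SplitRouting {

    /** Where the serialized input splits end up. */
    enum Route { RPC, DISK }

    /**
     * Same test as the quoted code: splits are written to disk only when their
     * serialized size is strictly greater than the spill threshold.
     */
    static Route route(long splitsSerializedSize, long spillThreshold) {
        return splitsSerializedSize > spillThreshold ? Route.DISK : Route.RPC;
    }

    /** True if a payload of this size would exceed the RPC length limit. */
    static boolean exceedsRpcLimit(long payloadBytes, long maxRpcLength) {
        return payloadBytes > maxRpcLength;
    }

    public static void main(String[] args) {
        long spillThreshold = 33_554_432L; // value seen in my run (32 MiB)
        long maxRpcLength = 67_108_864L;   // limit from the issue description (64 MiB)

        // My case: about 1.1 MiB of splits, far below the spill threshold,
        // so the splits go to the AM via RPC and the log line is never printed.
        long mySplits = 1_163_342L;
        System.out.println("my splits -> " + route(mySplits, spillThreshold));

        // Original report: about 292 MiB of splits, which would blow the RPC
        // limit if sent in the payload, so this size takes the disk branch.
        long reported = 305_844_060L;
        System.out.println("reported  -> " + route(reported, spillThreshold)
                + ", exceeds RPC limit without spilling: "
                + exceedsRpcLimit(reported, maxRpcLength));
    }
}
{code}

With my numbers the sketch takes the RPC branch, which matches the missing "Writing input splits to ..." message; only sizes above the threshold, such as the one in the original report, take the disk branch.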
I don't know whether it's relevant, but when comparing the AM syslog files from Pig 0.14 and Pig 0.15 runs, I found this message only in the Pig 0.15 run:
INFO org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
Do you have a test I could run in my environment?
Also, when I said I had uploaded a single archive to my HDFS, I meant only the Tez 0.7.0 libraries and their dependencies (-Dtez.lib.uris=/hdp/apps/2.2.0.0-2041/tez-0.15.0/tez.tar.gz). I have to add this argument to my PIG_OPTS in order to override the settings in /etc/tez/conf/tez-site.xml (that file is managed by HDP and, by default, points to the Tez 0.5.2 shipped with HDP 2.2.0.0-2041).
> Write inputsplits in Tez to disk if the size is huge and option to compress
> pig input splits
> --------------------------------------------------------------------------------------------
>
> Key: PIG-4443
> URL: https://issues.apache.org/jira/browse/PIG-4443
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4443-1.patch, PIG-4443-Fix-TEZ-2192-2.patch,
> PIG-4443-Fix-TEZ-2192.patch
>
>
> Pig sets the input split information in user payload and when running against
> a table with 10s of 1000s of partitions, DAG submission fails with
> java.io.IOException: Requested data length 305844060 is longer than maximum
> configured RPC length 67108864
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)