[ 
https://issues.apache.org/jira/browse/PIG-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599061#comment-14599061
 ] 

Ángel Álvarez commented on PIG-4443:
------------------------------------

I'm using Pig 0.15, or so it seems:

   INFO  org.apache.pig.Main - Apache Pig version 0.15.0-SNAPSHOT (r: unknown) compiled Jun 18 2015, 15:39:42

However, I don't see that message, because in my case the preceding if 
condition is false (splitsSerializedSize=1163342, spillThreshold=33554432):

{code:java}
                if(splitsSerializedSize > spillThreshold) {
                    ...
                    log.info("Writing input splits to " + inputSplitsDir
                            + " for vertex " + vertex.getName()
                            + " as the serialized size in memory is "
                            + splitsSerializedSize + ". Configured "
                            + PigConfiguration.PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD
                            + " is " + spillThreshold);
                    ...
                } else {
                    // Send splits via RPC to AM
                    userPayLoadBuilder.setSplits(splitsProto);
                }
{code}
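
To make the branch decision concrete, here is a minimal sketch that evaluates 
the same comparison with the two values reported above. The class and method 
names are hypothetical and are not part of Pig; only the 
`splitsSerializedSize > spillThreshold` test is taken from the snippet.

```java
// Hypothetical stand-alone check; not Pig code.
class SplitSpillCheck {

    // Mirrors the condition in the snippet above: splits are written to
    // disk only when their serialized size exceeds the threshold.
    static boolean shouldSpillToDisk(long splitsSerializedSize, long spillThreshold) {
        return splitsSerializedSize > spillThreshold;
    }

    public static void main(String[] args) {
        long splitsSerializedSize = 1163342L;  // value reported in this comment
        long spillThreshold = 33554432L;       // value reported in this comment (32 * 1024 * 1024)

        if (shouldSpillToDisk(splitsSerializedSize, spillThreshold)) {
            System.out.println("write splits to disk (log.info branch)");
        } else {
            System.out.println("send splits via RPC (else branch)");
        }
    }
}
```

With these values the else branch is taken, which would explain why the 
"Writing input splits to ..." message never shows up in the log.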

I don't know if it's relevant, but comparing the AM syslog files from Pig 0.14 
and Pig 0.15, I found this message only when running Pig 0.15:

   INFO  org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs

Is there any test I could run in my environment?

On another note, when I said I uploaded a single zip to my HDFS, I meant only 
the Tez 0.7.0 libraries and their dependencies 
(-Dtez.lib.uris=/hdp/apps/2.2.0.0-2041/tez-0.15.0/tez.tar.gz). I have to add 
this argument to my PIG_OPTS in order to override my 
/etc/tez/conf/tez-site.xml settings (this file is managed by HDP and, by 
default, points to the Tez 0.5.2 shipped with HDP 2.2.0.0-2041).

> Write inputsplits in Tez to disk if the size is huge and option to compress 
> pig input splits
> --------------------------------------------------------------------------------------------
>
>                 Key: PIG-4443
>                 URL: https://issues.apache.org/jira/browse/PIG-4443
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.15.0
>
>         Attachments: PIG-4443-1.patch, PIG-4443-Fix-TEZ-2192-2.patch, 
> PIG-4443-Fix-TEZ-2192.patch
>
>
> Pig sets the input split information in user payload and when running against 
> a table with 10s of 1000s of partitions, DAG submission fails with
> java.io.IOException: Requested data length 305844060 is longer than maximum
> configured RPC length 67108864



