[ https://issues.apache.org/jira/browse/CRUNCH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297726#comment-14297726 ]
Surbhi Mungre commented on CRUNCH-494:
--------------------------------------

I am using Crunch 0.8.3; however, the test fails even with the latest version of Crunch, with the same exception but in a different method. It looks like recursion is used in several places when unioning PCollections, which limits the number of PCollections that can be unioned.

{noformat}
java.lang.StackOverflowError
    at java.util.HashMap$EntryIterator.<init>(HashMap.java:832)
    at java.util.HashMap$EntryIterator.<init>(HashMap.java:832)
    at java.util.HashMap.newEntryIterator(HashMap.java:846)
    at java.util.HashMap$EntrySet.iterator(HashMap.java:950)
    at java.util.AbstractMap.hashCode(AbstractMap.java:459)
    at org.apache.commons.lang.builder.HashCodeBuilder.append(HashCodeBuilder.java:881)
    at org.apache.crunch.io.FormatBundle.hashCode(FormatBundle.java:119)
    at org.apache.commons.lang.builder.HashCodeBuilder.append(HashCodeBuilder.java:881)
    at org.apache.crunch.io.impl.FileSourceImpl.hashCode(FileSourceImpl.java:173)
    at java.util.HashMap.getEntry(HashMap.java:344)
    at java.util.HashMap.containsKey(HashMap.java:335)
    at java.util.HashSet.contains(HashSet.java:184)
    at org.apache.crunch.impl.dist.collect.BaseInputCollection.waitingOnTargets(BaseInputCollection.java:70)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
    at org.apache.crunch.impl.dist.collect.PCollectionImpl.waitingOnTargets(PCollectionImpl.java:217)
{noformat}

> Unable to union large number of PCollections
> ---------------------------------------------
>
>                 Key: CRUNCH-494
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-494
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Surbhi Mungre
>            Assignee: Josh Wills
>            Priority: Minor
>
> If you try to union a large number of PCollections (~5K), Crunch throws a
> StackOverflowError.
> {noformat}
> java.lang.StackOverflowError
>     at com.google.common.collect.AbstractIndexedListIterator.<init>(AbstractIndexedListIterator.java:68)
>     at com.google.common.collect.AbstractIndexedListIterator.<init>(AbstractIndexedListIterator.java:54)
>     at com.google.common.collect.Iterators$12.<init>(Iterators.java:1072)
>     at com.google.common.collect.Iterators.forArray(Iterators.java:1072)
>     at com.google.common.collect.RegularImmutableList.iterator(RegularImmutableList.java:68)
>     at com.google.common.collect.RegularImmutableList.iterator(RegularImmutableList.java:31)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:291)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
>     at org.apache.crunch.impl.dist.collect.PCollectionImpl.getTargetDependencies(PCollectionImpl.java:292)
> {noformat}
> Here is a simple test which can reproduce the issue.
> https://gist.github.com/anonymous/22f08511604341d0ffda

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
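
To illustrate the point in the comment about recursion limiting how many collections can be unioned, here is a minimal, self-contained sketch. It does not use or reproduce any Crunch code: `Node`, `UnionDepthSketch`, and their methods are hypothetical stand-ins for a pipeline graph. It assumes the unions are built pairwise in a loop, so the graph becomes a chain whose depth grows with the number of inputs. A recursive walk then needs one stack frame per level, while a walk with an explicit work list does not.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a node in a pipeline graph; NOT a Crunch class.
final class Node {
    final List<Node> parents;

    Node(List<Node> parents) {
        this.parents = parents;
    }
}

final class UnionDepthSketch {

    // Builds a left-deep chain, (((c0 U c1) U c2) U c3) ..., the shape that
    // repeated pairwise unions in a loop would produce (an assumption here).
    static Node chainedUnion(int n) {
        Node current = new Node(Collections.<Node>emptyList());
        for (int i = 1; i < n; i++) {
            Node leaf = new Node(Collections.<Node>emptyList());
            current = new Node(Arrays.asList(current, leaf));
        }
        return current;
    }

    // Recursive walk: uses one stack frame per level of the chain, so a deep
    // enough chain can exhaust the thread stack with a StackOverflowError.
    static int countLeavesRecursive(Node node) {
        if (node.parents.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Node parent : node.parents) {
            total += countLeavesRecursive(parent);
        }
        return total;
    }

    // Iterative walk: keeps pending nodes in a heap-allocated deque, so the
    // call stack stays flat no matter how deep the chain is.
    static int countLeavesIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int total = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.parents.isEmpty()) {
                total++;
            } else {
                for (Node parent : node.parents) {
                    pending.push(parent);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Node deep = chainedUnion(100_000);
        // The iterative walk completes regardless of chain depth.
        System.out.println(countLeavesIterative(deep));
        // Calling countLeavesRecursive(deep) here may throw
        // StackOverflowError, depending on the JVM thread stack size.
    }
}
```

Whether the recursive version actually overflows depends on the JVM's thread stack size (the `-Xss` option), which may be why the reported threshold is approximate (~5K).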