[ https://issues.apache.org/jira/browse/CRUNCH-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903516#comment-15903516 ]
Attila Sasvari commented on CRUNCH-636:
---------------------------------------

[CRUNCH-636.01.patch|https://issues.apache.org/jira/secure/attachment/12857031/CRUNCH-636.01.patch] breaks the integration tests, although the unit tests pass. I will upload a new patch as soon as possible.

> Make replication factor for temporary files configurable
> --------------------------------------------------------
>
>                 Key: CRUNCH-636
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-636
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>            Assignee: Attila Sasvari
>         Attachments: CRUNCH-636.01.patch,
>                      test.WordCount_2017-03-08_16.31.55.737_jobplan.dot.png,
>                      test.WordCount_2017-03-08_16.31.55.737.log
>
>
> As of now, Crunch does not allow temporary files and non-temporary files
> (e.g. the final output data of leaf nodes) to have different replication
> factors at the same time. If a user has a large amount of data (say, hundreds
> of gigabytes) to process, they might want a lower replication factor for the
> large temporary files written between Crunch jobs.
> We could make this configurable via a new setting (e.g.
> {{crunch.tmp.dir.replication}}).
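
For illustration only, here is a minimal sketch of how the proposed setting might be resolved. It is not the code in the attached patch. A plain {{Map}} stands in for Hadoop's {{Configuration}}, the class and method names are hypothetical, and the fallback to {{dfs.replication}} when {{crunch.tmp.dir.replication}} is unset is an assumption about the intended behavior. The 300 GB figure is only an example of "hundreds of gigabytes".

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a plain Map stands in for Hadoop's Configuration. */
final class TmpReplicationSketch {
    // Setting name proposed in the issue description; treated here as a proposal.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS setting for the default block replication.
    static final String DFS_REPLICATION_KEY = "dfs.replication";
    // HDFS ships with a default replication factor of 3.
    static final int HDFS_DEFAULT_REPLICATION = 3;

    /** Replication for final outputs: dfs.replication, or the HDFS default. */
    static int outputReplication(Map<String, String> conf) {
        return parseOr(conf.get(DFS_REPLICATION_KEY), HDFS_DEFAULT_REPLICATION);
    }

    /** Replication for temporary files: the proposed key if set, else the output value (assumed fallback). */
    static int tmpReplication(Map<String, String> conf) {
        return parseOr(conf.get(TMP_REPLICATION_KEY), outputReplication(conf));
    }

    /** Raw disk usage for a given logical size and replication factor. */
    static long rawGigabytes(long logicalGb, int replication) {
        return logicalGb * replication;
    }

    private static int parseOr(String value, int fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed < 1) {
            throw new IllegalArgumentException("replication must be >= 1: " + value);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFS_REPLICATION_KEY, "3");
        conf.put(TMP_REPLICATION_KEY, "1");

        long tmpGb = 300; // illustrative size of intermediate data
        System.out.println("output replication: " + outputReplication(conf));
        System.out.println("tmp replication:    " + tmpReplication(conf));
        System.out.println("raw GB for tmp at output replication: "
                + rawGigabytes(tmpGb, outputReplication(conf)));
        System.out.println("raw GB for tmp at tmp replication:    "
                + rawGigabytes(tmpGb, tmpReplication(conf)));
    }
}
```

With these example values, the intermediate data would occupy a third of the raw disk space it would use at the output replication factor, while final outputs keep their usual replication.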