[ https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496011#comment-13496011 ]
Prashant Kommireddi commented on PIG-2553:
------------------------------------------

I feel like we could treat this as file-based vs. non-file-based storage locations, similar to PIG-2924. The patch there uses a default FileBasedOutputSizeReader to determine output size, and "pig.stats.output.size.reader" to compute size based on a different implementation.

For this JIRA, can we use a similar idea and handle file-based schemes with UriUtil.isHDFSFileOrLocalOrS3N(String uri)? For all other schemes (hbase, hcat, ...) we can allow multiple relations to write to the same location.

1. Check if pig.location.check.strict is set
2. If not set, just log a warning if the scheme is file-based
3. If set, check if the scheme is file-based and report an error
4. If set but not a file-based scheme, continue without any warning/error message

Thoughts?

> Pig shouldn't allow attempts to write multiple relations into same directory
> ----------------------------------------------------------------------------
>
>                 Key: PIG-2553
>                 URL: https://issues.apache.org/jira/browse/PIG-2553
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Prashant Kommireddi
>         Attachments: PIG-2553.patch
>
>
> We've seen multiple occasions where users accidentally try to store 2 or more
> different relations to the same destination directory. Currently, this passes
> the Pig planner and fails on MR side due to concurrent attempts to create the
> same part file on the reducer. This is extremely confusing to the user, and
> hard to debug.
> We should instead fail their scripts before they are even submitted, since we
> can identify the erroneous condition from the beginning.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
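
For illustration, the four-step check proposed in the comment could be sketched roughly as follows. This is not code from PIG-2553.patch: the class StoreLocationCheck, the Outcome enum, and the isFileBased helper are hypothetical names, and isFileBased is only an assumed stand-in for UriUtil.isHDFSFileOrLocalOrS3N(String uri) that looks at the URI scheme alone. The strict flag stands for whether pig.location.check.strict is set.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the proposed check; not taken from the Pig code base.
final class StoreLocationCheck {

    enum Outcome { OK, WARN, ERROR }

    // Assumed stand-in for UriUtil.isHDFSFileOrLocalOrS3N(String uri).
    // A location without a scheme is treated as a plain file path.
    static boolean isFileBased(String location) {
        String scheme;
        try {
            scheme = URI.create(location).getScheme();
        } catch (IllegalArgumentException e) {
            return true; // not parseable as a URI; assume a plain path
        }
        if (scheme == null) {
            return true;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "hdfs":
            case "file":
            case "s3n":
                return true;
            default:
                return false; // e.g. hbase, hcat
        }
    }

    // strict mirrors "pig.location.check.strict is set" (step 1).
    static Outcome check(List<String> storeLocations, boolean strict) {
        Set<String> seen = new HashSet<>();
        Outcome result = Outcome.OK;
        for (String location : storeLocations) {
            boolean duplicate = !seen.add(location);
            if (!duplicate || !isFileBased(location)) {
                continue; // step 4: non-file-based duplicates pass silently
            }
            if (strict) {
                return Outcome.ERROR; // step 3: fail before submission
            }
            result = Outcome.WARN; // step 2: only log a warning
        }
        return result;
    }
}
```

With this sketch, two stores to "/out/a" yield WARN when the flag is unset and ERROR when it is set, while two stores to "hbase://t1" yield OK in both modes.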