[
https://issues.apache.org/jira/browse/TEZ-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Eagles updated TEZ-3685:
---------------------------------
Description:
Per comment in TEZ-3334
https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550
Addressing The off by 1 issue / other correctness issues in this jira
- ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May
need to look at the synchronization points required for the bitset. (access to
the map had some synchronized and some non synchronized access)
- Why was this removed?
{code}
- // Sanity check
- // we are guaranteed that key is not null
- if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
- // wrongMapErrs.increment(1);
- LOG.warn("Invalid input. Received output for headerPathComponent: "
- + pathComponent + "nextRemainingSrcAttemptId: "
- + getNextRemainingAttempt() + ", mappedSrcAttemptId: " +
srcAttemptId);
- return false;
- }
{code}
-
Is there an off by 1 error here? Specifically. input.getInputIdentifier() +
((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think
getInputIdentifierCount is always at least 1.
{code}
+ if(input instanceof CompositeInputAttemptIdentifier) {
+ // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount
will be set to 1 for a regular event.
+ // We start scanning the bitst early. Also depends on how
completedInputSet is setup.
+ if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
+ input.getInputIdentifier() + ((CompositeInputAttemptIdentifier)
input).getInputIdentifierCount()) {
+ inputIter.remove();
+ continue;
+ }
+ } else {
+ if (completedInputSet.get(input.getInputIdentifier())) {
+ inputIter.remove();
+ continue;
+ }
{code}
was:
Per comment in TEZ-3334
https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550
Addressing The off by 1 issue / other correctness issues in this jira
- ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May
need to look at the synchronization points required for the bitset. (access to
the map had some synchronized and some non synchronized access)
- In optimizeLocalFetch - report success only after an entire source is done
(i.e. all partitions for it). Otherwise the same input may be reported multiple
times, which could be problematic.
- Why was this removed?
{code}
- // Sanity check
- // we are guaranteed that key is not null
- if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
- // wrongMapErrs.increment(1);
- LOG.warn("Invalid input. Received output for headerPathComponent: "
- + pathComponent + "nextRemainingSrcAttemptId: "
- + getNextRemainingAttempt() + ", mappedSrcAttemptId: " +
srcAttemptId);
- return false;
- }
{code}
-
Is there an off by 1 error here? Specifically. input.getInputIdentifier() +
((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think
getInputIdentifierCount is always at least 1.
{code}
+ if(input instanceof CompositeInputAttemptIdentifier) {
+ // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount
will be set to 1 for a regular event.
+ // We start scanning the bitst early. Also depends on how
completedInputSet is setup.
+ if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
+ input.getInputIdentifier() + ((CompositeInputAttemptIdentifier)
input).getInputIdentifierCount()) {
+ inputIter.remove();
+ continue;
+ }
+ } else {
+ if (completedInputSet.get(input.getInputIdentifier())) {
+ inputIter.remove();
+ continue;
+ }
{code}
> ShuffleHandler completedInputSet off-by-one error
> -------------------------------------------------
>
> Key: TEZ-3685
> URL: https://issues.apache.org/jira/browse/TEZ-3685
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
>
> Per comment in TEZ-3334
> https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550
> Addressing The off by 1 issue / other correctness issues in this jira
> - ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May
> need to look at the synchronization points required for the bitset. (access
> to the map had some synchronized and some non synchronized access)
> - Why was this removed?
> {code}
> - // Sanity check
> - // we are guaranteed that key is not null
> - if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
> - // wrongMapErrs.increment(1);
> - LOG.warn("Invalid input. Received output for headerPathComponent: "
> - + pathComponent + "nextRemainingSrcAttemptId: "
> - + getNextRemainingAttempt() + ", mappedSrcAttemptId: " +
> srcAttemptId);
> - return false;
> - }
> {code}
> -
> Is there an off by 1 error here? Specifically. input.getInputIdentifier() +
> ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think
> getInputIdentifierCount is always at least 1.
> {code}
> + if(input instanceof CompositeInputAttemptIdentifier) {
> + // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount
> will be set to 1 for a regular event.
> + // We start scanning the bitst early. Also depends on how
> completedInputSet is setup.
> + if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
> + input.getInputIdentifier() + ((CompositeInputAttemptIdentifier)
> input).getInputIdentifierCount()) {
> + inputIter.remove();
> + continue;
> + }
> + } else {
> + if (completedInputSet.get(input.getInputIdentifier())) {
> + inputIter.remove();
> + continue;
> + }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)