[jira] [Updated] (HDFS-11334) Handle the case when NN switch and rescheduling movements can lead to have more than one coordinator for same file block

Uma Maheswara Rao G (JIRA) Wed, 11 Jan 2017 14:15:12 -0800

     [ https://issues.apache.org/jira/browse/HDFS-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uma Maheswara Rao G updated HDFS-11334:
---------------------------------------
    Description: 
This summarizes the scenarios that Rakesh and I discussed offline.
We need to handle a couple of cases:
# NN switch - After the switch, the NN will start scheduling afresh for all files.
       At this time, old co-ordinators may continue their movement work and send 
results back. This could confuse the NN SPS about which result is the right one.
  *NEED TO HANDLE*
# DN disconnected beyond heartbeat expiry - If a DN is disconnected for a long 
time (more than the heartbeat expiry), the NN will remove this node. After the 
SPS Monitor times out, it may retry the files that were scheduled to that DN by 
finding a new co-ordinator. But if the DN reconnects after the NN reschedules, 
the NN may get different results from different co-ordinators.
*NEED TO HANDLE*
# NN restart - Should be the same as point 1.
# DN disconnect - When a DN is disconnected briefly and reconnects immediately 
(before heartbeat expiry), there should not be any issues.
*NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.
# DN restart - If a DN is restarted, it cannot send any results because it will 
lose everything. After the NN SPS Monitor times out, the NN will retry.
*NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.

  was:
This summarizes the scenarios that Rakesh and I discussed offline.
We need to handle a couple of cases:
# NN switch - After the switch, the NN will start scheduling afresh for all files.
       At this time, old co-ordinators may continue their movement work and send 
results back. This could confuse the NN SPS about which result is the right one.
  *NEED TO HANDLE*
# DN disconnected beyond heartbeat expiry - If a DN is disconnected for a long 
time (more than the heartbeat expiry), the NN will remove this node. After the 
SPS Monitor times out, it may retry the files that were scheduled to that DN. 
But if the DN reconnects after the NN reschedules, the NN may get different 
results from different co-ordinators.
*NEED TO HANDLE*
# NN restart - Should be the same as point 1.
# DN disconnect - When a DN is disconnected briefly and reconnects immediately 
(before heartbeat expiry), there should not be any issues.
*NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.
# DN restart - If a DN is restarted, it cannot send any results because it will 
lose everything. After the NN SPS Monitor times out, the NN will retry.
*NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.


> Handle the case when NN switch and rescheduling movements can lead to have 
> more than one coordinator for same file block 
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11334
>                 URL: https://issues.apache.org/jira/browse/HDFS-11334
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Rakesh R
>
> This summarizes the scenarios that Rakesh and I discussed offline.
> We need to handle a couple of cases:
> # NN switch - After the switch, the NN will start scheduling afresh for all files.
>        At this time, old co-ordinators may continue their movement work and send 
> results back. This could confuse the NN SPS about which result is the right one.
>   *NEED TO HANDLE*
> # DN disconnected beyond heartbeat expiry - If a DN is disconnected for a long 
> time (more than the heartbeat expiry), the NN will remove this node. After the 
> SPS Monitor times out, it may retry the files that were scheduled to that DN by 
> finding a new co-ordinator. But if the DN reconnects after the NN reschedules, 
> the NN may get different results from different co-ordinators.
> *NEED TO HANDLE*
> # NN restart - Should be the same as point 1.
> # DN disconnect - When a DN is disconnected briefly and reconnects immediately 
> (before heartbeat expiry), there should not be any issues.
> *NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.
> # DN restart - If a DN is restarted, it cannot send any results because it will 
> lose everything. After the NN SPS Monitor times out, the NN will retry.
> *NEED NOT HANDLE*, but we can think of more scenarios in case anything is missing.
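
To make the two NEED TO HANDLE cases concrete, here is a minimal sketch of one 
possible way the NN side could tell a stale co-ordinator's result from the 
current one. This is only an illustration, not the approach decided on this 
issue. Every name in it (MovementTracker, schedule, onResult, the epoch value 
that is assumed to increase on each NN switch or restart) is hypothetical and 
is not an existing HDFS class or API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in HDFS. */
class MovementTracker {
  /** The co-ordinator and attempt number currently trusted for a block. */
  private static final class Assignment {
    final String coordinator;
    final long attempt;

    Assignment(String coordinator, long attempt) {
      this.coordinator = coordinator;
      this.attempt = attempt;
    }
  }

  // Assumption: a value that grows on every NN switch or restart.
  private final long epoch;
  private final Map<Long, Assignment> current = new HashMap<>();
  private long nextAttempt = 1;

  MovementTracker(long epoch) {
    this.epoch = epoch;
  }

  /** Schedule (or reschedule) a block; the newest assignment replaces any older one. */
  synchronized long schedule(long blockId, String coordinator) {
    long attempt = nextAttempt++;
    current.put(blockId, new Assignment(coordinator, attempt));
    return attempt;
  }

  /** Accept a result only if it matches the latest assignment made in this epoch. */
  synchronized boolean onResult(long blockId, String coordinator,
                                long resultEpoch, long attempt) {
    Assignment a = current.get(blockId);
    if (resultEpoch != epoch || a == null || a.attempt != attempt
        || !a.coordinator.equals(coordinator)) {
      return false; // stale or unknown co-ordinator: ignore the result
    }
    current.remove(blockId); // movement finished, stop tracking the block
    return true;
  }
}
```

In this sketch, a result from a co-ordinator that was scheduled before a 
reschedule (case 2) fails the attempt check, and a result that was scheduled by 
the NN before a switch (case 1) fails the epoch check, so only one co-ordinator's 
result is accepted per block.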



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

