[jira] [Updated] (HDFS-11334) [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks

Uma Maheswara Rao G (JIRA) Tue, 18 Apr 2017 15:29:04 -0700

     [ https://issues.apache.org/jira/browse/HDFS-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uma Maheswara Rao G updated HDFS-11334:
---------------------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

I have just pushed this to the HDFS-10285 branch.

> [SPS]: NN switch and rescheduling movements can lead to have more than one coordinator for same file blocks
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11334
>                 URL: https://issues.apache.org/jira/browse/HDFS-11334
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Rakesh R
>             Fix For: HDFS-10285
>
>         Attachments: HDFS-11334-HDFS-10285-00.patch, 
> HDFS-11334-HDFS-10285-01.patch, HDFS-11334-HDFS-10285-02.patch, 
> HDFS-11334-HDFS-10285-03.patch, HDFS-11334-HDFS-10285-04.patch
>
>
> Here is a summary of the scenarios Rakesh and I discussed offline.
> There are a few cases to consider:
> # NN switch: the newly active NN starts scheduling all files afresh.
> Meanwhile, the old coordinators may continue their movement work and send
> results back, so the NN's SPS cannot tell which result is the right one.
> *NEED TO HANDLE*
> # DN disconnected past heartbeat expiry: if a DN stays disconnected longer
> than the heartbeat expiry interval, the NN removes the node. After the SPS
> monitor times out, it may retry the files that were scheduled to that DN by
> choosing a new coordinator. If the old DN reconnects after the NN has
> rescheduled, the NN may receive different results from different coordinators.
> *NEED TO HANDLE*
> # NN restart: same as point 1.
> # DN disconnect: if a DN disconnects and reconnects quickly (before heartbeat
> expiry), there should not be any issues.
> *NEED NOT HANDLE*, but we can revisit this if any scenario has been missed.
> # DN restart: a restarted DN cannot send any results because it loses all of
> its in-progress state. After the NN's SPS monitor times out, the NN retries.
> *NEED NOT HANDLE*, but we can revisit this if any scenario has been missed.
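The two *NEED TO HANDLE* cases have the same root cause. The NN can receive a block-movement result from a coordinator that it no longer treats as current. This happens either because a different NN instance scheduled the work (NN switch or restart) or because the file was later rescheduled to another DN (heartbeat expiry).

The following is a minimal Java sketch of one way to detect such stale results. It is not the patch committed for HDFS-11334, whose approach this thread does not describe. It assumes two things:

* the NN tags every scheduling attempt with a per-activation epoch and an attempt id;
* the coordinator echoes both values back along with its result.

The class `CoordinatorResultFilter`, its methods, and the epoch and attempt-id fields are all hypothetical names, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not a Hadoop API): NN-side bookkeeping that rejects
 * block-movement results from coordinators that are no longer expected.
 */
public class CoordinatorResultFilter {

    /** What this NN currently expects for one tracked file. */
    private static final class Expected {
        final long nnEpoch;
        final long attemptId;
        final String coordinatorDn;

        Expected(long nnEpoch, long attemptId, String coordinatorDn) {
            this.nnEpoch = nnEpoch;
            this.attemptId = attemptId;
            this.coordinatorDn = coordinatorDn;
        }
    }

    private final long nnEpoch;      // assumption: differs for each NN activation
    private long nextAttemptId = 1;  // incremented on every (re)schedule
    private final Map<Long, Expected> inFlight = new HashMap<>();

    public CoordinatorResultFilter(long nnEpoch) {
        this.nnEpoch = nnEpoch;
    }

    /** Schedules or reschedules a file; returns the attempt id to send to the coordinator. */
    public long schedule(long trackId, String coordinatorDn) {
        long attemptId = nextAttemptId++;
        // Overwrites any earlier attempt, so the previous coordinator becomes stale.
        inFlight.put(trackId, new Expected(nnEpoch, attemptId, coordinatorDn));
        return attemptId;
    }

    /** Accepts a result only if it matches the current epoch, attempt and coordinator. */
    public boolean acceptResult(long trackId, long resultEpoch, long attemptId, String fromDn) {
        Expected e = inFlight.get(trackId);
        if (e == null                          // never scheduled by this NN (switch/restart)
                || e.nnEpoch != resultEpoch    // work issued by another NN activation
                || e.attemptId != attemptId    // superseded by a reschedule
                || !e.coordinatorDn.equals(fromDn)) {
            return false;
        }
        inFlight.remove(trackId);              // later duplicates are also rejected
        return true;
    }

    public static void main(String[] args) {
        // Case 2: dn1 expires and the file is rescheduled to dn2.
        CoordinatorResultFilter nn = new CoordinatorResultFilter(7L);
        long first = nn.schedule(1001L, "dn1");
        long second = nn.schedule(1001L, "dn2");
        System.out.println(nn.acceptResult(1001L, 7L, first, "dn1"));   // false
        System.out.println(nn.acceptResult(1001L, 7L, second, "dn2"));  // true

        // Cases 1 and 3: a new NN activation ignores work it never scheduled.
        CoordinatorResultFilter newNn = new CoordinatorResultFilter(8L);
        System.out.println(newNn.acceptResult(1001L, 7L, second, "dn2")); // false
    }
}
```

In this sketch the map of in-flight attempts is in memory only. After an NN switch or restart it therefore starts empty, and the new NN rejects every result for work it did not schedule itself. The files are then retried via the SPS monitor timeout, as in points 1 and 3.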



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
