[ https://issues.apache.org/jira/browse/HDFS-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125837#comment-16125837 ]
Weiwei Yang edited comment on HDFS-12283 at 8/14/17 3:33 PM:
-------------------------------------------------------------

Hi [~yuanbo]

Thanks for working on this and posting a patch, we are making good progress here. Apart from the findbugs and checkstyle issues, I have some more comments.

*Ozone.proto*

Can we move the protobuf definitions here into {{StorageContainerDatanodeProtocol.proto}}? This proto serves the protocol between SCM and the datanode.

*BlockDeletionTransaction.java*

Lines 25 - 30: suggest adding some more explanation to the javadoc, such as: a block deletion transaction represents a block deletion message sent from SCM to a datanode; it contains a number of under-deletion blocks in a container and a txid as a unique ID to track its progress.

*DeletedBlockLog.java*

Minor, line 53: suggest removing the word DB from the javadoc, as the interface might be implemented by some other form of log in the future.

*DeletedBlockLogImpl.java*

Line 56: I think the log implementation should be self contained; we don't need to pass a {{MetadataStore}} argument to its constructor. The constructor should handle the following cases: 1) if the DB already exists, load it using the metadata store API; 2) if the DB doesn't exist, create it at the given path (from configuration).

Line 70: this iterates the DB to get the latest key, which is not efficient I think. Maybe we can maintain the latest txid in the same database under a prefixed key, e.g. #latest#latest_txid?

Line 110: this might be a problem. E.g. if you have listed tx 1, 3, 5 (2 and 4 have been committed), then 4 is no longer a valid start key.

Line 141: should it be {{> MAX_RETRY}}? The initial count is 0, meaning the transaction has not yet been sent to a datanode. If we want to support a maximum number of retries, e.g. 5, we need to revise this check to > 5, correct?

Line 189: this may introduce a race condition. For example, if line 189 succeeds but line 193 fails, the DB entry is not updated but the txid has been incremented by 1. Can you fix this inconsistency?
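To make the line 70 and line 189 suggestions concrete, here is a minimal sketch of the reserved-key idea. All names ({{DeletedBlockLogSketch}}, {{addTransaction}}) are illustrative, not from the patch, and a {{TreeMap}} stands in for the RocksDB-backed {{MetadataStore}}:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: keep the latest txid under a reserved key so a restart recovers
// it in O(1) instead of iterating the whole DB for the largest key.
class DeletedBlockLogSketch {
    private static final String LATEST_TXID_KEY = "#latest#latest_txid";
    private final Map<String, String> store;
    private long lastTxId;

    DeletedBlockLogSketch(Map<String, String> store) {
        this.store = store;
        // Recover the counter from the reserved key on (re)start.
        String v = store.get(LATEST_TXID_KEY);
        this.lastTxId = (v == null) ? 0 : Long.parseLong(v);
    }

    synchronized long addTransaction(String container, String blocks) {
        long txId = lastTxId + 1;
        // Write the entry and the counter together; with the real
        // MetadataStore this should be a single atomic batch, so the
        // in-memory txid never runs ahead of the persisted state
        // (the line 189 / line 193 inconsistency mentioned above).
        store.put(Long.toString(txId), container + ":" + blocks);
        store.put(LATEST_TXID_KEY, Long.toString(txId));
        lastTxId = txId;  // bump only after the writes succeed
        return txId;
    }

    long getLatestTxId() {
        return lastTxId;
    }
}
```

The same idea also makes the SCM-restart test trivial: construct a second instance over the same store and check the counter survives.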
*TestDeletedBlockLog.java*

I think we need to add more test cases here. We need to at least address the following.

{{testGetTransactions}}: 1) can we test the case where we generate 40 tx and then fetch 30 twice? We need to ensure the 2nd fetch returns the last 10 plus the first 20. 2) can we add a test case verifying that when the log is empty, we fetch an empty list?

{{testIncrementCount}}: we need to add a test to make sure the count stays in the range [-1, MAX_RETRY]; currently it only tests 0 to 1.

{{testCommitTransactions}}: can we add a test case that generates 40 tx, commits a random number of them, then fetches 20 twice to verify we can still get the transactions we expect?

We also need to add a brand new test case to simulate an SCM restart: init DeletedBlockLog, add some updates, then restart it and make sure nothing is lost.

Please let me know if this makes sense. Thank you.
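The wrap-around fetch that {{testGetTransactions}} should cover can be sketched as follows. This is an illustrative model, not the patch's implementation; {{WrapAroundLogSketch}} and its method names are assumptions, with a {{TreeMap}} in place of the real store:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch: getTransactions(n) resumes after the last returned txid and
// wraps back to the head of the log when it reaches the end, so a
// second fetch of 30 from a 40-entry log yields the last 10 plus the
// first 20.
class WrapAroundLogSketch {
    private final TreeMap<Long, String> txs = new TreeMap<>();
    private long cursor = 0;  // last txid handed out

    void add(long txId, String payload) {
        txs.put(txId, payload);
    }

    List<Long> getTransactions(int count) {
        List<Long> result = new ArrayList<>();
        // Scan forward from the cursor first...
        for (Long id : txs.tailMap(cursor + 1).keySet()) {
            if (result.size() == count) break;
            result.add(id);
        }
        // ...then wrap around to the head of the log if we ran out.
        for (Long id : txs.headMap(cursor + 1).keySet()) {
            if (result.size() == count) break;
            result.add(id);
        }
        if (!result.isEmpty()) {
            cursor = result.get(result.size() - 1);
        }
        return result;
    }
}
```

Because committed transactions are deleted from the map, the same sketch also shows why a removed txid (the line 110 comment) must not be assumed to be a valid start key: the cursor is just a position, and the scan starts from the next key that still exists.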
> Ozone: DeleteKey-5: Implement SCM DeletedBlockLog
> -------------------------------------------------
>
>                 Key: HDFS-12283
>                 URL: https://issues.apache.org/jira/browse/HDFS-12283
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, scm
>            Reporter: Weiwei Yang
>            Assignee: Yuanbo Liu
>         Attachments: HDFS-12283.001.patch, HDFS-12283-HDFS-7240.001.patch
>
> The DeletedBlockLog is a persisted log in SCM that keeps track of container blocks which are under deletion. It maintains info about the under-deletion container blocks notified by KSM, and the state of how each is processed. We can use RocksDB to implement the 1st version of the log; the schema looks like
> ||TxID||ContainerName||Block List||ProcessedCount||
> |0|c1|b1,b2,b3|0|
> |1|c2|b1|3|
> |2|c2|b2, b3|-1|
> Some explanations
> # TxID is an incremental long value transaction ID for ONE container and multiple blocks
> # Container name is the name of the container
> # Block list is a list of block IDs
> # ProcessedCount is the number of times SCM has sent this record to the datanode; it represents the "state" of the transaction and is in the range \[-1, 5\], where -1 means the transaction eventually failed after some retries and 5 is the maximum number of retries
> We need to define {{DeletedBlockLog}} as an interface and implement it with the RocksDB-backed {{MetadataStore}} as the first version.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
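The ProcessedCount state machine described in the issue can be modeled with a small sketch. The class and method names here are illustrative, not part of the patch; only the [-1, MAX_RETRY] semantics come from the description:

```java
// Sketch of the ProcessedCount lifecycle: starts at 0 (never sent),
// increments on each send to the datanode, and is set to -1 once it
// exceeds MAX_RETRY to mark the transaction as eventually failed.
class RetryCountSketch {
    static final int MAX_RETRY = 5;
    private int count = 0;

    // Returns the new count after one send attempt.
    int incrementCount() {
        if (count == -1) {
            return -1;            // already failed; stays failed
        }
        count++;
        if (count > MAX_RETRY) {  // strictly greater than, per the review
            count = -1;           // mark as eventually failed
        }
        return count;
    }

    int getCount() {
        return count;
    }
}
```

This is also the behavior the review asks {{testIncrementCount}} to pin down: the count never leaves the [-1, MAX_RETRY] range.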