[ 
https://issues.apache.org/jira/browse/HDFS-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099438#comment-16099438
 ] 

Weiwei Yang commented on HDFS-11920:
------------------------------------

Hi [~vagarychen]

Thanks for the patch, it looks good to me overall. I have a few comments; please 
let me know if they make sense to you.

1. *DistributedStorageHandler*

line 410: I am wondering why it builds the containerKey as 
"/volume/bucket/blockID"; why not simply use {{BlockID}} here? This seems to be 
the key that is written to container.db in the container metadata.
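To make the two schemes concrete, here is a minimal sketch (names like {{composedKey}} are illustrative, not the actual patch code) contrasting the composed container key with using the block ID alone:

```java
// Illustrative sketch only: contrasts the two container-key schemes
// discussed above. Names are hypothetical, not from the patch.
public class ContainerKeyExample {
    // Current approach as described: compose "/volume/bucket/blockID".
    static String composedKey(String volume, String bucket, String blockId) {
        return "/" + volume + "/" + bucket + "/" + blockId;
    }

    // Suggested alternative: the block ID alone, assuming it is already
    // unique across the cluster.
    static String blockKey(String blockId) {
        return blockId;
    }

    public static void main(String[] args) {
        System.out.println(composedKey("vol1", "bucket1", "block-42"));
        System.out.println(blockKey("block-42"));
    }
}
```

If the block ID is globally unique, the volume/bucket prefix carries no extra information at the container level and just couples container metadata to the object namespace.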

2. *ChunkOutputStream*

I am wondering whether it really needs to know about an ozone object key, see 
line 56. Right now it writes a chunk file named 
{{ozoneKeyName_stream_streamId_chunk_n}}; why not 
{{blockId_stream_streamId_chunk_n}} instead? I think we can remove this 
variable from this class.

line 168: it writes the full length of {{b}} to the output stream, but the 
position only advances by 1, which seems incorrect.
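A small sketch of what I mean (the class and method names are made up for illustration, not the patch code): if the stream writes {{b.length}} bytes but bumps its position by 1, the position and the bytes actually written diverge.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch of the suspected off-by-one, with hypothetical names.
public class PositionTrackingStream {
    private final OutputStream out;
    private long position;

    PositionTrackingStream(OutputStream out) {
        this.out = out;
    }

    // Buggy variant as described in the review: writes b.length bytes,
    // but advances position by only 1.
    void writeBuggy(byte[] b) throws IOException {
        out.write(b, 0, b.length);
        position += 1; // diverges from bytes written when b.length > 1
    }

    // Corrected variant: position advances by the number of bytes written.
    void writeFixed(byte[] b) throws IOException {
        out.write(b, 0, b.length);
        position += b.length;
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        PositionTrackingStream s =
            new PositionTrackingStream(new ByteArrayOutputStream());
        s.writeFixed(new byte[]{1, 2, 3, 4});
        System.out.println(s.getPosition()); // 4 with the fix, not 1
    }
}
```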

3. *TestMultipleContainerReadWrite*

In {{TestWriteRead}}, can we check that the number of chunk files for the key 
actually matches the desired number of splits?
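The assertion I have in mind would compare the observed chunk-file count against the expected split count; a minimal sketch of that expectation (hypothetical helper, not the test code):

```java
// Illustrative sketch: the expected number of chunk files for a key of
// keyLen bytes split into chunkSize-byte chunks, i.e. ceil(keyLen / chunkSize).
// Names are hypothetical, not from TestWriteRead.
public class ChunkCountCheck {
    static int expectedChunks(long keyLen, int chunkSize) {
        return (int) ((keyLen + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        // e.g. a 10 MB key with 4 MB chunks should produce 3 chunk files
        System.out.println(expectedChunks(10L * 1024 * 1024, 4 * 1024 * 1024));
    }
}
```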

4. Looks like the chunk group input and output streams maintain a list of 
streams and read/write in a linear manner; can we optimize this to do parallel 
reads/writes, since the chunks are independent? That is, have a thread fetch a 
certain length of content from each chunk, then merge the results afterwards. 
It doesn't have to be done in this patch, but I think it might be a good 
improvement.
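Roughly what I have in mind (a sketch with stand-in names; {{fetchChunk}} is a placeholder for the real chunk read): submit one fetch task per chunk, then merge the futures in chunk order so the result stays correct.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of the suggested parallel read; names are hypothetical.
public class ParallelChunkRead {
    // Stand-in for a real chunk read: returns 4 bytes filled with the index.
    static byte[] fetchChunk(int index) {
        byte[] data = new byte[4];
        java.util.Arrays.fill(data, (byte) index);
        return data;
    }

    static byte[] readAll(int numChunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Fetch all chunks concurrently; they are independent.
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < numChunks; i++) {
                final int idx = i;
                futures.add(pool.submit(() -> fetchChunk(idx)));
            }
            // Merge in submission order so byte order is preserved.
            ByteArrayOutputStream merged = new ByteArrayOutputStream();
            for (Future<byte[]> f : futures) {
                merged.write(f.get());
            }
            return merged.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAll(3).length); // 3 chunks * 4 bytes = 12
    }
}
```

Iterating the futures in submission order makes the merge deterministic even though the fetches complete out of order.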

Thanks

> Ozone : add key partition
> -------------------------
>
>                 Key: HDFS-11920
>                 URL: https://issues.apache.org/jira/browse/HDFS-11920
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11920-HDFS-7240.001.patch, 
> HDFS-11920-HDFS-7240.002.patch, HDFS-11920-HDFS-7240.003.patch, 
> HDFS-11920-HDFS-7240.004.patch
>
>
> Currently, each key corresponds to one single SCM block, and putKey/getKey 
> writes/reads to this single SCM block. This works fine for keys with 
> reasonably small data size. However if the data is too huge, (e.g. not even 
> fits into a single container), then we need to be able to partition the key 
> data into multiple blocks, each in one container. This JIRA changes the 
> key-related classes to support this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
