[ 
https://issues.apache.org/jira/browse/HDFS-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100567#comment-16100567
 ] 

Chen Liang commented on HDFS-11920:
-----------------------------------

Thanks [~cheersyang] for the review and the comments!
bq. why not use simply BlockID here
Thanks for the catch, I will fix it in the next patch.

bq. why not blockId_stream_streamId_chunk_n instead?
I didn't change {{ChunkOutputStream}} in this JIRA and wasn't initially planning 
to touch it. But rereading the class, I think your suggestion makes sense. That 
said, I'm still a bit more inclined to keep the key name as part of the chunk 
name for the time being, purely for debugging: at least once this has allowed us 
to easily locate the chunk files of a key while debugging. Let's revisit this 
later.

bq. it writes b length to the outputstream but the position only moves 1
Did you mean {{public void write(int b)}}? It does not write b bytes; it writes 
exactly one byte, namely the 8 low-order bits of the integer b. It is very 
confusing indeed, but this is the {{OutputStream}} spec 
[here|https://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html#write(int)].
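A quick standalone sketch (using {{ByteArrayOutputStream}}, not any Ozone class) showing that {{write(int b)}} emits a single byte, the low 8 bits of the argument:

```java
import java.io.ByteArrayOutputStream;

public class WriteIntDemo {
    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // write(int b) takes an int, but per the OutputStream spec it
        // writes only the 8 low-order bits; the upper 24 bits are ignored.
        out.write(0x1FF); // 511 as an int; only 0xFF reaches the stream
        byte[] data = out.toByteArray();
        System.out.println(data.length);    // 1 byte written, not 4
        System.out.println(data[0] & 0xFF); // 255, i.e. the low 8 bits of 0x1FF
    }
}
```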

bq. can we check the number of chunk files for the key
I manually checked a couple of runs and the result looked correct, with 5 chunk 
files. Unfortunately, there does not seem to be an easy way to verify this 
directly and programmatically. The closest thing I found is that the container 
metrics {{numWriteChunk}} and {{numReadChunk}} should both be exactly 5. I will 
include this check in the next patch.

bq. do parallel r/w as they are independent chunks
This is indeed part of the plan, and we will definitely come back to it later. 
It will be a complex change by itself, especially once random-access read/write 
and versioning are considered.




> Ozone : add key partition
> -------------------------
>
>                 Key: HDFS-11920
>                 URL: https://issues.apache.org/jira/browse/HDFS-11920
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11920-HDFS-7240.001.patch, 
> HDFS-11920-HDFS-7240.002.patch, HDFS-11920-HDFS-7240.003.patch, 
> HDFS-11920-HDFS-7240.004.patch
>
>
> Currently, each key corresponds to one single SCM block, and putKey/getKey 
> writes/reads to this single SCM block. This works fine for keys with 
> reasonably small data sizes. However, if the data is too large (e.g., it does 
> not even fit into a single container), then we need to be able to partition 
> the key data into multiple blocks, each in one container. This JIRA changes 
> the key-related classes to support this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]