[https://issues.apache.org/jira/browse/HDDS-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800159#comment-16800159]

Shashikant Banerjee commented on HDDS-1317:
-------------------------------------------

Thanks [~msingh] for the review.
{noformat}
1) handleFlush & handleFullBuffer share lines of code, I feel we can remove 
some redundant code.{noformat}
handleFlush and handleFullBuffer do both call waitOnFlushFutures and 
watchForCommit, but their semantics are different. handleFlush does a 
writeChunk followed by a putBlock, waits for the future to complete, and then 
calls watchForCommit on the last index. handleFullBuffer does not do any 
writeChunk/putBlock; it waits on the flush futures if there are any pending 
and calls watchForCommit on the first index in the map. Since they do 
different things, I feel it is better to keep them separate; merging them 
would make the logic more convoluted. A rough sketch of the two flows is below.
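
To make the distinction concrete, here is a rough sketch of the two flows 
(method names follow the patch; the commit-index type, the map shape and the 
stubbed bodies are simplified placeholders, not the actual implementation):
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

class FlushFlowSketch {
  // commitIndex -> future of the putBlock covering that index (simplified;
  // the real map in the patch tracks more state than a bare Long).
  private final ConcurrentSkipListMap<Long, CompletableFuture<Long>> futureMap =
      new ConcurrentSkipListMap<>();

  // handleFlush: writeChunk + putBlock, wait, then watch the LAST index.
  void handleFlush() {
    long lastIndex = writeChunkAndPutBlock();
    waitOnFlushFutures();
    watchForCommit(lastIndex);
  }

  // handleFullBuffer: no new writeChunk/putBlock; wait on pending flushes,
  // then watch the FIRST index so the oldest buffer can be released.
  void handleFullBuffer() {
    waitOnFlushFutures();
    if (!futureMap.isEmpty()) {
      watchForCommit(futureMap.firstKey());
    }
  }

  // Stub standing in for the real async writeChunk/putBlock RPCs.
  private long writeChunkAndPutBlock() {
    long index = 1L;
    futureMap.put(index, CompletableFuture.completedFuture(index));
    return index;
  }

  private void waitOnFlushFutures() {
    CompletableFuture.allOf(
        futureMap.values().toArray(new CompletableFuture[0])).join();
  }

  private void watchForCommit(long index) {
    // In the real code this is a synchronous raft watch; here we just drop
    // the entries up to and including the watched index.
    futureMap.headMap(index, true).clear();
  }
}
{code}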

FYI, in a follow-up jira, I will move all the watchForCommit related handling 
into a separate class.
{noformat}
Also futureMap is really not needed, as watch for commit always waits for the 
data to be committed correctly. I feel we can abstract this future map inside 
handleFullBuffer.{noformat}
futureMap is needed because watchForCommit is a synchronous command. For 
example, suppose a putBlock call was executed but had not completed by the 
time watchForCommit was called. Since the putBlock never completed, it would 
not have added any log index to the map, and watchForCommit would have 
nothing to watch for. The map also has the benefit of keeping track of the 
putBlock futures, so that in case a future completes with an exception, we 
throw the exception right away instead of detecting it later after 
watchForCommit.
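
A minimal illustration of that fail-fast benefit, using only CompletableFuture 
semantics (the method name and map parameter are hypothetical stand-ins, not 
the patch code):
{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class FailFastSketch {
  // If any tracked putBlock future has already failed, surface the error
  // immediately instead of waiting for watchForCommit to notice it.
  static void checkForFailures(
      Map<Long, CompletableFuture<Long>> futureMap) throws IOException {
    for (CompletableFuture<Long> f : futureMap.values()) {
      if (f.isCompletedExceptionally()) {
        try {
          f.join(); // rethrows the failure wrapped in CompletionException
        } catch (CompletionException e) {
          throw new IOException("putBlock failed", e.getCause());
        }
      }
    }
  }
}
{code}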
{noformat}
3) KeyOutputStream:585, the remaining should be 0 here. Lets add this as a 
precondition and remove the if condition.{noformat}
"The remaining should be 0 here", is not necessarily true. Consider a case, 
where we are tryin to close a outputStream because its got full, IN this case, 
it will call close on the stream and if it hits an exception which needs to be 
handled, the data in the buffer at max half the block size will be written to 
the next block. Since, handleFlushorClose hit the exception, same function will 
get invoked now for the the next block. Since the next block is half written 
here but the original action was STREAM_FULL, we should verify whether the data 
remaining is really 0 before just closing it.
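
Roughly, the check guards this retry path (StreamEntry, getRemaining, write 
and close here are illustrative stand-ins for the actual stream entry API):
{code:java}
import java.io.IOException;

class ClosePathSketch {
  interface StreamEntry {
    long getRemaining();
    void write() throws IOException;  // flush leftover buffered data
    void close() throws IOException;
  }

  // After an exception was handled, the same close path runs again on the
  // next block. A Preconditions.checkArgument(remaining == 0) would trip
  // here, because the retried block may be only half written.
  static void closeEntry(StreamEntry entry) throws IOException {
    if (entry.getRemaining() > 0) {
      entry.write();  // write out the leftover data before closing
    }
    entry.close();
  }
}
{code}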

The rest of the review comments are addressed in patch v3.

> KeyOutputStream#write throws ArrayIndexOutOfBoundsException when running 
> RandomWrite MR examples
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-1317
>                 URL: https://issues.apache.org/jira/browse/HDDS-1317
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>    Affects Versions: 0.4.0
>            Reporter: Xiaoyu Yao
>            Assignee: Shashikant Banerjee
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDDS-1317.000.patch, HDDS-1317.001.patch, 
> HDDS-1317.002.patch, HDDS-1317.003.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Repro steps:
> {code} 
> hadoop jar $HADOOP_MAPRED_HOME/hadoop-mapreduce-examples-*.jar randomwriter -Dtest.randomwrite.total_bytes=10000000 o3fs://bucket1.vol1/randomwrite.out
> {code}
>  
> Error Stack:
> {code}
> 2019-03-20 19:02:37 INFO Job:1686 - Task Id : attempt_1553108378906_0002_m_000000_0, Status : FAILED
> Error: java.lang.ArrayIndexOutOfBoundsException: -5
>  at java.util.ArrayList.elementData(ArrayList.java:422)
>  at java.util.ArrayList.get(ArrayList.java:435)
>  at org.apache.hadoop.hdds.scm.storage.BufferPool.getBuffer(BufferPool.java:45)
>  at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:59)
>  at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:215)
>  at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:130)
>  at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:311)
>  at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:273)
>  at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>  at java.io.DataOutputStream.write(DataOutputStream.java:107)
>  at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1444)
>  at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
>  at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>  at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>  at org.apache.hadoop.examples.RandomWriter$RandomMapper.map(RandomWriter.java:199)
>  at org.apache.hadoop.examples.RandomWriter$RandomMapper.map(RandomWriter.java:165)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}


