[ https://issues.apache.org/jira/browse/HDDS-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800159#comment-16800159 ]
Shashikant Banerjee commented on HDDS-1317:
-------------------------------------------
Thanks [~msingh] for the review.
{noformat}
1) handleFlush & handleFullBuffer share lines of code, I feel we can remove
some redundant code.{noformat}
handleFlush and handleFullBuffer do both call waitOnFlushFutures and
watchForCommit, but the semantics are different. handleFlush does a writeChunk
followed by a putBlock, waits for the future to complete, and then calls
watchForCommit on the last index. handleFullBuffer does not do any
writeChunk/putBlock; it waits on the flush futures if any are pending and
calls watchForCommit on the first index in the map. Since they are doing
different things, I feel it is better to keep them separate, as merging them
would make the logic more convoluted.
FYI, in a follow-up jira I will move all the watchForCommit-related handling
into a separate class.
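Just to make the difference concrete, here is a rough sketch of the two paths
(a condensed illustration only; the field and helper names are stand-ins, not
the exact code in BlockOutputStream or the patch):
{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentSkipListMap;

// Condensed sketch; all names are illustrative stand-ins.
class FlushPathsSketch {
  // commit log index -> flushed data length, ordered so first/last is cheap
  private final ConcurrentSkipListMap<Long, Long> commitIndexMap =
      new ConcurrentSkipListMap<>();

  void handleFlush() throws IOException {
    writeChunk();                     // flush whatever is in the current buffer
    putBlock();                       // putBlock for the freshly flushed data
    waitOnFlushFutures();             // wait for the putBlock future(s)
    if (!commitIndexMap.isEmpty()) {
      watchForCommit(commitIndexMap.lastKey());   // watch the LAST index
    }
  }

  void handleFullBuffer() throws IOException {
    // no writeChunk/putBlock here: data was already flushed as buffers filled
    waitOnFlushFutures();             // waits only if futures are still pending
    if (!commitIndexMap.isEmpty()) {
      watchForCommit(commitIndexMap.firstKey());  // watch the FIRST index,
    }                                             // just enough to free a buffer
  }

  // stubs standing in for the real chunk/block/commit machinery
  private void writeChunk() {}
  private void putBlock() { commitIndexMap.put(1L, 0L); }
  private void waitOnFlushFutures() throws IOException {}
  private void watchForCommit(long index) throws IOException {}
}
{code}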
{noformat}
Also futureMap is really not needed, as watch for commit always waits for the
data to be committed correctly. I feel we can abstract this future map inside
handleFullBuffer.{noformat}
The futureMap is needed because watchForCommit is a sync command. For example,
say a putBlock call was executed but did not complete before watchForCommit
gets called. Since the putBlock never completed, it won't have added any log
indexes to the map, and hence watchForCommit will do nothing because there is
no log index to watch for. The futureMap also lets us keep track of the
putBlock futures, so in case a future completes exceptionally we can throw the
exception right away instead of detecting it later after watchForCommit.
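A rough sketch of that early-failure benefit (again with illustrative names
and a simplified wait, not the actual waitOnFlushFutures):
{code:java}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

// Illustrative sketch; the futureMap key/value types are stand-ins.
class FutureMapSketch {
  // flushed length -> pending putBlock future
  private final ConcurrentHashMap<Long, CompletableFuture<Long>> futureMap =
      new ConcurrentHashMap<>();

  void waitOnFlushFutures() throws IOException {
    try {
      // Block until every outstanding putBlock completes. A future that
      // completed exceptionally surfaces its failure here, before
      // watchForCommit ever runs, instead of being detected later.
      CompletableFuture.allOf(
          futureMap.values().toArray(new CompletableFuture[0])).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting on flush futures", e);
    } catch (ExecutionException e) {
      throw new IOException("putBlock failed", e.getCause());
    }
  }
}
{code}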
{noformat}
3) KeyOutputStream:585, the remaining should be 0 here. Lets add this as a
precondition and remove the if condition.{noformat}
"The remaining should be 0 here", is not necessarily true. Consider a case,
where we are tryin to close a outputStream because its got full, IN this case,
it will call close on the stream and if it hits an exception which needs to be
handled, the data in the buffer at max half the block size will be written to
the next block. Since, handleFlushorClose hit the exception, same function will
get invoked now for the the next block. Since the next block is half written
here but the original action was STREAM_FULL, we should verify whether the data
remaining is really 0 before just closing it.
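To illustrate the scenario, a hedged sketch of the check around
KeyOutputStream:585 (the enum and helper names are illustrative, not the
exact ones in the patch):
{code:java}
import java.io.IOException;

// Sketch of the retry scenario; names are illustrative stand-ins.
class StreamFullRetrySketch {
  enum StreamAction { FULL, FLUSH, CLOSE }

  private long remaining; // unflushed bytes buffered for the current entry

  void handleStreamAction(StreamAction action) throws IOException {
    // FLUSH/CLOSE handling elided; only the STREAM_FULL check is sketched.
    if (action == StreamAction.FULL) {
      // After a handled exception, the buffered data (at most half a block)
      // is rewritten to the NEXT block, so remaining need not be 0 when this
      // function is re-invoked with the original FULL action. Only close once
      // nothing is left; a Precondition(remaining == 0) would trip here.
      if (remaining == 0) {
        closeCurrentEntry();
      }
    }
  }

  private void closeCurrentEntry() throws IOException {}
}
{code}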
The rest of the review comments are addressed in patch v3.
> KeyOutputStream#write throws ArrayIndexOutOfBoundsException when running RandomWrite MR examples
> ------------------------------------------------------------------------------------------------
>
> Key: HDDS-1317
> URL: https://issues.apache.org/jira/browse/HDDS-1317
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Affects Versions: 0.4.0
> Reporter: Xiaoyu Yao
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-1317.000.patch, HDDS-1317.001.patch, HDDS-1317.002.patch, HDDS-1317.003.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Repro steps:
> {code}
> hadoop jar $HADOOP_MAPRED_HOME/hadoop-mapreduce-examples-*.jar randomwriter -Dtest.randomwrite.total_bytes=10000000 o3fs://bucket1.vol1/randomwrite.out
> {code}
>
> Error Stack:
> {code}
> 2019-03-20 19:02:37 INFO Job:1686 - Task Id : attempt_1553108378906_0002_m_000000_0, Status : FAILED
> Error: java.lang.ArrayIndexOutOfBoundsException: -5
> at java.util.ArrayList.elementData(ArrayList.java:422)
> at java.util.ArrayList.get(ArrayList.java:435)
> at org.apache.hadoop.hdds.scm.storage.BufferPool.getBuffer(BufferPool.java:45)
> at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:59)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:215)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:130)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:311)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:273)
> at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1444)
> at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at org.apache.hadoop.examples.RandomWriter$RandomMapper.map(RandomWriter.java:199)
> at org.apache.hadoop.examples.RandomWriter$RandomMapper.map(RandomWriter.java:165)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}