[
https://issues.apache.org/jira/browse/HDDS-2330?focusedWorklogId=330438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330438
]
ASF GitHub Bot logged work on HDDS-2330:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Oct/19 11:29
Start Date: 18/Oct/19 11:29
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #53: HDDS-2330. Random key generator can get stuck
URL: https://github.com/apache/hadoop-ozone/pull/53
## What changes were proposed in this pull request?
Fix the problem where any exception or error not caught by `ObjectCreator`
ends the object creation task while Freon's main thread keeps waiting
indefinitely, because the exception is never stored.
https://issues.apache.org/jira/browse/HDDS-2330
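The general pattern behind this kind of fix can be sketched as follows. This is a minimal illustration, not the code from the pull request: the class `FailureAwareGenerator` and its methods are hypothetical names, and the real `RandomKeyGenerator` may record and report failures differently. The worker catches `Throwable` (so an `Error` such as `OutOfMemoryError` is included), stores the first failure, and the waiting thread checks for it instead of waiting forever.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; names are illustrative and not taken from Ozone. */
final class FailureAwareGenerator {
  // First failure seen by a worker; stays null while everything succeeds.
  private final AtomicReference<Throwable> failure = new AtomicReference<>();
  private final AtomicInteger completed = new AtomicInteger();

  /** Runs the task `total` times on one worker thread; true means success. */
  boolean run(int total, Runnable task) {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    for (int i = 0; i < total; i++) {
      pool.submit(() -> {
        if (failure.get() != null) {
          return; // skip remaining work once a failure has been recorded
        }
        try {
          task.run();
          completed.incrementAndGet();
        } catch (Throwable t) { // Throwable also covers OutOfMemoryError
          failure.compareAndSet(null, t); // keep only the first failure
        }
      });
    }
    pool.shutdown();
    try {
      // Waiting thread: poll so that a stored failure ends the wait early.
      while (!pool.awaitTermination(10, TimeUnit.MILLISECONDS)) {
        if (failure.get() != null) {
          pool.shutdownNow();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      failure.compareAndSet(null, e);
      pool.shutdownNow();
    }
    return failure.get() == null;
  }

  Throwable getFailure() {
    return failure.get();
  }

  int getCompleted() {
    return completed.get();
  }

  public static void main(String[] args) {
    FailureAwareGenerator generator = new FailureAwareGenerator();
    boolean ok = generator.run(5, () -> {
      throw new OutOfMemoryError("simulated failure");
    });
    System.out.println("Status: " + (ok ? "Success" : "Failed"));
    System.out.println("Cause: " + generator.getFailure());
  }
}
```

In a command-line tool the boolean result would typically be mapped to a non-zero exit code so that scripts can detect the failure.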
## How was this patch tested?
Verified that an `OutOfMemoryError` (OOME) is caught, reported, and causes
Freon to exit with failure.
```
$ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
$ docker-compose up -d
$ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
    --numOfBuckets 1 --replicationType RATIS --factor ONE \
    --keySize $(echo '2^20' | bc -lq) \
    --numOfKeys $(echo '5 * 2^10' | bc -lq) \
    --bufferSize $(echo '2^16' | bc -lq)
...
6.66% |???????
| 341/5120 Time: 0:00:17
[pool-2-thread-1] ERROR - Exception while adding key: key-357-74353 in bucket: bucket-0-90611 of volume: vol-0-95721.
java.lang.OutOfMemoryError: Java heap space
    at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
    at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
    at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:81)
    at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:233)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
    at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:710)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
    at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:615)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
100.00%
|?????????????????????????????????????????????????????????????????????????????????????????????????????|
5120/5120 Time: 0:00:20
java.lang.OutOfMemoryError: Java heap space
***************************************************
Status: Failed
Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Number of Volumes created: 1
Number of Buckets created: 1
Number of Keys added: 357
Ratis replication factor: ONE
Ratis replication type: RATIS
Average Time spent in volume creation: 00:00:00,190
Average Time spent in bucket creation: 00:00:00,030
Average Time spent in key creation: 00:00:02,826
Average Time spent in key write: 00:00:14,607
Total bytes written: 374341632
Total Execution time: 00:00:21,593
***************************************************
```
Also verified that successful execution is not affected:
```
$ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
    --numOfBuckets 1 --replicationType RATIS --factor ONE \
    --keySize $(echo '2^20' | bc -lq) \
    --numOfKeys 3 \
    --bufferSize $(echo '2^16' | bc -lq)
...
100.00%
|?????????????????????????????????????????????????????????????????????????????????????????????????????|
3/3 Time: 0:00:02
***************************************************
Status: Success
Git Base Revision: e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Number of Volumes created: 1
Number of Buckets created: 1
Number of Keys added: 3
Ratis replication factor: ONE
Ratis replication type: RATIS
Average Time spent in volume creation: 00:00:00,083
Average Time spent in bucket creation: 00:00:00,012
Average Time spent in key creation: 00:00:00,069
Average Time spent in key write: 00:00:01,611
Total bytes written: 3145728
Total Execution time: 00:00:05,986
***************************************************
```
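As a sanity check on the numbers in both runs, the `bc` expressions in the commands and the reported byte totals can be recomputed. The sketch below only uses values shown in the output above; the class and method names are illustrative.

```java
/** Illustrative arithmetic check; names are not taken from Freon. */
final class FreonRunArithmetic {
  /** Returns 2^exp, matching the `echo '2^N' | bc` expressions. */
  static long pow2(int exp) {
    return 1L << exp;
  }

  /** Bytes written when `keys` keys of `keySize` bytes are fully written. */
  static long totalBytes(long keys, long keySize) {
    return keys * keySize;
  }

  public static void main(String[] args) {
    long keySize = pow2(20);        // --keySize: 1 MiB per key
    long numOfKeys = 5 * pow2(10);  // --numOfKeys in the failing run
    long bufferSize = pow2(16);     // --bufferSize: 64 KiB
    System.out.println("keySize=" + keySize
        + " numOfKeys=" + numOfKeys
        + " bufferSize=" + bufferSize);
    // Failed run: 357 keys were added before the OutOfMemoryError.
    System.out.println("failed run bytes=" + totalBytes(357, keySize));
    // Successful run: all 3 keys were written.
    System.out.println("successful run bytes=" + totalBytes(3, keySize));
  }
}
```

Both products agree with the "Total bytes written" lines: 357 keys of 1048576 bytes give 374341632, and 3 keys give 3145728.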
Issue Time Tracking
-------------------
Worklog Id: (was: 330438)
Remaining Estimate: 0h
Time Spent: 10m
> Random key generator can get stuck
> ----------------------------------
>
> Key: HDDS-2330
> URL: https://issues.apache.org/jira/browse/HDDS-2330
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: freon
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Freon's random key generator can get stuck waiting for completion (without
> any hint of what's happening) if object creation encounters any
> non-IOException.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: after a few hundred keys progress stops.
> {noformat}
> $ docker-compose exec scm ozone freon rk --numOfThreads 1 --numOfVolumes 1 \
>     --numOfBuckets 1 --replicationType RATIS --factor ONE \
>     --keySize $(echo '2^20' | bc -lq) \
>     --numOfKeys $(echo '5 * 2^10' | bc -lq) \
>     --bufferSize $(echo '2^16' | bc -lq)
> 2019-10-18 10:44:45,224 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2019-10-18 10:44:45,381 INFO impl.MetricsSystemImpl: ozone-freon metrics system started
> 2019-10-18 10:44:47,140 [main] INFO - Number of Threads: 1
> 2019-10-18 10:44:47,145 [main] INFO - Number of Volumes: 1.
> 2019-10-18 10:44:47,146 [main] INFO - Number of Buckets per Volume: 1.
> 2019-10-18 10:44:47,146 [main] INFO - Number of Keys per Bucket: 5120.
> 2019-10-18 10:44:47,147 [main] INFO - Key size: 1048576 bytes
> 2019-10-18 10:44:47,147 [main] INFO - Buffer size: 65536 bytes
> 2019-10-18 10:44:47,147 [main] INFO - validateWrites : false
> 2019-10-18 10:44:47,151 [main] INFO - Starting progress bar Thread.
> ...
> 7.07% |????????
> | 362/5120
> {noformat}